The enrichment analysis is based on the subset of the network connected to a particular community as identified by calculateCommunitiesStats. See calculateTFEnrichment and calculateGeneralEnrichment for TF-specific and general enrichment, respectively. This function requires that the GRN object contains the eGRN graph as produced by build_eGRN_graph as well as the community information as calculated by calculateCommunitiesStats. Results can subsequently be visualized with the function plotCommunitiesEnrichment.

calculateCommunitiesEnrichment(
  GRN,
  ontology = c("GO_BP", "GO_MF"),
  algorithm = "weight01",
  statistic = "fisher",
  background = "neighborhood",
  background_geneTypes = "all",
  selection = "byRank",
  communities = NULL,
  pAdjustMethod = "BH",
  forceRerun = FALSE
)

Arguments

GRN

Object of class GRN

ontology

Character vector of ontologies. Default c("GO_BP", "GO_MF"). Valid values are "GO_BP", "GO_MF", "GO_CC", "KEGG", "DO", and "Reactome", referring to GO Biological Process, GO Molecular Function, GO Cellular Component, KEGG, Disease Ontology, and Reactome Pathways, respectively. GO ontologies require the topGO, "KEGG" the clusterProfiler, "DO" the DOSE, and "Reactome" the ReactomePA packages, respectively. As they are listed under Suggests, they may not yet be installed, and the function will throw an error if they are missing.
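Because the ontology packages are only suggested dependencies, it can be convenient to check for them before starting a long run. The following is a minimal sketch in base R. The helper missing_ontology_pkgs and the lookup vector ontology_pkgs are hypothetical names introduced here for illustration and are not part of GRaNIE; the ontology-to-package mapping is taken from the description above.

```r
# Ontology -> required package, as stated in the argument description above.
ontology_pkgs <- c(
  GO_BP = "topGO", GO_MF = "topGO", GO_CC = "topGO",
  KEGG = "clusterProfiler", DO = "DOSE", Reactome = "ReactomePA"
)

# Hypothetical helper (not part of GRaNIE): returns the names of the
# required packages that cannot be loaded on this system.
missing_ontology_pkgs <- function(ontology) {
  stopifnot(all(ontology %in% names(ontology_pkgs)))
  pkgs <- unique(unname(ontology_pkgs[ontology]))
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# An empty character vector means everything needed is available.
missing_ontology_pkgs(c("GO_BP", "KEGG"))
```

Any package names returned would need to be installed (for example via BiocManager::install) before calling the enrichment function with the corresponding ontology.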

algorithm

Character. Default "weight01". One of: "classic", "elim", "weight", "weight01", "lea", "parentchild". Only relevant if ontology is GO related (GO_BP, GO_MF, GO_CC), ignored otherwise. Name of the algorithm that handles the GO graph structures. Valid inputs are those supported by the topGO library. For general information about the algorithms, see https://academic.oup.com/bioinformatics/article/22/13/1600/193669. weight01 is a mixture between the elim and the weight algorithms.

statistic

Character. Default "fisher". One of: "fisher", "ks", "t". Statistical test to be used. Only relevant if ontology is GO related (GO_BP, GO_MF, GO_CC), ignored otherwise. Valid inputs are a subset of those supported by the topGO library (some were removed because they do not seem to work properly in topGO itself). For the other ontologies, the test statistic is always Fisher.
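To make the Fisher-based test more concrete, the following sketch shows a generic one-sided Fisher's exact test for over-representation of a single term. It is not GRaNIE's or topGO's internal code, and all counts are made up purely for illustration.

```r
# Made-up counts, for illustration only.
n_background <- 1000  # genes in the background set
n_term       <- 50    # background genes annotated to the term
n_community  <- 40    # genes in the community (foreground)
n_overlap    <- 8     # community genes annotated to the term

# 2x2 contingency table: rows = annotation status, columns = gene set.
contingency <- matrix(
  c(n_overlap, n_community - n_overlap,
    n_term - n_overlap,
    n_background - n_term - n_community + n_overlap),
  nrow = 2,
  dimnames = list(term = c("annotated", "not_annotated"),
                  set  = c("community", "rest"))
)

# One-sided test: is the term over-represented in the community?
res <- fisher.test(contingency, alternative = "greater")
res$p.value
```

The choice of background changes n_background and n_term and can therefore change the resulting p-value, which is why the background argument matters.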

background

Character. Default "neighborhood". One of: "all_annotated", "all_RNA", "all_RNA_filtered", "neighborhood". Set of genes to be used to construct the background for the enrichment analysis. This can either be all annotated genes in the reference genome (all_annotated), all genes from the provided RNA data (all_RNA), all genes from the provided RNA data excluding those marked as filtered after executing filterData (all_RNA_filtered), or all the genes that are within the neighborhood of any peak (before applying any filters except for the user-defined promoterRange value in addConnections_peak_gene) (neighborhood).

background_geneTypes

Character vector of gene types that should be considered for the background. Default "all". Only gene types as defined in the GRN object, slot GRN@annotation$genes$gene.type are allowed. The special keyword "all" means no filter on gene type.

selection

Character. Default "byRank". One of: "byRank", "byLabel". Specifies whether communities are selected for the enrichment analysis by their rank, where the largest community (with the most vertices) has a rank of 1, or by their label. Note that the label is independent of the rank.

communities

NULL or numeric vector or character vector. Default NULL. If set to NULL, the enrichment is calculated for all communities. If a numeric vector is specified (when selection = "byRank"), it gives the ranks of the communities. For example, communities = c(1,4) then denotes the largest and fourth largest community. If a character vector is specified (when selection = "byLabel"), it gives the names of the communities instead. For example, communities = c("1","4") then denotes the communities with the names "1" and "4", which may or may not be the largest and fourth largest communities among all.
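The difference between rank and label can be illustrated with a small base R sketch. The membership vector below is made up, and the code only mimics the idea of ranking communities by size; how GRaNIE stores communities or resolves ties internally is not shown here.

```r
# Made-up community labels for ten vertices, for illustration only.
membership <- c("3", "1", "3", "3", "2", "1", "3", "2", "2", "3")

# Community sizes, largest first: position in this vector = rank.
sizes <- sort(table(membership), decreasing = TRUE)
labels_by_rank <- names(sizes)

# Selecting "by rank": ranks 1 and 3 are the largest and third-largest
# communities, whatever their labels happen to be.
selected_by_rank <- labels_by_rank[c(1, 3)]

# Selecting "by label": the communities named "1" and "3",
# regardless of their size.
selected_by_label <- intersect(c("1", "3"), labels_by_rank)
```

In this toy example the community labelled "3" is the largest and therefore has rank 1, while the community labelled "1" is the smallest and has rank 3.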

pAdjustMethod

Character. Default "BH". One of: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr". Method used to adjust p-values for multiple testing. This parameter is only relevant for the following ontologies: KEGG, DO, Reactome. For the other ontologies, the chosen algorithm (see the algorithm argument) serves as an adjustment.
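The listed method names match those accepted by the base R function p.adjust. As a small sketch of what such an adjustment does, the example below applies two of the methods to made-up p-values; it does not reproduce how the enrichment packages call the adjustment internally.

```r
# Made-up raw p-values, for illustration only.
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.2)

# Benjamini-Hochberg (the default "BH") controls the false discovery rate.
p_bh <- p.adjust(p_raw, method = "BH")

# Bonferroni is more conservative and controls the family-wise error rate.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# In p.adjust, "fdr" is an alias for "BH".
p_fdr <- p.adjust(p_raw, method = "fdr")

data.frame(p_raw, p_bh, p_bonf)
```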

forceRerun

TRUE or FALSE. Default FALSE. Force execution, even if the GRN object already contains the result. Overwrites the old results.

Value

An updated GRN object, with the enrichment results stored in the stats$Enrichment$byCommunity slot.

Details

All enrichment functions use the TF-gene graph as defined in the GRN object. See the ontology argument for currently supported ontologies. Also note that some parameter combinations for algorithm and statistic are incompatible; an error is thrown in such a case.

Examples

# See the Workflow vignette on the GRaNIE website for examples
GRN = loadExampleObject()
#> Downloading GRaNIE example object from https://git.embl.de/grp-zaugg/GRaNIE/-/raw/master/data/GRN.rds
#> Finished successfully. You may explore the example object. Start by typing the object name to the console to see a summaty. Happy GRaNIE'ing!
GRN = calculateCommunitiesEnrichment(GRN, ontology = c("GO_BP"), forceRerun = FALSE)
#> INFO [2024-04-04 17:35:07] Running enrichment analysis for all 6 communities. This may take a while...
#> INFO [2024-04-04 17:35:07]  Community 1
#> INFO [2024-04-04 17:35:07] Data already exists in object or the specified file already exists. Set forceRerun = TRUE to regenerate and overwrite.
#> INFO [2024-04-04 17:35:07]  Community 2
#> INFO [2024-04-04 17:35:07] Data already exists in object or the specified file already exists. Set forceRerun = TRUE to regenerate and overwrite.
#> INFO [2024-04-04 17:35:07]  Community 3
#> INFO [2024-04-04 17:35:07] Data already exists in object or the specified file already exists. Set forceRerun = TRUE to regenerate and overwrite.
#> INFO [2024-04-04 17:35:07]  Community 4
#> INFO [2024-04-04 17:35:07] Data already exists in object or the specified file already exists. Set forceRerun = TRUE to regenerate and overwrite.
#> INFO [2024-04-04 17:35:07]  Community 5
#> INFO [2024-04-04 17:35:07] Data already exists in object or the specified file already exists. Set forceRerun = TRUE to regenerate and overwrite.
#> INFO [2024-04-04 17:35:07]  Community 6
#> INFO [2024-04-04 17:35:08] Data already exists in object or the specified file already exists. Set forceRerun = TRUE to regenerate and overwrite.
#> INFO [2024-04-04 17:35:08]  Finished successfully. Execution time: 3.3 secs