After the execution of this function, QC plots can be plotted with the function plotDiagnosticPlots_peakGene unless this has already been done by default due to plotDiagnosticPlots = TRUE

addConnections_peak_gene(
  GRN,
  overlapTypeGene = "TSS",
  corMethod = "pearson",
  promoterRange = 250000,
  TADs = NULL,
  TADs_mergeOverlapping = FALSE,
  knownLinks = NULL,
  knownLinks_separator = c(":", "-"),
  knownLinks_useExclusively = FALSE,
  shuffleRNACounts = TRUE,
  nCores = 4,
  plotDiagnosticPlots = TRUE,
  plotGeneTypes = list(c("all"), c("protein_coding")),
  outputFolder = NULL,
  forceRerun = FALSE
)

Arguments

GRN

Object of class GRN

overlapTypeGene

Character. "TSS" or "full". Default "TSS". If set to "TSS", only the TSS of the gene is used as reference for finding genes in the neighborhood of a peak. If set to "full", the whole annotated gene (including all exons and introns) is used instead.

corMethod

Character. One of pearson, spearman or bicor. Default pearson. Method for calculating the correlation coefficient. For pearson and spearman , see cor for details. bicor denotes the *biweight midcorrelation*, a correlation measure based on medians as calculated by WGCNA::bicorAndPvalue. Both spearman and bicor are considered more robust measures that are less prone to be affected by outliers.

promoterRange

Integer >=0. Default 250000. The size of the neighborhood in bp to correlate peaks and genes in vicinity. Only peak-gene pairs will be correlated if they are within the specified range. Increasing this value leads to higher running times and more peak-gene pairs to be associated, while decreasing results in the opposite.

TADs

Data frame with TAD domains. Default NULL. If provided, the neighborhood of a peak is defined by the TAD domain the peak is in rather than a fixed-sized neighborhood. The expected format is a BED-like data frame with at least 3 columns in this particular order: chromosome, start, end, the 4th column is optional and will be taken as ID column. All additional columns as well as column names are ignored. For the first 3 columns, the type is checked as part of a data integrity check.

TADs_mergeOverlapping

TRUE or FALSE. Default FALSE. Should overlapping TADs be merged? Only relevant if TADs are provided.

knownLinks

NULL or a data frame with exactly two columns. Both columns must be of type character and they must both contain genomic coordinates in the usual format: chr:start-end, while the 2 separators between the three elements can be chosen by the user. The first column denotes the **bait**, the promoter coordinates that are overlapped with the genes (usually their TSS, unless specified otherwise via the parameter overlapTypeGene, while the second column denotes the **other end (OE)** coordinates, which is overlapped with the peaks/enhancers from the GRN object. **NOTE: The provided column names are ignored and column 1 is interpreted as bait column and column 2 as OE column unless column names are exactly `bait` and `OE`.**. For more details, see the Workflow vignette.)

knownLinks_separator

Character vector of length 1 or 2. Default c(":", "-"). Separator(s) for the character columns that specify the genomic locations. The first entry splits the chromosome from the position, while the second entry splits the start and end coordinates. If only one separator is given, the same will be used for both.

knownLinks_useExclusively

TRUE or FALSE. Default FALSE. If kept at FALSE (the default), specified knownLinks will be used in addition to the regular peak-gene links that are identified via the default method. If set to TRUE, only the knownLinks will be used.

shuffleRNACounts

TRUE or FALSE. Default TRUE. Should the RNA sample labels be shuffled in addition to testing random peak-gene pairs for the background? When set to FALSE, only peak-gene pairs are shuffled, but for each pair, the counts from peak and RNA that are correlated are matched (i.e., sample 1 counts from peak data are compared to sample 1 counts from RNA). If set to TRUE, however, the RNA sample labels are in addition shuffled so that sample 1 counts from peak data are compared to sample 4 data from RNA, for example. Shuffling truly randomizes the resulting background eGRN. Note that this parameter and its influence is still being investigated. Until version 1.0.7, this parameter (although not existent explicitly) was implicitly set to TRUE.

nCores

Integer >0. Default 1. Number of cores to use. A value >1 requires the BiocParallel package (as it is listed under Suggests, it may not be installed yet).

plotDiagnosticPlots

TRUE or FALSE. Default TRUE. Run and plot various diagnostic plots? If set to TRUE, PDF files will be produced and saved in the output directory (in a subfolder called plots).

plotGeneTypes

List of character vectors. Default list(c("all"), c("protein_coding")). Each list element may consist of one or multiple gene types that are plotted collectively in one PDF. The special keyword "all" denotes all gene types that are found (be aware: this typically contains 20+ gene types, see https://www.gencodegenes.org/pages/biotypes.html for details).

outputFolder

Character or NULL. Default NULL. If set to NULL, the default output folder as specified when initiating the object in initializeGRN will be used. Otherwise, all output from this function will be put into the specified folder. If a folder is provided, while we recommend specifying an absolute path, a relative one also works.

forceRerun

TRUE or FALSE. Default FALSE. Force execution, even if the GRN object already contains the result. Overwrites the old results.

Value

An updated GRN object, with additional information added from this function.

Examples

# See the Workflow vignette on the GRaNIE website for examples
GRN = loadExampleObject()
#> Downloading GRaNIE example object from https://git.embl.de/grp-zaugg/GRaNIE/-/raw/master/data/GRN.rds
#> Finished successfully. You may explore the example object. Start by typing the object name to the console to see a summaty. Happy GRaNIE'ing!
GRN = addConnections_peak_gene(GRN, promoterRange=10000, plotDiagnosticPlots = FALSE)
#> INFO [2024-04-04 17:33:44] Data already exists in object or the specified file already exists. Set forceRerun = TRUE to regenerate and overwrite.
#> INFO [2024-04-04 17:33:44] Finished successfully. Execution time: 0 secs