This function runs the GRaNIE pipeline in batch mode: it processes the dataset at each of several clustering resolutions and builds gene regulatory networks (GRNs) and related analyses from them.

runGRaNIE_batchMode(
  datasetName,
  inputDir,
  outputDir,
  clusterResolutions = c(0.1, seq(0.25, 1, 0.25), seq(2, 10, 1), seq(12, 20, 2)),
  filenameSuffix = "",
  idColumn_peaks = "peakID",
  idColumn_RNA = "ENSEMBL",
  genomeAssembly = "hg38",
  TFBS_source = "custom",
  TFBS_folder = NULL,
  TFBS_JASPAR_useSpecificTaxGroup = NULL,
  nCores = 4,
  normRNA_all = c("limma_quantile"),
  normATAC_all = c("DESeq2_sizeFactors"),
  includeSexChr = FALSE,
  minCV = 0,
  minNormalizedMean_peaks = 5,
  minNormalizedMean_RNA = 1,
  minSizePeaks = 5,
  corMethod = "pearson",
  promoterRange = 250000,
  useGCCorrection = FALSE,
  TF_peak.fdr.threshold = 0.2,
  peak_gene.fdr.threshold = 0.1,
  runTFClassification = FALSE,
  runNetworkAnalyses = FALSE,
  forceRerun = TRUE
)

Arguments

datasetName

Character string specifying the name of the dataset.

inputDir

Character string specifying the directory where the input files are located.

outputDir

Character string specifying the directory where the output files will be saved.

clusterResolutions

Numeric vector specifying the clustering resolutions to consider. Default is `c(0.1, seq(0.25, 1, 0.25), seq(2, 10, 1), seq(12, 20, 2))`.
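
The default expression expands to 19 resolutions between 0.1 and 20. A minimal base-R sketch that shows the expansion and one way to pass a shorter custom vector (the variable names are illustrative only):

```r
# Default value of clusterResolutions, exactly as in the function signature
default_res <- c(0.1, seq(0.25, 1, 0.25), seq(2, 10, 1), seq(12, 20, 2))
print(default_res)
length(default_res)  # 19

# A smaller custom set, e.g. only the low resolutions (0.1 to 1)
my_res <- default_res[default_res <= 1]
print(my_res)
```

The resulting vector can then be supplied as `clusterResolutions = my_res`.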

filenameSuffix

Character string specifying the suffix for the output file names. Default is `""`.

idColumn_peaks

Character string specifying the column name for peak IDs. Default is `"peakID"`.

idColumn_RNA

Character string specifying the column name for gene IDs in the RNA data. Default is `"ENSEMBL"`.

genomeAssembly

Character string specifying the genome assembly to use. Default is `"hg38"`.

TFBS_source

Character string specifying the source for transcription factor binding sites. Options are `"custom"`, `"JASPAR2022"`, `"JASPAR2024"`. Default is `"custom"`.

TFBS_folder

Character string specifying the folder containing custom transcription factor binding site (TFBS) files; relevant when `TFBS_source = "custom"`. Default is `NULL`.

TFBS_JASPAR_useSpecificTaxGroup

Character string specifying a taxonomic group to which JASPAR motifs are restricted (for example, `"vertebrates"`); relevant only when `TFBS_source` is a JASPAR release. Default is `NULL`.

nCores

Integer value specifying the number of cores to use for parallel processing. Default is `4`.

normRNA_all

Character vector specifying the normalization methods to apply to RNA data. Default is `c("limma_quantile")`.
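
`"limma_quantile"` refers to quantile normalization as provided by limma (`limma::normalizeQuantiles()`). As a rough base-R illustration of the technique only, not the package's actual code path, each sample is mapped onto the mean of the sorted samples. The helper name is hypothetical, and ties are simply broken by position, whereas limma handles them more carefully:

```r
# Hypothetical helper: quantile-normalize the columns (samples) of a matrix.
# Ties are broken by position, a simplification compared with limma.
quantile_normalize <- function(m) {
  # Reference distribution: mean of the k-th smallest value across samples
  reference <- rowMeans(apply(m, 2, sort))
  # Rank of each value within its own sample
  ranks <- apply(m, 2, rank, ties.method = "first")
  # Replace each value by the reference value at the same rank
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(m)
  out
}

rna <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8), nrow = 4,
              dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))
rna_qn <- quantile_normalize(rna)
# After normalization every sample has the same sorted values
apply(rna_qn, 2, sort)
```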

normATAC_all

Character vector specifying the normalization methods to apply to ATAC data. Default is `c("DESeq2_sizeFactors")`.
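
`"DESeq2_sizeFactors"` refers to DESeq2's size-factor normalization (`DESeq2::estimateSizeFactors()`), which uses the median-of-ratios method. A minimal base-R sketch of that method for illustration; the helper name is hypothetical and this is not the package's implementation:

```r
# Hypothetical helper: median-of-ratios size factors (the DESeq2 approach).
# counts: raw counts, rows = features (e.g. peaks), columns = samples
median_ratio_size_factors <- function(counts) {
  log_counts <- log(counts)
  # Log geometric mean of each feature across samples
  log_geo_means <- rowMeans(log_counts)
  # Features with a zero count in any sample have a -Inf mean and are skipped
  keep <- is.finite(log_geo_means)
  # For each sample: median ratio of its counts to the geometric means
  apply(log_counts[keep, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[keep])))
}

atac <- cbind(s1 = c(10, 20, 30, 0),
              s2 = c(20, 40, 60, 5))
sf <- median_ratio_size_factors(atac)
# Normalized counts: divide each sample (column) by its size factor
atac_norm <- sweep(atac, 2, sf, "/")
```

Because sample `s2` has exactly twice the counts of `s1` on every feature without zeros, its size factor is twice as large.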

includeSexChr

Logical value indicating whether to include sex chromosomes in the analysis. Default is `FALSE`.
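
How the function identifies sex chromosomes is not described here. Assuming peak IDs follow the common `chr:start-end` convention (an assumption, as is the helper name), excluding chrX and chrY when `includeSexChr = FALSE` could look like:

```r
# Hypothetical helper: drop peaks located on sex chromosomes.
# Assumes peak IDs of the form "chrN:start-end".
drop_sex_chr_peaks <- function(peak_ids, sex_chr = c("chrX", "chrY")) {
  chrom <- sub(":.*$", "", peak_ids)  # text before the first colon
  peak_ids[!chrom %in% sex_chr]
}

peaks <- c("chr1:1000-1500", "chrX:2000-2600", "chr2:300-900", "chrY:50-400")
drop_sex_chr_peaks(peaks)
# [1] "chr1:1000-1500" "chr2:300-900"
```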

minCV

Numeric value specifying the minimum coefficient of variation for filtering. Default is `0`.
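
The coefficient of variation (CV) of a feature is its standard deviation divided by its mean across samples. Which data matrix the filter is applied to, and whether the comparison is strict, is not specified here; the sketch below assumes features with CV >= `minCV` are retained and uses a hypothetical helper name:

```r
# Hypothetical helper: keep rows whose coefficient of variation is >= minCV.
filter_by_cv <- function(m, minCV = 0) {
  cv <- apply(m, 1, sd) / rowMeans(m)  # per-feature CV across samples
  # Rows with an undefined CV (NaN, e.g. all zeros) are dropped
  m[!is.na(cv) & cv >= minCV, , drop = FALSE]
}

expr <- rbind(stable   = c(10, 10, 10),  # CV = 0
              variable = c(2, 10, 18))   # CV = 8 / 10 = 0.8
filter_by_cv(expr, minCV = 0.1)          # keeps only "variable"
```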

minNormalizedMean_peaks

Numeric value specifying the minimum normalized mean for peak filtering. Default is `5`.

minNormalizedMean_RNA

Numeric value specifying the minimum normalized mean for RNA filtering. Default is `1`.
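
Both `minNormalizedMean_peaks` and `minNormalizedMean_RNA` act as thresholds on the mean normalized value of each feature. A minimal sketch, assuming features whose mean across samples is at least the threshold are kept (the helper name is hypothetical):

```r
# Hypothetical helper: keep rows whose mean normalized value is >= min_mean.
filter_by_mean <- function(norm_counts, min_mean) {
  norm_counts[rowMeans(norm_counts) >= min_mean, , drop = FALSE]
}

peaks_norm <- rbind(peak1 = c(8, 6, 7),   # mean 7
                    peak2 = c(1, 3, 2))   # mean 2
filter_by_mean(peaks_norm, min_mean = 5)  # 5 = default minNormalizedMean_peaks
```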

minSizePeaks

Integer value specifying the minimum peak width, in base pairs, for a peak to be retained. Default is `5`.
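
Assuming peak IDs of the form `chr:start-end` and width computed as `end - start` (both assumptions, as is the helper name), the width filter can be sketched as:

```r
# Hypothetical helper: compute peak widths from "chr:start-end" IDs
# and keep peaks at least minSizePeaks bp wide.
filter_by_width <- function(peak_ids, minSizePeaks = 5) {
  coords <- sub("^[^:]+:", "", peak_ids)  # "start-end"
  start  <- as.numeric(sub("-.*$", "", coords))
  end    <- as.numeric(sub("^.*-", "", coords))
  peak_ids[(end - start) >= minSizePeaks]
}

filter_by_width(c("chr1:100-103", "chr1:200-260"))
# [1] "chr1:200-260"
```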

corMethod

Character string specifying the correlation method to use, for example `"pearson"` or `"spearman"`. Default is `"pearson"`.
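
For intuition on the choice: Spearman correlation is Pearson correlation computed on ranks, so it captures monotonic rather than strictly linear relationships and is less sensitive to outliers. Illustrated with base R's `cor()` on toy data (not output of this function):

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- x^3  # monotonic but non-linear relationship

cor(x, y, method = "pearson")   # below 1: the relationship is not linear
cor(x, y, method = "spearman")  # 1: the ranks agree perfectly
# Spearman equals Pearson on the ranks (no ties here)
all.equal(cor(x, y, method = "spearman"), cor(rank(x), rank(y)))
```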

promoterRange

Integer value specifying the distance, in base pairs, around the promoter within which peaks are considered for peak-gene connections. Default is `250000` (250 kb).
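
As a rough illustration, assuming the window is measured symmetrically from a gene's TSS to the peak midpoint (an assumption; the exact anchoring used by the function is not described here), a peak-gene pair is a candidate when that distance is at most `promoterRange`. The helper name is hypothetical:

```r
# Hypothetical helper: is a peak within promoterRange bp of a gene's TSS?
# Assumes peak and gene are on the same chromosome; uses the peak midpoint.
within_range <- function(peak_start, peak_end, tss, promoterRange = 250000) {
  midpoint <- (peak_start + peak_end) / 2
  abs(midpoint - tss) <= promoterRange
}

tss <- 1000000
within_range(1200000, 1200500, tss)  # TRUE: about 200 kb from the TSS
within_range(1300000, 1300500, tss)  # FALSE: about 300 kb from the TSS
```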

useGCCorrection

Logical value indicating whether to use GC content correction. Default is `FALSE`.

TF_peak.fdr.threshold

Numeric value specifying the FDR threshold for transcription factor-peak connections. Default is `0.2`.

peak_gene.fdr.threshold

Numeric value specifying the FDR threshold for peak-gene connections. Default is `0.1`.
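
The FDR thresholds are applied to adjusted p-values. Which correction is used for each connection type is not specified here; assuming Benjamini-Hochberg adjustment via base R's `p.adjust()`, and assuming pairs with an FDR below the threshold are kept, filtering peak-gene pairs at `peak_gene.fdr.threshold` could look like:

```r
# Toy raw p-values for five hypothetical peak-gene pairs
pvals <- c(pg1 = 0.001, pg2 = 0.01, pg3 = 0.03, pg4 = 0.2, pg5 = 0.5)
fdr <- p.adjust(pvals, method = "BH")  # Benjamini-Hochberg adjusted p-values

peak_gene.fdr.threshold <- 0.1
names(fdr)[fdr < peak_gene.fdr.threshold]  # pairs passing the threshold
```

Here the adjusted values are 0.005, 0.025, 0.05, 0.25 and 0.5, so the first three pairs pass.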

runTFClassification

Logical value indicating whether to run transcription factor classification. Default is `FALSE`.

runNetworkAnalyses

Logical value indicating whether to run network analyses. Default is `FALSE`.

forceRerun

Logical value indicating whether to rerun the analysis and regenerate the output even if the output files already exist on disk or in the object. Default is `TRUE`.

Value

The function processes the dataset and saves the results in the specified output directory.

Examples

if (FALSE) {
# Example usage:
runGRaNIE_batchMode(
  datasetName = "example_dataset",
  inputDir = "data/input/",
  outputDir = "data/output/",
  clusterResolutions = c(0.1, 0.5, 1),
  TFBS_source = "JASPAR2024",
  nCores = 8
)
}