This function prepares Seurat data for GRaNIE analysis by processing RNA and ATAC assays, performing clustering, and optionally saving the Seurat object.

prepareSeuratData_GRaNIE(
  seuratObject,
  outputDir = "pseudobulk",
  saveSeuratObject = TRUE,
  genome = "hg38",
  cellTypeAnnotationCol = "cell_type",
  assayName_RNA = "RNA",
  normRNA = "SCT",
  nDimensions_RNA = 50,
  recalculateVariableFeatures = NULL,
  RNA_features = NULL,
  assayName_ATAC_raw = "ATAC",
  normATAC = "LSI",
  LSI_featureCutoff = "q0",
  nDimensions_ATAC = 50,
  dimensionsToIgnore_LSI_ATAC = 1,
  integrationMethod = "WNN",
  WNN_knn = 20,
  pseudobulk_source = "cluster",
  countAggregation = "mean",
  clusteringAlgorithm = 1,
  clusterResolutions = c(1, 2, 5, 10, 15, 20),
  minCellsPerCluster = 100,
  subsample_percentage = 100,
  subsample_n = 1,
  seuratObject_compare = NULL,
  seuratObject_compare_umapName = "wnn.umap",
  doDimPlots = TRUE,
  run_MultiK = FALSE,
  forceRerun = FALSE
)

Arguments

seuratObject

A Seurat object containing the single-cell RNA and ATAC data.

outputDir

Character string specifying the directory where the output files will be saved. Default is `"pseudobulk"`.

saveSeuratObject

Logical value indicating whether to save the processed Seurat object. Default is `TRUE`.

genome

Character. Genome assembly to be used, e.g., "hg38" or "mm10". Default is "hg38". Needed to derive features (genes) for RNA quantification.

cellTypeAnnotationCol

Name of the metadata column that summarizes the cell type, if present. It is used to provide additional annotations for each cluster and can be left empty if no such column exists in the Seurat object. Default is `"cell_type"`.

assayName_RNA

Character string specifying the name of the raw RNA assay. Default is `"RNA"`.

normRNA

Character. Normalization method for RNA data. Currently, "SCT" (SCTransform), "logp1", "PFlogPF", and "custom" are supported. Default is "SCT".

nDimensions_RNA

Integer. Number of dimensions to use for RNA data dimensionality reduction. Default is 50.

recalculateVariableFeatures

Integer > 100, "all", or `NULL`. If set to a number or "all", the highly variable genes (HVGs) are recomputed from the normalized counts, even if they already exist. If set to `NULL`, the existing HVGs are kept. Default is `NULL`.

RNA_features

RNA features. When set to `NULL` (the default), they are retrieved automatically based on the genome version. Can also be provided manually as a data frame. For the required structure, see this file: https://s3.embl.de/zaugg-web/GRaNIEverse/features_RNA_hg38.tsv.gz

assayName_ATAC_raw

Character string specifying the name of the ATAC assay (raw counts). Default is `"ATAC"`.

normATAC

Character. Normalization method for ATAC data. Currently, only "LSI" is supported. Default is "LSI".

LSI_featureCutoff

Character. Feature cutoff for LSI normalization. Default is "q0" (no cutoff, see `Signac::FindTopFeatures`).

nDimensions_ATAC

Integer. Number of dimensions to use for ATAC data dimensionality reduction. Default is 50.

dimensionsToIgnore_LSI_ATAC

Integer. Number of LSI dimensions to ignore for ATAC data. Default is 1.

integrationMethod

Character. Method for integrating RNA and ATAC data. Currently, only WNN is supported. Default is "WNN".

WNN_knn

Integer. Number of multimodal neighbors to compute for the weighted nearest neighbor (WNN) graph. Default is 20.

pseudobulk_source

Character string specifying the source for pseudobulk aggregation. Default is `"cluster"`.

countAggregation

Character string specifying the method for count aggregation. Default is `"mean"`.
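
As an illustration of what per-cluster count aggregation means, the following base R sketch averages a toy count matrix over cluster labels. It is not the package's implementation: the matrix, the cluster labels, and the variable names are made up for the example, and the sum is shown only for contrast (whether values other than `"mean"` are accepted is not stated here).

```r
# Toy count matrix: 3 features (rows) x 6 cells (columns). Values are made up.
counts <- matrix(
  c(1, 0, 2,   # cell1
    3, 1, 0,   # cell2
    0, 4, 1,   # cell3
    2, 2, 2,   # cell4
    5, 0, 1,   # cell5
    1, 1, 3),  # cell6
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)

# Hypothetical cluster assignment, one label per cell (same order as columns).
clusters <- c("c0", "c0", "c1", "c1", "c1", "c0")

# Sum counts per cluster: rowsum() groups rows, so transpose to cells x genes
# first and transpose the result back to genes x clusters.
pseudobulk_sum <- t(rowsum(t(counts), group = clusters))

# Mean per cluster: divide each cluster column by its number of cells.
cluster_sizes <- table(clusters)
pseudobulk_mean <- sweep(
  pseudobulk_sum, 2,
  as.numeric(cluster_sizes[colnames(pseudobulk_sum)]),
  "/"
)

print(pseudobulk_mean)
```

Each column of `pseudobulk_mean` is one pseudobulk sample, so a mean keeps clusters of different sizes on a comparable scale, whereas a sum grows with the number of cells.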

clusteringAlgorithm

Integer value specifying the clustering algorithm passed to `Seurat::FindClusters`. Default is `1` (1 = original Louvain algorithm, 2 = Louvain algorithm with multilevel refinement, 3 = SLM algorithm, 4 = Leiden algorithm). The Leiden algorithm requires the leidenalg Python module.

clusterResolutions

Numeric vector specifying the resolutions for clustering. Default is `c(1, 2, 5, 10, 15, 20)`.

minCellsPerCluster

Integer value specifying the minimum number of cells per cluster. Default is `100`.
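
To see which clusters a given threshold would affect, cluster sizes can be tabulated beforehand. The base R sketch below uses made-up cluster labels and only identifies clusters below the threshold. How the function itself treats such clusters is not described here, so this is a pre-check under that assumption, not a reproduction of its behavior.

```r
# Hypothetical cluster labels for 410 cells (cluster sizes: 250, 120, 40).
cluster_ids <- rep(c("0", "1", "2"), times = c(250, 120, 40))

minCellsPerCluster <- 100

# Count cells per cluster and flag clusters below the threshold.
cluster_sizes <- table(cluster_ids)
too_small <- names(cluster_sizes)[cluster_sizes < minCellsPerCluster]

print(cluster_sizes)
print(too_small)
```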

subsample_percentage

Numeric value specifying the percentage of cells to subsample. Default is `100`.

subsample_n

Integer value specifying the number of times to subsample. Default is `1`.
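
The interplay of `subsample_percentage` and `subsample_n` can be pictured as drawing a fixed fraction of cells several times. The base R sketch below assumes simple random sampling without replacement over all cells. The actual sampling scheme used by the function is not documented here, and the cell names are made up.

```r
set.seed(42)  # for a reproducible illustration only

# Hypothetical cell barcodes.
cell_names <- paste0("cell", seq_len(1000))

subsample_percentage <- 80
subsample_n <- 3

# Number of cells to keep in each draw.
n_keep <- round(length(cell_names) * subsample_percentage / 100)

# Draw subsample_n independent subsamples without replacement.
subsamples <- lapply(
  seq_len(subsample_n),
  function(i) sample(cell_names, size = n_keep)
)

print(lengths(subsamples))
```

With the defaults (`subsample_percentage = 100`, `subsample_n = 1`), such a scheme would keep every cell in a single draw.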

seuratObject_compare

A Seurat object used for comparison in the DimPlots. It must be a superset of the Seurat object passed to this function (i.e., all cells of `seuratObject` must also exist in it). Default is `NULL`.

seuratObject_compare_umapName

Character string specifying the UMAP name for comparison for the other Seurat object. Default is `"wnn.umap"`.

doDimPlots

Logical value indicating whether to generate dimension plots. Default is `TRUE`.

run_MultiK

Logical value indicating whether to run the MultiK clustering. Default is `FALSE`. Currently, this option is disabled and MultiK will not be run.

forceRerun

Logical value indicating whether to force a rerun and regenerate the output even if the output files already exist on disk or in the object. Default is `FALSE`.

Value

The function processes the Seurat object and saves the results in the specified output directory. Optionally, it saves the processed Seurat object.
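
A call might look like the following sketch. It only uses arguments documented above. The argument values are illustrative, `seuratObject` is assumed to be a multiome Seurat object that already exists in the session, and the call is guarded so the snippet does nothing when the function or the object is unavailable.

```r
# Illustrative argument values; every name below is a documented argument.
granie_args <- list(
  outputDir             = "pseudobulk",
  genome                = "hg38",
  cellTypeAnnotationCol = "cell_type",
  assayName_RNA         = "RNA",
  normRNA               = "SCT",
  assayName_ATAC_raw    = "ATAC",
  clusterResolutions    = c(1, 2, 5),  # fewer resolutions than the default
  minCellsPerCluster    = 100,
  forceRerun            = FALSE
)

# Only run when the function is loaded and a Seurat object named
# `seuratObject` (an assumption of this sketch) is present.
if (exists("prepareSeuratData_GRaNIE") && exists("seuratObject")) {
  result <- do.call(
    prepareSeuratData_GRaNIE,
    c(list(seuratObject = seuratObject), granie_args)
  )
}
```

Arguments not listed keep their defaults, so results are written to `outputDir` and the processed Seurat object is saved because `saveSeuratObject` defaults to `TRUE`.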