prepareSeuratData_GRaNIE.Rd
This function prepares Seurat data for GRaNIE analysis by processing RNA and ATAC assays, performing clustering, and optionally saving the Seurat object.
prepareSeuratData_GRaNIE(
seuratObject,
outputDir = "pseudobulk",
saveSeuratObject = TRUE,
genome = "hg38",
cellTypeAnnotationCol = "cell_type",
assayName_RNA = "RNA",
normRNA = "SCT",
nDimensions_RNA = 50,
recalculateVariableFeatures = NULL,
RNA_features = NULL,
assayName_ATAC_raw = "ATAC",
normATAC = "LSI",
LSI_featureCutoff = "q0",
nDimensions_ATAC = 50,
dimensionsToIgnore_LSI_ATAC = 1,
integrationMethod = "WNN",
WNN_knn = 20,
pseudobulk_source = "cluster",
countAggregation = "mean",
clusteringAlgorithm = 1,
clusterResolutions = c(1, 2, 5, 10, 15, 20),
minCellsPerCluster = 100,
subsample_percentage = 100,
subsample_n = 1,
seuratObject_compare = NULL,
seuratObject_compare_umapName = "wnn.umap",
doDimPlots = TRUE,
run_MultiK = FALSE,
forceRerun = FALSE
)
A Seurat object containing the single-cell RNA and ATAC data
Character string specifying the directory where the output files will be saved. Default is `"pseudobulk"`.
Logical value indicating whether to save the processed Seurat object. Default is `TRUE`.
Character. Genome assembly to be used, e.g., "hg38" or "mm10". Default is "hg38". Needed to derive features (genes) for RNA quantification.
Name of the metadata column that summarizes the cell type, if present. Used to provide additional annotations for each cluster; can be left empty if no such column exists in the Seurat object. Default is `"cell_type"`.
Character string specifying the name of the raw RNA assay. Default is `"RNA"`.
Character. Normalization method for RNA data. Currently, SCTransform ("SCT"), "logp1", "PFlogPF", and "custom" are supported. Default is "SCT".
Integer. Number of dimensions to use for RNA data dimensionality reduction. Default is 50.
Integer > 100, "all" or `NULL`. If set to a number or "all", the variable features (HVGs) are recomputed from the normalized counts even if they already exist. If set to `NULL` (the default), the original HVGs are kept.
RNA features. When set to `NULL` (the default), they are retrieved automatically for the given genome version. Can be provided manually as a data frame. For the required structure, see this file: https://s3.embl.de/zaugg-web/GRaNIEverse/features_RNA_hg38.tsv.gz
Character string specifying the name of the ATAC assay (raw counts). Default is `"ATAC"`.
Character. Normalization method for ATAC data. Currently, only "LSI" is supported. Default is "LSI".
Character. Feature cutoff for LSI normalization. Default is "q0" (no cutoff, see `Signac::FindTopFeatures`).
Integer. Number of dimensions to use for ATAC data dimensionality reduction. Default is 50.
Integer. Number of LSI dimensions to ignore for ATAC data. Default is 1.
Character. Method for integrating RNA and ATAC data. Currently, only WNN is supported. Default is "WNN".
Integer. Number of multimodal neighbors to compute for the weighted nearest neighbor (WNN) graph. Default is 20.
Character string specifying the source for pseudobulk aggregation. Default is `"cluster"`.
Character string specifying the method for count aggregation. Default is `"mean"`.
Integer value specifying the clustering algorithm, as used by `Seurat::FindClusters`. Default is `1` (1 = original Louvain algorithm, 2 = Louvain algorithm with multilevel refinement, 3 = SLM algorithm, 4 = Leiden algorithm). Leiden requires the leidenalg Python package.
Numeric vector specifying the resolutions for clustering. Default is `c(1, 2, 5, 10, 15, 20)`.
Integer value specifying the minimum number of cells per cluster. Default is `100`.
Numeric value specifying the percentage of cells to subsample. Default is `100`.
Integer value specifying the number of times to subsample. Default is `1`.
A Seurat object to compare against in the DimPlots. The object given here must be a superset of the Seurat object passed to this function (i.e., it must contain all of that object's cells). Default is `NULL`.
Character string specifying the name of the UMAP reduction to use from the comparison Seurat object. Default is `"wnn.umap"`.
Logical value indicating whether to generate dimension plots. Default is `TRUE`.
Logical value indicating whether to run the MultiK clustering. Default is `FALSE`. Currently, this option is disabled and MultiK will not be run.
Logical value indicating whether to force a rerun of the function and regenerate the output even if the output files already exist on disk or in the object. Default is `FALSE`.
The function processes the Seurat object and saves the results in the specified output directory. Optionally, it saves the processed Seurat object.
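As a rough illustration of what `pseudobulk_source = "cluster"` combined with `countAggregation = "mean"` means conceptually, the following base R sketch averages counts per feature across the cells of each cluster. It is not the package's implementation; the toy matrix `counts`, the vector `clusters`, and the result name `pseudobulk_mean` are hypothetical and used only for illustration.

```r
# Toy count matrix: 3 features (rows) x 6 cells (columns); values are made up.
counts <- matrix(
  c(1, 0, 3, 2, 4, 1,
    0, 2, 2, 6, 0, 4,
    5, 5, 1, 1, 3, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)

# Hypothetical cluster label for each cell (same order as the columns).
clusters <- factor(c("c0", "c0", "c1", "c1", "c1", "c0"))

# Mean aggregation: for every cluster, average each feature over its cells.
# The result has one row per feature and one column per cluster.
pseudobulk_mean <- sapply(levels(clusters), function(cl) {
  rowMeans(counts[, clusters == cl, drop = FALSE])
})

print(pseudobulk_mean)
```

Each column of `pseudobulk_mean` then plays the role of one pseudobulk sample. In the actual function, clusters containing fewer cells than `minCellsPerCluster` would presumably be handled separately; that step is not modeled here.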