Main GRaNPA function. It takes differential expression (DE) data and a gene regulatory network (GRN) and trains a random forest model that predicts the DE values from the information in the GRN.
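The sketch below illustrates the general idea with base R only. It assumes that the GRN is encoded as a gene x TF feature matrix (1 if a TF is connected to a gene, else 0), with each gene's logFC as the response. It also shows one simple way to build a randomized baseline network. Neither the encoding nor the shuffling scheme is confirmed to be exactly what GRaNPA does internally. The gene IDs and values are invented for illustration. If a weight column is supplied, the 1s could plausibly be replaced by edge weights.

```r
# Hypothetical toy GRN: TF -> target gene edges (IDs are invented for illustration)
grn <- data.frame(
  TF.name      = c("TF_A", "TF_A", "TF_B", "TF_C", "TF_C"),
  gene.ENSEMBL = c("ENSG01", "ENSG02", "ENSG02", "ENSG03", "ENSG01"),
  stringsAsFactors = FALSE
)

# Hypothetical toy DE results for the same genes
de <- data.frame(
  ENSEMBL = c("ENSG01", "ENSG02", "ENSG03"),
  logFC   = c(1.5, -0.7, 2.1),
  stringsAsFactors = FALSE
)

# Gene x TF feature matrix (assumed encoding): 1 if the TF is connected to the gene, else 0
features <- unclass(table(grn$gene.ENSEMBL, grn$TF.name))
features <- (features > 0) * 1

# Response: the logFC of each gene, aligned to the rows of the feature matrix
y <- de$logFC[match(rownames(features), de$ENSEMBL)]

# Features plus response: the kind of table a random forest would be trained on
train_data <- data.frame(features, logFC = y)

# Randomized baseline (assumed approach): shuffle target genes across edges.
# This keeps the number of targets per TF but breaks the real regulatory links.
set.seed(1)
grn_random <- grn
grn_random$gene.ENSEMBL <- sample(grn$gene.ENSEMBL)
```

If the real GRN carries information about the DE response, a model trained on `features` should outperform one trained on the matrix built from `grn_random`. This is the comparison behind the normal versus random R^2 distributions returned by the function.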

GRaNPA_main_function(
  DE_data,
  GRN_matrix_filtered,
  DE_pvalue_th = 0.2,
  logFC_th = 0,
  num_run = 5,
  num_run_CR = 2,
  num_run_random = 5,
  cores = 10,
  importance = "permutation",
  ML_type = "regression",
  control = "cv",
  train_part = 1
)

Arguments

DE_data

Differential expression data. The DE table should contain the columns 'ENSEMBL', 'logFC', and 'padj'.

GRN_matrix_filtered

A data.frame with at least two columns: TF.name for the TF names and gene.ENSEMBL for the gene ENSEMBL IDs. Optionally, it can have a weight column for weighting the connections.

DE_pvalue_th

Cutoff on the adjusted p-value used to filter the DE data. Default is 0.2.

logFC_th

Cutoff on the absolute log2 fold change used to filter the DE data. Default is 0.

num_run

Number of runs for the real GRN (should be > 0). Default is 5; at least 3 runs are recommended.

num_run_CR

Number of runs for the quality-control GRN (should be > 0). Default is 2; at least 2 runs are recommended.

num_run_random

Number of runs for the randomized GRN (should be > 0). Default is 5; at least 3 runs are recommended.

cores

Number of cores. Default is 10.

importance

Algorithm used to identify the most important TFs. Default is "permutation"; the other options are "impurity_corrected" and "impurity".

ML_type

Either "regression" or "classification". Regression predicts the actual DE values and reports R^2; classification predicts the direction of differential expression and reports accuracy.

control

Either "cv" for 10-fold cross-validation, "oob" for out-of-bag, or "bt" for bootstrap.

train_part

Genes can be split into a training set and a test set; this sets the fraction of genes used for training. The default of 1 uses all genes for training.
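The following base R sketch shows how the DE filtering thresholds, the two ML_type responses, and a train_part split could be applied to a DE table. The exact comparison operators are an assumption: strict `<` for padj and strict `>` for |logFC|. The "up"/"down" labels and the random split are also assumptions, not GRaNPA's confirmed internals. The DE values are invented for illustration.

```r
# Toy DE table with the documented columns (values are illustrative only)
DE_data <- data.frame(
  ENSEMBL = c("ENSG01", "ENSG02", "ENSG03", "ENSG04"),
  logFC   = c(1.5, -0.7, 0.0, -2.2),
  padj    = c(0.01, 0.10, 0.50, 0.03),
  stringsAsFactors = FALSE
)

DE_pvalue_th <- 0.2
logFC_th     <- 0

# Assumed filter: keep genes with padj below the cutoff and |logFC| above it
keep <- DE_data$padj < DE_pvalue_th & abs(DE_data$logFC) > logFC_th
DE_filtered <- DE_data[keep, ]

# ML_type = "regression": the response is the logFC value itself
y_regression <- DE_filtered$logFC

# ML_type = "classification": the response is only the direction of change
# (hypothetical "up"/"down" labels)
y_classification <- factor(ifelse(DE_filtered$logFC > 0, "up", "down"))

# train_part: fraction of genes used for training (here 2 of 3 genes)
set.seed(1)
train_part <- 0.67
n_train    <- floor(train_part * nrow(DE_filtered))
train_idx  <- sample(nrow(DE_filtered), n_train)
test_idx   <- setdiff(seq_len(nrow(DE_filtered)), train_idx)
```

With the default train_part = 1, `n_train` equals the number of genes and `test_idx` is empty. In that case, performance can only be estimated through the resampling scheme chosen in control.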

Value

A GRaNPA object containing:
normal_dist: distribution of R^2 for the actual network
random_dist: distribution of R^2 for the random network
CR_dist: distribution of R^2 for the QC network
nrm_imp_unscaled: unscaled importance score for each TF and each run
nrm_imp_scaled: scaled importance score for each TF and each run
normal_data: the actual data matrix to which the RF was applied
normal_models: all RF models for the actual network
random_models: all RF models for the random network

Examples

GRaNPA_main_function(DE_data, GRN_matrix_filtered)
#> INFO [2024-05-05 13:15:49] GRaNPA Main function
#> WARN [2024-05-05 13:15:49] Both Differentiall expression data and GRN should contain 'ENSEMBL ID' for the list of genes
#> INFO [2024-05-05 13:15:49] Differential expression will be filltered by 0.2 adjusted pvalue and 0 absolute logFC
#> Error in is.data.frame(x): object 'GRN_matrix_filtered' not found
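The example above fails because GRN_matrix_filtered is never defined. The sketch below builds synthetic inputs with the documented column names and then calls the function. It assumes the package is installed under the name GRaNPA and exports GRaNPA_main_function; the call is skipped otherwise. The data are random noise, so the resulting R^2 values for the real network should not be expected to beat the random baseline.

```r
# Synthetic inputs with the documented column names (random values, no real signal)
set.seed(42)
genes <- sprintf("ENSG%011d", 1:200)

DE_data <- data.frame(
  ENSEMBL = genes,
  logFC   = rnorm(200),
  padj    = runif(200),
  stringsAsFactors = FALSE
)

# Random TF -> gene edges; duplicate edges are removed
tfs <- paste0("TF", 1:10)
GRN_matrix_filtered <- unique(data.frame(
  TF.name      = sample(tfs, 1000, replace = TRUE),
  gene.ENSEMBL = sample(genes, 1000, replace = TRUE),
  stringsAsFactors = FALSE
))

# Only run the analysis if the package is available (assumed package name)
if (requireNamespace("GRaNPA", quietly = TRUE)) {
  res <- GRaNPA::GRaNPA_main_function(DE_data, GRN_matrix_filtered, cores = 1)
}
```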