Intro

The purpose of this analysis is to find a correlation between the boolean models' fitness to a steady state activity profile and their performance in terms of the number of True Positive (TP) synergies predicted and/or the overall MCC (Matthews Correlation Coefficient) score. We want to show that a closer fit to the steady state yields more predictive models, thus corroborating our proof of concept of using an ensemble-based approach in which models are trained towards a specific steady state signaling pattern for drug combination predictions.

The boolean model datasets we will use are \(9\) in total: one for each cell line of interest (8 cell lines), where the models were fitted to a cell line-specific steady state, and one for the so-called random models, which were 'random' in the sense that they were fitted only to a proliferation state (simulations were done using the DrugLogics software modules Gitsbe and Drabme).

Each boolean model dataset consists of:

  • The model predictions file, which holds for each model the prediction for each drug combination tested (0 = no synergy predicted, 1 = synergy predicted, NA = no stable state was found in the drug combination-inhibited model or in either of the two single drug-inhibited models)
  • The models' stable states file (one stable state per model). A fitness score for each model can then easily be calculated by matching the model's stable state (which is inherent to the boolean model's structure, a unique fixpoint attractor) with the steady state of interest, node per node. A higher fitness score means a better match of a model's stable state to the cell line-derived steady state (a perfect match results in a fitness of 1) - see the minimal sketch after this list.
  • The models' link operators file, which is a representation of the boolean equations of each model. Each boolean equation has the form: **Target *= (Activator OR Activator OR…) AND NOT (Inhibitor OR Inhibitor OR…)** and the difference between the models lies in the link operator (1 = ‘OR NOT’, 0 = ‘AND NOT’, or absent), which has been mutated (changed) through the genetic algorithm in Gitsbe. Note that the equations that do not have link operators are the same for every model and are thus discarded.
  • The observed synergies file which lists the drug combinations that were observed as synergistic for each cell line.
  • The steady state file, which lists the network nodes (protein, gene and complex names, etc.) and their activity value (0 or 1, representing an inhibited or active node respectively). This input is provided per cell line and not for the random models, since those are trained only to a proliferation state.
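
To make the fitness calculation concrete, here is a minimal, illustrative sketch of the node-per-node matching; the actual analysis uses the get_percentage_of_matches helper shown in the code below, which may differ in details such as NA handling:

# Sketch: fitness = fraction of steady state nodes with a known (non-NA) value
# whose model stable state value matches the steady state value
fitness_sketch = function(model.stable.state, steady.state) {
  matches = model.stable.state[names(steady.state)] == steady.state
  sum(matches, na.rm = TRUE) / sum(!is.na(matches))
}

# Toy example: 2 of the 3 observed node values match => fitness ~0.67
steady.state.example = c(MAPK14 = 1, ERK_f = 0, AKT_f = 1, TP53 = NA)
model.state.example  = c(MAPK14 = 1, ERK_f = 1, AKT_f = 1, TP53 = 0)
fitness_sketch(model.state.example, steady.state.example)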

Input

Loading libraries:

library(DT)              # interactive tables (datatable, formatStyle)
library(ggpubr)          # ggplot2-based plots (ggboxplot, ggscatter, ggqqplot, ggbarplot)
library(emba)            # boolean model ensemble analysis (model predictions, TP/MCC calculation)
library(usefun)          # helper functions (vector pruning, matching, pretty printing)
library(nnet)            # multinomial logistic regression (multinom)
library(pscl)            # pseudo R-squared measures (pR2)
library(ComplexHeatmap)  # p-value heatmaps
library(circlize)        # colorRamp2 for the heatmap color scale
library(dplyr)           # data manipulation
library(tibble)          # remove_rownames
library(Ckmeans.1d.dp)   # univariate (1D) k-means clustering
library(RColorBrewer)    # color palettes for the box plots

First we load the cell-specific input data:

# Cell Lines
cell.lines = c("A498", "AGS", "DU145", "colo205", "SW620", "SF295", "UACC62", "MDA-MB-468")

cell.line.dirs = sapply(cell.lines, function(cell.line) {
  paste0(getwd(), "/", cell.line)
})

# Model predictions
model.predictions.files = sapply(cell.line.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/model_predictions")
})

model.predictions.per.cell.line = lapply(model.predictions.files, 
  function(file) {
    get_model_predictions(file)
  }
)

# Observed synergies
observed.synergies.files = sapply(cell.line.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/observed_synergies")
})

observed.synergies.per.cell.line = lapply(observed.synergies.files, 
  function(file) {
    get_observed_synergies(file)
  }
)

# Models Stable State (1 per model)
models.stable.state.files = sapply(cell.line.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/models_stable_state")
})

models.stable.state.per.cell.line = lapply(models.stable.state.files,
  function(file) {
    as.matrix(read.table(file, check.names = FALSE))
  }
)

# Models Link Operators
models.link.operator.files = sapply(cell.line.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/models_link_operator")
})

models.link.operators.per.cell.line = lapply(models.link.operator.files,
  function(file) {
    as.matrix(read.table(file, check.names = FALSE))
  }
)

# the node names used in our analysis
node.names = colnames(models.stable.state.per.cell.line[[1]])

# Steady States
steady.state.files = sapply(cell.line.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/steady_state")
})

steady.state.per.cell.line = lapply(steady.state.files,
  function(file) {
    ss.df = read.table(file, sep = "\t", stringsAsFactors = FALSE)
    steady.state = ss.df[,2]
    names(steady.state) = ss.df[,1]
    
    # change value to NA for nodes for which there was no activity found (dash)
    steady.state[steady.state == "-"] = NA
    
    # keep only the nodes that are included in the analysis
    steady.state = prune_and_reorder_vector(steady.state, node.names)
    
    # check
    stopifnot(names(steady.state) %in% node.names)
    
    # return an integer vector since the activity values are binarized (0,1)
    return(sapply(steady.state, as.integer))
  }
)

The random model input data:

random.dir = paste0(getwd(), "/random")
random.model.predictions = get_model_predictions(paste0(random.dir, "/model_predictions"))

random.models.stable.state = as.matrix(
  read.table(file = paste0(random.dir, "/models_stable_state"), check.names = FALSE)
)

random.models.link.operator =
  as.matrix(read.table(file = paste0(random.dir, "/models_link_operator"), check.names = FALSE))

Model Analysis

In order to find the number of true positive predicted synergies, MCC scores and fitness scores for each of the models in each of the 9 datasets, we use functions from the emba R package.

Cell-specific

We find the MCC, TP and fitness values for each model per cell line (note that each model’s stable state in a specific cell line is matched against the steady state from that cell line):

models.tp.per.cell.line = list()
models.mcc.per.cell.line = list()
models.fitness.per.cell.line = list()

for (cell.line in cell.lines) {
  model.predictions = model.predictions.per.cell.line[[cell.line]]
  observed.synergies = observed.synergies.per.cell.line[[cell.line]]
  number.of.drug.comb.tested = ncol(model.predictions.per.cell.line[[cell.line]])
  
  # Split model.predictions to positive (observed) and negative (non-observed) results
  observed.model.predictions =
    get_observed_model_predictions(model.predictions, observed.synergies)
  unobserved.model.predictions =
    get_unobserved_model_predictions(model.predictions, observed.synergies)
  
  # Count the predictions of the observed synergies per model (TP)
  models.tp.per.cell.line[[cell.line]] = calculate_models_synergies_tp(observed.model.predictions)
  
  # Calculate Matthews Correlation Coefficient (MCC) for every model
  models.mcc.per.cell.line[[cell.line]] = 
    calculate_models_mcc(observed.model.predictions,
                         unobserved.model.predictions,
                         number.of.drug.comb.tested)
  
  # Fitness per model contrasted to steady state from cell line
  steady.state = steady.state.per.cell.line[[cell.line]]
  models.stable.state = models.stable.state.per.cell.line[[cell.line]]
  models.fitness.per.cell.line[[cell.line]] = apply(
    models.stable.state[,names(steady.state)], 1, get_percentage_of_matches, steady.state
  )
}

Random Models

We find the MCC, TP and fitness values for each random model per cell line (note that each random model’s stable state is matched against the steady state of the respective cell line and that the random models’ stable state data does not change per cell line, i.e. the same random.models.stable.state object is used):

random.models.mcc.per.cell.line = list()
random.models.tp.per.cell.line = list()
random.models.fitness.per.cell.line = list()

for (cell.line in cell.lines) {
  observed.synergies = observed.synergies.per.cell.line[[cell.line]]
  number.of.drug.comb.tested = ncol(random.model.predictions)
  
  # Split model.predictions to positive (observed) and negative (non-observed) results
  observed.model.predictions =
    get_observed_model_predictions(random.model.predictions, observed.synergies)
  unobserved.model.predictions =
    get_unobserved_model_predictions(random.model.predictions, observed.synergies)
  
   # Count the predictions of the observed synergies per model (TP)
  random.models.tp.per.cell.line[[cell.line]] = 
    calculate_models_synergies_tp(observed.model.predictions)
  
  # Calculate Matthews Correlation Coefficient (MCC) for every model
  random.models.mcc.per.cell.line[[cell.line]] = 
    calculate_models_mcc(observed.model.predictions,
                         unobserved.model.predictions,
                         number.of.drug.comb.tested)
  
  # Fitness per model contrasted to steady state from cell line 
  steady.state = steady.state.per.cell.line[[cell.line]]
  random.models.fitness.per.cell.line[[cell.line]] = apply(
    random.models.stable.state[,names(steady.state)], 1, 
    get_percentage_of_matches, steady.state
  )
}

Choose best dataset

We now want to find the best dataset and cell line for our subsequent analysis, i.e. for showing the performance vs fitness correlation. The argument here is that we want a boolean dataset with a large enough fitness value range along with a large TP and/or MCC value range, since with smaller value ranges it would be harder to distinguish the estimated fitness distributions belonging to different performance classes (i.e. models’ fitness values that belong to different classification groups, with the metric being either the number of TPs or the MCC score).

The next two tables will help us determine exactly which cell line and dataset to use:

cell.specific.model.data = matrix(data = NA, nrow = length(cell.lines), ncol = 11)
rownames(cell.specific.model.data) = cell.lines
colnames(cell.specific.model.data) = c("fitness range", "Min fitness", 
  "Max fitness", "Mean fitness", "Median fitness", "MCC range", "Min MCC", 
  "Max MCC", "Mean MCC", "Median MCC", "Max TPR")

for (cell.line in cell.lines) {
  models.fitness = models.fitness.per.cell.line[[cell.line]]
  models.mcc = models.mcc.per.cell.line[[cell.line]]
  models.tp = models.tp.per.cell.line[[cell.line]]
  
  fit.summary = unclass(summary(models.fitness))
  mcc.summary = unclass(summary(models.mcc))
  max.tpr = max(models.tp) / length(observed.synergies.per.cell.line[[cell.line]])
  
  cell.specific.model.data[cell.line, "fitness range"] = fit.summary["Max."] - fit.summary["Min."]
  cell.specific.model.data[cell.line, "Min fitness"] = fit.summary["Min."]
  cell.specific.model.data[cell.line, "Max fitness"] = fit.summary["Max."]
  cell.specific.model.data[cell.line, "Mean fitness"] = fit.summary["Mean"]
  cell.specific.model.data[cell.line, "Median fitness"] = fit.summary["Median"]
  
  cell.specific.model.data[cell.line, "MCC range"] = mcc.summary["Max."] - mcc.summary["Min."]
  cell.specific.model.data[cell.line, "Min MCC"] = mcc.summary["Min."]
  cell.specific.model.data[cell.line, "Max MCC"] = mcc.summary["Max."]
  cell.specific.model.data[cell.line, "Mean MCC"] = mcc.summary["Mean"]
  cell.specific.model.data[cell.line, "Median MCC"] = mcc.summary["Median"]
  
  cell.specific.model.data[cell.line, "Max TPR"] = max.tpr
}

# color columns
fit.breaks = quantile(cell.specific.model.data[,"fitness range"], probs = seq(.05, .95, .05), na.rm = TRUE)
fit.colors = round(seq(255, 40, length.out = length(fit.breaks) + 1), 0) %>%
  {paste0("rgb(255,", ., ",", ., ")")} # red
mcc.breaks = quantile(cell.specific.model.data[,"MCC range"], probs = seq(.05, .95, .05), na.rm = TRUE)
mcc.colors = round(seq(255, 40, length.out = length(mcc.breaks) + 1), 0) %>%
  {paste0("rgb(", ., ",255,", ., ")")} # green

caption.title = "Table 1: Fitness, MCC scores and TP (True positives) for the Cell-specific model predictions across 8 Cell Lines"
datatable(data = cell.specific.model.data, 
          options = list(dom = "t"), # just show the table
          caption = htmltools::tags$caption(caption.title, style="color:#dd4814; font-size: 18px")) %>% 
  formatRound(1:11, digits = 3) %>%
  formatStyle(columns = c("fitness range"), backgroundColor = styleInterval(fit.breaks, fit.colors)) %>%
  formatStyle(columns = c("MCC range"), backgroundColor = styleInterval(mcc.breaks, mcc.colors))
random.model.data = matrix(data = NA, nrow = length(cell.lines), ncol = 11)
rownames(random.model.data) = cell.lines
colnames(random.model.data) = c("fitness range", "Min fitness", 
  "Max fitness", "Mean fitness", "Median fitness", "MCC range", "Min MCC", 
  "Max MCC", "Mean MCC", "Median MCC", "Max TPR")

for (cell.line in cell.lines) {
  models.fitness = random.models.fitness.per.cell.line[[cell.line]]
  models.mcc = random.models.mcc.per.cell.line[[cell.line]]
  models.tp = random.models.tp.per.cell.line[[cell.line]]
  
  fit.summary = unclass(summary(models.fitness))
  mcc.summary = unclass(summary(models.mcc))
  max.tpr = max(models.tp) / length(observed.synergies.per.cell.line[[cell.line]])
  
  random.model.data[cell.line, "fitness range"] = fit.summary["Max."] - fit.summary["Min."]
  random.model.data[cell.line, "Min fitness"] = fit.summary["Min."]
  random.model.data[cell.line, "Max fitness"] = fit.summary["Max."]
  random.model.data[cell.line, "Mean fitness"] = fit.summary["Mean"]
  random.model.data[cell.line, "Median fitness"] = fit.summary["Median"]
  
  random.model.data[cell.line, "MCC range"] = mcc.summary["Max."] - mcc.summary["Min."]
  random.model.data[cell.line, "Min MCC"] = mcc.summary["Min."]
  random.model.data[cell.line, "Max MCC"] = mcc.summary["Max."]
  random.model.data[cell.line, "Mean MCC"] = mcc.summary["Mean"]
  random.model.data[cell.line, "Median MCC"] = mcc.summary["Median"]
  
  random.model.data[cell.line, "Max TPR"] = max.tpr
}

# color columns
fit.breaks = quantile(random.model.data[,"fitness range"], probs = seq(.05, .95, .05), na.rm = TRUE)
fit.colors = round(seq(255, 40, length.out = length(fit.breaks) + 1), 0) %>%
  {paste0("rgb(255,", ., ",", ., ")")} # red
mcc.breaks = quantile(random.model.data[,"MCC range"], probs = seq(.05, .95, .05), na.rm = TRUE)
mcc.colors = round(seq(255, 40, length.out = length(mcc.breaks) + 1), 0) %>%
  {paste0("rgb(", ., ",255,", ., ")")} # green

caption.title = "Table 2: Fitness, MCC scores and TP (True positives) for the random model predictions across 8 Cell Lines"
datatable(data = random.model.data, 
          options = list(dom = "t"), # just show the table
          caption = htmltools::tags$caption(caption.title, style="color:#dd4814; font-size: 18px")) %>% 
  formatRound(1:11, digits = 3) %>%
  formatStyle(columns = c("fitness range"), backgroundColor = styleInterval(fit.breaks, fit.colors)) %>%
  formatStyle(columns = c("MCC range"), backgroundColor = styleInterval(mcc.breaks, mcc.colors))

The box plots below compare the MCC values and fitness scores across all cell lines between the cell-specific models (trained to a steady state) and the random ones (trained to proliferation):

num.of.models = nrow(random.models.stable.state)
data.list = list()

for (cell.line in cell.lines) {
  cell.line.vec = as.data.frame(rep(cell.line, num.of.models), stringsAsFactors = FALSE)
  models.mcc.cell.specific = remove_rownames(as.data.frame(models.mcc.per.cell.line[[cell.line]]))
  models.mcc.random        = remove_rownames(as.data.frame(random.models.mcc.per.cell.line[[cell.line]]))
  
  models.fitness.cell.specific = remove_rownames(as.data.frame(models.fitness.per.cell.line[[cell.line]]))
  models.fitness.random        = remove_rownames(as.data.frame(random.models.fitness.per.cell.line[[cell.line]]))
  
  data.list[[cell.line]] = bind_cols(cell.line.vec, models.mcc.cell.specific, 
    models.mcc.random, models.fitness.cell.specific, models.fitness.random)
}

data = bind_rows(data.list)
colnames(data) = c("cell.line", "MCC cell-specific", "MCC random", 
                   "fitness cell-specific", "fitness random")
# Note the cell-specific models have NaN MCC values
ggboxplot(data, x = "cell.line", y = c("MCC cell-specific", "MCC random"),
          palette = brewer.pal(3, "Set1"), merge = "asis",
          xlab = "Cell Lines", ylab = "MCC values", add = "point", add.params = list(size = 0.5))

ggboxplot(data, x = "cell.line", y = c("fitness cell-specific", "fitness random"),
          palette = brewer.pal(3, "Set1"), merge = "asis",
          xlab = "Cell Lines", ylab = "Fitness values", add = "point", add.params = list(size = 0.5))

So, from the two tables and the two box plots above we conclude that:

  • In general, the random models offer a larger range of fitness values when their stable states are matched against the steady state of each particular cell line. This is something we expected since these models weren’t fitted to a specific cell-line steady state (meaning that they weren’t chosen from the Gitsbe module as the 3 best from each simulation that match that steady state as best as possible) but rather to a more generic state of proliferation. Thus, they represent a set of models with larger variation in terms of structure (boolean model equations) compared to the cell-specific generated ones.
  • The cell-specific models always have a larger maximum and median fitness than the random ones for each respective cell line.
  • In most of the cell lines, the cell-specific models show equal or worse performance (MCC values) than the random ones.

All in all, we will use the random models' data, contrasted to the steady state of the AGS cell line (see Table 2). Of all datasets in the two tables above, this one combines the largest MCC range and Max TPR value (and the third-largest fitness range).

best.cell.line = "AGS"

fit = random.models.fitness.per.cell.line[[best.cell.line]]
tp  = random.models.tp.per.cell.line[[best.cell.line]]
mcc = random.models.mcc.per.cell.line[[best.cell.line]]

Statistical Analysis

Data preprocessing

Firstly, we filter the data by keeping only the unique models - those that have strictly different boolean equations. Then, we take a random sample of these (fixing the random seed for reproducibility):

# For reproducibility
set.seed(10)
sample.size = 1000

# find the unique models
unique.models = rownames(unique(random.models.link.operator))

# remove models with NaN MCC scores
unique.models = unique.models[!unique.models %in% names(which(is.na(mcc)))]

# take a sample
unique.models.sample = sample(unique.models, size = sample.size)

fit.unique = fit[names(fit) %in% unique.models.sample]
tp.unique  = tp[names(tp) %in% unique.models.sample]
mcc.unique = mcc[names(mcc) %in% unique.models.sample]

df = as.data.frame(cbind(fit.unique, mcc.unique, tp.unique))

Note that the fitness and MCC score are continuous variables while the number of true positives (TP) is discrete.

Correlation Plots

Then, we check if our data is normally distributed (using the Shapiro-Wilk normality test and Q-Q plots):

ggqqplot(data = df, x = "fit.unique", ylab = "Fitness")

ggqqplot(data = df, x = "mcc.unique", ylab = "MCC")

shapiro.test(x = fit.unique)

    Shapiro-Wilk normality test

data:  fit.unique
W = 0.98571, p-value = 2.606e-08
shapiro.test(x = mcc.unique)

    Shapiro-Wilk normality test

data:  mcc.unique
W = 0.97356, p-value = 1.599e-12

From the above results we observe that neither the fitness nor the MCC scores are normally distributed (the null hypothesis of normality is rejected with high statistical significance). Thus, we will use non-parametric correlation measures, namely the Spearman and Kendall rank-based correlation tests, to check the correlation between the two continuous variables (the models' fitness values and their corresponding MCC scores):

ggscatter(df, x = "mcc.unique", y = "fit.unique", 
          title = "Fitness vs Performance (MCC) - Spearman Correlation",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "spearman", label.x.npc = 0.7, label.y.npc = 0.1),
          xlab = "MCC scores", ylab = "Fitness Values")

ggscatter(df, x = "mcc.unique", y = "fit.unique", 
          title = "Fitness vs Performance (MCC) - Kendall Correlation",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall", label.x.npc = 0.7, label.y.npc = 0.1),
          xlab = "MCC scores", ylab = "Fitness Values")

From the correlation plots above, we observe a weak/small positive correlation between performance and fitness to the steady state.
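
The same rank-based correlations (and their p-values) can also be computed directly, independent of the plotting; a quick check on the same df columns:

# Spearman's rho and Kendall's tau between fitness and MCC (same data as the plots above)
cor.test(df$fit.unique, df$mcc.unique, method = "spearman")
cor.test(df$fit.unique, df$mcc.unique, method = "kendall")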

To assess the correlation between the number of TP predictions of the models (discrete variable) and the models' fitness (continuous variable), we construct a predictor of the categorical variable from the continuous one: if the resulting classifier has a high goodness of fit, we can conclude that the two variables are indeed associated. Since there are more than 2 TP classes (values) in the dataset, we use multinomial logistic regression, fitting log-linear models via neural networks:

model.classifier = multinom(data = df, formula = tp.unique ~ fit.unique)
# weights:  15 (8 variable)
initial  value 1609.437912 
iter  10 value 1542.742099
final  value 1540.769166 
converged
pseudo.r2.measures = pR2(model.classifier)
fitting null model for pseudo-r2
# weights:  10 (4 variable)
initial  value 1609.437912 
final  value 1546.531448 
converged
pseudo.r2.measures["McFadden"]
   McFadden 
0.003725939 

Several goodness-of-fit measures have been proposed for logistic regression. We focus on McFadden's pseudo-\(R^2\) measure for our classifier, for which a value of \(0.2-0.4\) would indicate an excellent fit [source]. Since we found a much smaller value, we conclude that there is at most a weak positive correlation between the number of TPs and the fitness score of the models.
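
For reference, McFadden's measure compares the log-likelihood of the fitted classifier to that of an intercept-only (null) model; a minimal sketch reproducing the pR2 value reported above:

# McFadden's pseudo-R^2 = 1 - logLik(fitted model) / logLik(null model)
# The 'final values' printed by multinom are the negative log-likelihoods
# (about 1540.77 for the fitted and 1546.53 for the null model), giving ~0.0037
null.classifier = multinom(data = df, formula = tp.unique ~ 1)
1 - as.numeric(logLik(model.classifier)) / as.numeric(logLik(null.classifier))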

Next, we proceed with a more elaborate analysis, where the models are split into different performance classes (TP- or MCC score-derived) and the statistical difference between the individual groups is tested (with regard to their fitness scores).

TP-class vs fitness

First, some box and density plots to visualize where and how the models' fitness values differ between the TP classes:

# Box plots
ggboxplot(df, x = "tp.unique", y = "fit.unique", color = "tp.unique",
          palette = usefun:::colors.100[1:length(unique(tp.unique))],
          xlab = "True Positives (TP)", ylab = "Fitness values")

# Density Plots
densities = list()
for (tp.num in sort(unique(tp.unique))) {
  x = df %>%
    filter(tp.unique == tp.num) %>%
    select_at(.vars = c("fit.unique"))
  den = density(x$fit.unique)
  densities[[paste0(tp.num, " (", den$n, ")")]] = den
}

make_multiple_density_plot(densities, legend.title = "TP classes (#models)",
        title = "Density Estimation", x.axis.label = "Fitness score")

As we can see from the above plots, there doesn't seem to be any substantial difference between the fitness values of the different TP classes. We can test this with the Kruskal-Wallis rank sum test (notice that the p-value is larger than the \(0.001\) significance level we use):

# Hypothesis testing: Are the location parameters of the distribution of x the same in each group?
kruskal.test(x = df[,"fit.unique"], g = df[, "tp.unique"])

    Kruskal-Wallis rank sum test

data:  df[, "fit.unique"] and df[, "tp.unique"]
Kruskal-Wallis chi-squared = 14.828, df = 4, p-value = 0.005072

To demonstrate that almost all pairs of groups are indistinguishable, we perform pairwise Wilcoxon rank sum tests and draw a heatmap of each test's p-value (note that all but one are not statistically significant):

res = pairwise.wilcox.test(x = df[,"fit.unique"], g = df[, "tp.unique"], p.adjust.method = "BH")
p.value.mat = res$p.value
col_fun = colorRamp2(breaks = c(0, 0.05, 0.5, 1), c("green", "white", "orange", "red"))
ht = Heatmap(matrix = p.value.mat, cluster_rows = FALSE, cluster_columns = FALSE,
  na_col = "white", name = "p-value", col = col_fun, column_names_rot = 0,
  row_title = "TP", row_title_rot = 0, row_names_side = "left",
  column_title = "P-values (Pairwise Wilcoxon tests) - TP-classified fitnesses", column_title_gp = gpar(fontsize = 20),
  cell_fun = function(j, i, x, y, width, height, fill) {
    if (!is.na(p.value.mat[i,j]))
        grid.text(sprintf("%.6f", p.value.mat[i, j]), x, y, gp = gpar(fontsize = 16))
})
draw(ht)

Note that the number of true positive predictions is a one-dimensional metric: by excluding the TNs, FPs and FNs we get a less informative (and arguably misleading) picture of the models' performance. That's why we proceed with the more balanced MCC score for classifying the models' fitness values into different performance groups.
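
For reference, the MCC combines all four counts of a model's confusion matrix (over the tested drug combinations) into a single balanced score:

\[ MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \]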

MCC-class vs fitness

Firstly, we perform a univariate k-means clustering to split the models' MCC values into different classes and plot the data histogram. The number of MCC classes can be chosen arbitrarily, so we use the same number as the TP classes:

num.of.mcc.classes = 5
mcc.class.ids = 1:num.of.mcc.classes

# find the clusters
res = Ckmeans.1d.dp(x = df[,"mcc.unique"], k = num.of.mcc.classes)
mcc.class.id = res$cluster
df = cbind(df, mcc.class.id)

plot_mcc_classes_hist(df[,"mcc.unique"], df[,"mcc.class.id"],
                      num.of.mcc.classes, mcc.class.ids)

Then we show some box and density plots to visualize where and how the models' fitness values differ between the MCC classes:

# Box plots
ggboxplot(df, x = "mcc.class.id", y = "fit.unique", color = "mcc.class.id",
          palette = usefun:::colors.100[1:num.of.mcc.classes],
          xlab = "MCC class", ylab = "Fitness values")

# Density Plots
densities = list()
for (id in mcc.class.ids) {
  x = df %>%
    filter(mcc.class.id == id) %>%
    select_at(.vars = c("fit.unique"))
  den = density(x$fit.unique)
  densities[[paste0(id, " (", res$size[id], ")")]] = den
}

make_multiple_density_plot(densities, legend.title = "MCC classes (#models)",
        title = "Density Estimation", x.axis.label = "Fitness score")

Again we observe that there doesn’t seem to be any significant difference between the fitness values of the different MCC classes. The next statistical tests and heatmap show the same:

# Hypothesis testing: Are the location parameters of the distribution of x the same in each group?
kruskal.test(x = df[,"fit.unique"], g = df[, "mcc.class.id"])

    Kruskal-Wallis rank sum test

data:  df[, "fit.unique"] and df[, "mcc.class.id"]
Kruskal-Wallis chi-squared = 12.543, df = 4, p-value = 0.01374
pair.res = pairwise.wilcox.test(x = df[,"fit.unique"], g = df[, "mcc.class.id"], p.adjust.method = "BH")
p.value.mat = pair.res$p.value
col_fun = colorRamp2(breaks = c(0, 0.05, 0.5, 1), c("green", "white", "orange", "red"))
ht = Heatmap(matrix = p.value.mat, cluster_rows = FALSE, cluster_columns = FALSE,
  na_col = "white", name = "p-value", col = col_fun, column_names_rot = 0,
  row_title = "MCC class id", row_title_rot = 90, row_names_side = "left",
  column_title = "P-values (Pairwise Wilcoxon tests) - MCC-classified fitnesses", column_title_gp = gpar(fontsize = 20),
  cell_fun = function(j, i, x, y, width, height, fill) {
    if (!is.na(p.value.mat[i,j]))
        grid.text(sprintf("%.6f", p.value.mat[i, j]), x, y, gp = gpar(fontsize = 20))
})
draw(ht)

Extended Analysis

In this section, we get new steady states against which to test the fitness of our model datasets. Note that the old ones were calculated using both the PARADIGM tool and tools that infer transcription factor activity (in the CLSS workflow), whereas the new ones use only PARADIGM. See README.md for more info on how the new steady states were generated.

The idea here is that the absence of correlation between fitness and performance shown above may be due to wrongly inferred steady states. So, even though the models have not been trained to these new steady states, their stable states remain the same and new fitness scores can be evaluated against them. Thus, the correlation between the models' performance and the corresponding model fitness can be re-assessed for each dataset.

Input

First, we get the new steady states:

steady.state.per.cell.line.new = list()
for (cell.line in cell.lines) {
  file = paste0(getwd(), "/steady-states/", cell.line, "_steady_state")
  ss.df = read.table(file, sep = "\t", stringsAsFactors = FALSE)
  
  steady.state = ss.df[,2]
  names(steady.state) = ss.df[,1]
  
  # change value to NA for nodes for which there was no activity found (dash)
  steady.state[steady.state == "-"] = NA
  
  # keep only the nodes that are included in the analysis
  steady.state = prune_and_reorder_vector(steady.state, node.names)
    
  # check
  stopifnot(names(steady.state) %in% node.names)
  
  # return an integer vector since the activity values are binarized (0,1)
  steady.state.per.cell.line.new[[cell.line]] = sapply(steady.state, as.integer)
}

# Which nodes are not in the steady state vector? (genes: xxx_g)
#node.names[which((node.names %in% names(steady.state.per.cell.line.new$SW620) == FALSE))]

We check whether the new steady states differ from the previous ones by computing the percentage of common node states:

for (cell.line in cell.lines) {
  percentage.match = get_percentage_of_matches(steady.state.per.cell.line.new[[cell.line]], 
    steady.state.per.cell.line[[cell.line]])
  expres = paste0(cell.line, ": ", specify_decimal(percentage.match, digits.to.keep = 2))
  if (cell.line == cell.lines[1])
    pretty_print_string(string = expres)
  else
    pretty_print_string(string = expres, with.gt = FALSE)
  print_empty_line(html.output = TRUE)
}

A498: 0.42
AGS: 0.48
DU145: 0.47
colo205: 0.52
SW620: 0.47
SF295: 0.48
UACC62: 0.49
MDA-MB-468: 0.53

So, we see that the new steady state vector for each cell line differs from the previously calculated one in roughly half of the nodes - about what two unrelated binary vectors would share by chance - so the two methods disagree pretty much at random!


We also get the gold, curation-based steady state for the AGS cell line, which was given as supplementary material in the paper Discovery of Drug Synergies in Gastric Cancer Cells Predicted by Logical Modeling - see Table S4. Here we use a version applicable to the node set of the Cascade 2.0 topology (used in all our simulations), with the addition of steady state values for the Prosurvival and Antisurvival nodes:

ags.literature.file = paste0(getwd(), "/steady-states/AGS_literature_steady_state")
ss.df = read.table(ags.literature.file, sep = "\t", stringsAsFactors = FALSE)
ags.literature.ss = ss.df[,2]
names(ags.literature.ss) = ss.df[,1]
stopifnot(names(ags.literature.ss) %in% node.names)

Comparing the AGS steady state vectors from the 2 computational methods with the literature-curated profile (on their common nodes), in order to see how different the computationally predicted states are from the gold standard, we see that:

ags.ss.old = prune_and_reorder_vector(steady.state.per.cell.line$AGS, names(ags.literature.ss))
ags.ss.new = prune_and_reorder_vector(steady.state.per.cell.line.new$AGS, names(ags.literature.ss))

pretty_print_string(paste0("Match between the 2 AGS steady state vectors pruned to the literature-based nodes: ", specify_decimal(get_percentage_of_matches(ags.ss.old, ags.ss.new), digits.to.keep = 2)))

Match between the 2 AGS steady state vectors pruned to the literature-based nodes: 0.35

pretty_print_string(paste0("Match between the old AGS steady state vector and the literature-based one: ", specify_decimal(get_percentage_of_matches(ags.ss.old, ags.literature.ss), digits.to.keep = 2)))

Match between the old AGS steady state vector and the literature-based one: 0.57

pretty_print_string(paste0("Match between the new AGS steady state vector and the literature-based one: ", specify_decimal(get_percentage_of_matches(ags.ss.new, ags.literature.ss), digits.to.keep = 2)))

Match between the new AGS steady state vector and the literature-based one: 0.43

So, it seems that the 2 computational methods differ considerably even when compared on the node subset that constitutes the gold standard, and both appear close to random when compared to the true (curated) steady state.

Models fitness

Given the new steady states, we re-calculate the models' fitness for both the cell-specific and the random models:

models.fitness.per.cell.line.new = list()
random.models.fitness.per.cell.line.new = list()

for (cell.line in cell.lines) {
  steady.state = steady.state.per.cell.line.new[[cell.line]]
  models.stable.state = models.stable.state.per.cell.line[[cell.line]]
  
  # Fitness per model contrasted to steady state from cell line
  models.fitness.per.cell.line.new[[cell.line]] = apply(
    models.stable.state[,names(steady.state)], 1, get_percentage_of_matches, steady.state
  )
  
  random.models.fitness.per.cell.line.new[[cell.line]] = apply(
    random.models.stable.state[,names(steady.state)], 1, 
    get_percentage_of_matches, steady.state
  )
}

We calculate the fitness of the AGS-trained models and the random ones to the literature-curated steady state for the AGS cell line:

ags.models.stable.state = models.stable.state.per.cell.line$AGS

models.fitness.to.ags.literature.ss = apply(
  ags.models.stable.state[,names(ags.literature.ss)], 1, 
  get_percentage_of_matches, ags.literature.ss)
random.models.fitness.to.ags.literature.ss = apply(
  random.models.stable.state[,names(ags.literature.ss)], 1, 
  get_percentage_of_matches, ags.literature.ss)

Choose best dataset

Following the same procedure as before, we try to find the best dataset for the correlation analysis. Such a dataset should have models with a large MCC and fitness range when contrasted to the new steady states.

The next two tables will help us determine exactly which cell line and dataset to use:

cell.specific.model.data = matrix(data = NA, nrow = length(cell.lines) + 1, ncol = 11)
rownames(cell.specific.model.data) = c(cell.lines, "AGS-gold")
colnames(cell.specific.model.data) = c("fitness range", "Min fitness", 
  "Max fitness", "Mean fitness", "Median fitness", "MCC range", "Min MCC", 
  "Max MCC", "Mean MCC", "Median MCC", "Max TPR")

for (cell.line in c(cell.lines, "AGS-gold")) {
  if (cell.line == "AGS-gold") {
    models.fitness = models.fitness.to.ags.literature.ss
    models.mcc = models.mcc.per.cell.line$AGS
    models.tp = models.tp.per.cell.line$AGS
  } else {
    models.fitness = models.fitness.per.cell.line.new[[cell.line]]
    models.mcc = models.mcc.per.cell.line[[cell.line]]
    models.tp = models.tp.per.cell.line[[cell.line]]
  }
  
  fit.summary = unclass(summary(models.fitness))
  mcc.summary = unclass(summary(models.mcc))
  if (cell.line == "AGS-gold") {
    max.tpr = max(models.tp) / length(observed.synergies.per.cell.line$AGS)
  } else {
    max.tpr = max(models.tp) / length(observed.synergies.per.cell.line[[cell.line]])
  }
  
  cell.specific.model.data[cell.line, "fitness range"] = fit.summary["Max."] - fit.summary["Min."]
  cell.specific.model.data[cell.line, "Min fitness"] = fit.summary["Min."]
  cell.specific.model.data[cell.line, "Max fitness"] = fit.summary["Max."]
  cell.specific.model.data[cell.line, "Mean fitness"] = fit.summary["Mean"]
  cell.specific.model.data[cell.line, "Median fitness"] = fit.summary["Median"]
  
  cell.specific.model.data[cell.line, "MCC range"] = mcc.summary["Max."] - mcc.summary["Min."]
  cell.specific.model.data[cell.line, "Min MCC"] = mcc.summary["Min."]
  cell.specific.model.data[cell.line, "Max MCC"] = mcc.summary["Max."]
  cell.specific.model.data[cell.line, "Mean MCC"] = mcc.summary["Mean"]
  cell.specific.model.data[cell.line, "Median MCC"] = mcc.summary["Median"]
  
  cell.specific.model.data[cell.line, "Max TPR"] = max.tpr
}

# color columns
fit.breaks = quantile(cell.specific.model.data[,"fitness range"], probs = seq(.05, .95, .05), na.rm = TRUE)
fit.colors = round(seq(255, 40, length.out = length(fit.breaks) + 1), 0) %>%
  {paste0("rgb(255,", ., ",", ., ")")} # red
mcc.breaks = quantile(cell.specific.model.data[,"MCC range"], probs = seq(.05, .95, .05), na.rm = TRUE)
mcc.colors = round(seq(255, 40, length.out = length(mcc.breaks) + 1), 0) %>%
  {paste0("rgb(", ., ",255,", ., ")")} # green

caption.title = "Table 3: Fitness, MCC scores and TP (True positives) for the Cell-specific model predictions across 8 Cell Lines"
datatable(data = cell.specific.model.data, 
          options = list(dom = "t"), # just show the table
          caption = htmltools::tags$caption(caption.title, style="color:#dd4814; font-size: 18px")) %>% 
  formatRound(1:11, digits = 3) %>%
  formatStyle(columns = c("fitness range"), backgroundColor = styleInterval(fit.breaks, fit.colors)) %>%
  formatStyle(columns = c("MCC range"), backgroundColor = styleInterval(mcc.breaks, mcc.colors))
random.model.data = matrix(data = NA, nrow = length(cell.lines) + 1, ncol = 11)
rownames(random.model.data) = c(cell.lines, "AGS-gold")
colnames(random.model.data) = c("fitness range", "Min fitness", 
  "Max fitness", "Mean fitness", "Median fitness", "MCC range", "Min MCC", 
  "Max MCC", "Mean MCC", "Median MCC", "Max TPR")

for (cell.line in c(cell.lines, "AGS-gold")) {
  if (cell.line == "AGS-gold") {
    models.fitness = random.models.fitness.to.ags.literature.ss
    models.mcc = random.models.mcc.per.cell.line$AGS
    models.tp = random.models.tp.per.cell.line$AGS
  } else {
    models.fitness = random.models.fitness.per.cell.line.new[[cell.line]]
    models.mcc = random.models.mcc.per.cell.line[[cell.line]]
    models.tp = random.models.tp.per.cell.line[[cell.line]]
  }
  
  fit.summary = unclass(summary(models.fitness))
  mcc.summary = unclass(summary(models.mcc))
  if (cell.line == "AGS-gold") {
    max.tpr = max(models.tp) / length(observed.synergies.per.cell.line$AGS)
  } else {
    max.tpr = max(models.tp) / length(observed.synergies.per.cell.line[[cell.line]])
  }
  
  random.model.data[cell.line, "fitness range"] = fit.summary["Max."] - fit.summary["Min."]
  random.model.data[cell.line, "Min fitness"] = fit.summary["Min."]
  random.model.data[cell.line, "Max fitness"] = fit.summary["Max."]
  random.model.data[cell.line, "Mean fitness"] = fit.summary["Mean"]
  random.model.data[cell.line, "Median fitness"] = fit.summary["Median"]
  
  random.model.data[cell.line, "MCC range"] = mcc.summary["Max."] - mcc.summary["Min."]
  random.model.data[cell.line, "Min MCC"] = mcc.summary["Min."]
  random.model.data[cell.line, "Max MCC"] = mcc.summary["Max."]
  random.model.data[cell.line, "Mean MCC"] = mcc.summary["Mean"]
  random.model.data[cell.line, "Median MCC"] = mcc.summary["Median"]
  
  random.model.data[cell.line, "Max TPR"] = max.tpr
}

# color columns
fit.breaks = quantile(random.model.data[,"fitness range"], probs = seq(.05, .95, .05), na.rm = TRUE)
fit.colors = round(seq(255, 40, length.out = length(fit.breaks) + 1), 0) %>%
  {paste0("rgb(255,", ., ",", ., ")")} # red
mcc.breaks = quantile(random.model.data[,"MCC range"], probs = seq(.05, .95, .05), na.rm = TRUE)
mcc.colors = round(seq(255, 40, length.out = length(mcc.breaks) + 1), 0) %>%
  {paste0("rgb(", ., ",255,", ., ")")} # green

caption.title = "Table 4: Fitness, MCC scores and TP (True positives) for the random model predictions across 8 Cell Lines"
datatable(data = random.model.data, 
          options = list(dom = "t"), # just show the table
          caption = htmltools::tags$caption(caption.title, style="color:#dd4814; font-size: 18px")) %>% 
  formatRound(1:11, digits = 3) %>%
  formatStyle(columns = c("fitness range"), backgroundColor = styleInterval(fit.breaks, fit.colors)) %>%
  formatStyle(columns = c("MCC range"), backgroundColor = styleInterval(mcc.breaks, mcc.colors))

The box plots below compare the new fitness scores across all cell lines:

num.of.models = nrow(random.models.stable.state)
data.list = list()

for (cell.line in c(cell.lines, "AGS-gold")) {
  cell.line.vec = as.data.frame(rep(cell.line, num.of.models), stringsAsFactors = FALSE)
  
  if (cell.line == "AGS-gold") {
    models.fitness.cell.specific = remove_rownames(as.data.frame(models.fitness.to.ags.literature.ss))
    models.fitness.random        = remove_rownames(as.data.frame(random.models.fitness.to.ags.literature.ss))
  } else {
    models.fitness.cell.specific = remove_rownames(as.data.frame(models.fitness.per.cell.line.new[[cell.line]]))
    models.fitness.random        = remove_rownames(as.data.frame(random.models.fitness.per.cell.line.new[[cell.line]]))
  }
  
  colnames(models.fitness.cell.specific) = NULL
  colnames(models.fitness.random) = NULL
  
  data.list[[cell.line]] = 
    bind_cols(cell.line.vec, models.fitness.cell.specific, models.fitness.random)
}

data = bind_rows(data.list)
colnames(data) = c("cell.line", "fitness cell-specific", "fitness random")
ggboxplot(data, x = "cell.line", y = c("fitness cell-specific", "fitness random"),
          palette = brewer.pal(3, "Set1"), merge = "asis",
          xlab = "Cell Lines", ylab = "Fitness values", add = "point", add.params = list(size = 0.5))

We observe from Tables 3 and 4 and the box plots above that all models show a wide range of fitness values, since none of them were trained to the new steady states (in this respect they all behave like randomly generated models). Also, the fitness of the AGS-specific and random models to the literature-curated steady state shows the largest fitness range, with only a small number of unique fitness values across the models (\(13\) in total).


We checked the following datasets and found no statistically significant correlation between fitness scores and performance metric (TP or MCC):

  • SF295, DU145 with cell-specific models (from Table 3)
  • A498, UACC62, DU145, AGS with random models (from Table 4)


Next, we investigate the AGS-gold dataset (the row with the highest fitness and MCC range in Table 3). In this dataset, the performance (MCC score) of the cell-specific AGS models is associated with the models' fitness to the literature-curated AGS steady state.

AGS-gold Statistical Analysis

Firstly, we:

  • Filter the data by finding the unique models - those that have strictly different boolean equations
  • Remove the unique models with NaN MCC scores
  • Merge the fitness scores, MCC values and TP values in a common data.frame object

best.cell.line = "AGS"

# AGS-Gold cell-specific data
fit = models.fitness.to.ags.literature.ss
tp  = models.tp.per.cell.line[[best.cell.line]]
mcc = models.mcc.per.cell.line[[best.cell.line]]

unique.models = rownames(unique(models.link.operators.per.cell.line[[best.cell.line]]))
unique.models = unique.models[!unique.models %in% names(which(is.na(mcc)))]

fit.unique = fit[names(fit) %in% unique.models]
tp.unique  = tp[names(tp) %in% unique.models]
mcc.unique = mcc[names(mcc) %in% unique.models]

df = as.data.frame(cbind(fit.unique, mcc.unique, tp.unique))

Then, we check if our data is normally distributed (using the Shapiro-Wilk normality test and Q-Q plots):

ggqqplot(data = df, x = "fit.unique", ylab = "Fitness")

ggqqplot(data = df, x = "mcc.unique", ylab = "MCC")

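# Note: shapiro.test accepts between 3 and 5000 values, so a random subsample of 1000 is tested here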
shapiro.test(x = sample(x = df$fit.unique, size = 1000))

    Shapiro-Wilk normality test

data:  sample(x = df$fit.unique, size = 1000)
W = 0.739, p-value < 2.2e-16
shapiro.test(x = sample(x = df$mcc.unique, size = 1000))

    Shapiro-Wilk normality test

data:  sample(x = df$mcc.unique, size = 1000)
W = 0.96911, p-value = 9.663e-14

From the above results we observe that neither the fitness nor the MCC scores are normally distributed (the null hypothesis of normality is rejected with high statistical significance). Thus, we will again use the non-parametric Spearman and Kendall rank-based correlation tests to check the correlation between the two continuous variables (the models' fitness values and their corresponding MCC scores):

ggscatter(df, x = "fit.unique", y = "mcc.unique", 
          title = "AGS-Gold (Cell-specific): Fitness vs Performance - Spearman Correlation",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "spearman"),
          xlab = "Fitness Values", ylab = "MCC scores")

ggscatter(df, x = "fit.unique", y = "mcc.unique", 
          title = "AGS-Gold (Cell-specific): Fitness vs Performance - Kendall Correlation",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")

Fitness vs TP/MCC-class

Split the models to MCC classes:

num.of.mcc.classes = 5
mcc.class.ids = 1:num.of.mcc.classes

# find the clusters
res = Ckmeans.1d.dp(x = df$mcc.unique, k = num.of.mcc.classes)
mcc.class.id = res$cluster
df = cbind(df, mcc.class.id)

plot_mcc_classes_hist(df$mcc.unique, df$mcc.class.id, 
  num.of.mcc.classes, mcc.class.ids)

Box plots showing how the fitness scores differ between the MCC and TP classes:

ggboxplot(df, x = "mcc.class.id", y = "fit.unique", color = "mcc.class.id",
  palette = usefun:::colors.100[1:num.of.mcc.classes],
  xlab = "MCC class", ylab = "Fitness values", 
  add = c("mean", "mean_sd"), error.plot = "errorbar")

ggboxplot(df, x = "tp.unique", y = "fit.unique", color = "tp.unique",
  palette = usefun:::colors.100[1:length(unique(tp.unique))],
  xlab = "TP class", ylab = "Fitness values", 
  add = c("mean", "mean_sd"), remove = 5)

Fitness-class vs mean MCC

Since the gold-standard AGS steady state has only 23 nodes, our models yield a small number of unique fitness scores against it (13 unique fitness values in total). So, instead of grouping the models per performance class (TP or MCC), we group them by fitness class and show the average MCC value in each class:

num.of.fit.classes = length(unique(df$fit.unique))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df$fit.unique, k = num.of.fit.classes)
fit.class.id = res$cluster
df = cbind(df, fit.class.id)

unique.fit.stats = desc_statby(df, measure.var = "mcc.unique", grps = c("fit.unique", "fit.class.id"), ci = 0.95)

ggscatter(unique.fit.stats, x = "fit.unique", y = "mean", 
          title = "Fitness Class vs Performance (MCC) - Kendall Correlation",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall", label.x.npc = 0.7, label.y.npc = 0.1),
          xlab = "Unique Fitness values", ylab = "Mean MCC")

ggbarplot(unique.fit.stats, x = "fit.unique", y = "mean", title = "Fitness Class vs Performance (MCC)", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats$mean) + 0.1),
  label = unique.fit.stats$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats$fit.unique, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7,10,13)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 7.5, y = 0.5, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")

# mean -+ 2*se => 95% CI for mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.

There is an evident statistical association between the models' fitness to the curation-based steady state and their performance (measured by the MCC score): models with higher fitness to that steady state tend to have a higher MCC score on average.


AGS-PARADIGM(+TF) analysis

What if we take the AGS steady state calculated with the PARADIGM tool (with or without the transcription factor activity) and prune it to the nodes that constitute the AGS-gold steady state: will there be a correlation between fitness and performance then?


The AGS-Gold nodes are:

pretty_print_vector_names(ags.literature.ss)

23 nodes: AKT_f, Antisurvival, BAX, BCL2, CASP3, CASP8, CCND1, CTNNB1, ERK_f, GSK3_f, JNK_f, KRAS, MAPK14, MMP_f, MYC, NFKB_f, PIK3CA, Prosurvival, PTEN, RAC_f, S6K_f, TCF7_f, TP53

We first find the fitness of the models to the AGS steady state (Paradigm with and without TF included in the calculation) pruned to the AGS-gold nodes:

ags.models.stable.state = models.stable.state.per.cell.line$AGS

models.fitness.to.ags.ss.old = apply(
  ags.models.stable.state[,names(ags.ss.old)], 1, 
  get_percentage_of_matches, ags.ss.old)
models.fitness.to.ags.ss.new = apply(
  ags.models.stable.state[,names(ags.ss.new)], 1, 
  get_percentage_of_matches, ags.ss.new)

fit.old = models.fitness.to.ags.ss.old
fit.new = models.fitness.to.ags.ss.new
fit.unique.old = fit.old[names(fit.old) %in% unique.models]
fit.unique.new = fit.new[names(fit.new) %in% unique.models]

df.old = as.data.frame(cbind(fit.unique.old, mcc.unique, tp.unique))
df.new = as.data.frame(cbind(fit.unique.new, mcc.unique, tp.unique))

Then, we see if there is any correlation between fitness and performance (MCC) using general scatter plots:

ggscatter(df.old, x = "fit.unique.old", y = "mcc.unique", 
          title = "AGS (Paradigm+TF) - Cell-specific: Fitness vs Performance - Kendall's tau",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")

ggscatter(df.new, x = "fit.unique.new", y = "mcc.unique", 
          title = "AGS (Paradigm) - Cell-specific: Fitness vs Performance - Kendall's tau",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")

There is almost no correlation between fitness and performance when the models' fitness is calculated on the same nodes as the AGS-gold steady state but with the values as calculated by PARADIGM (with or without TF activity). This supports the conclusion that the best-performing models are those that fit the true (curated) steady state and not the wrongly assessed ones!


To better visualize the absence of correlation, we group the models into distinct fitness classes and show the average MCC per class with the corresponding 95% CI.

First for the PARADIGM+TF (old) calculated steady state for AGS, pruned to the AGS-gold nodes:

num.of.fit.classes = length(unique(df.old$fit.unique.old))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df.old$fit.unique.old, k = num.of.fit.classes)
fit.class.id = res$cluster
df.old = cbind(df.old, fit.class.id)

unique.fit.stats.old = desc_statby(df.old, measure.var = "mcc.unique", grps = c("fit.unique.old", "fit.class.id"), ci = 0.95)
  
ggbarplot(unique.fit.stats.old, x = "fit.unique.old", y = "mean", title = "Fitness Class vs Performance (MCC) - Paradigm + TF", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats.old$mean) + 0.1),
  label = unique.fit.stats.old$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats.old$fit.unique.old, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 4.5, y = 0.38, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")

# mean -+ 2*se => 95% CI for mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.

And secondly for the PARADIGM only (new) calculated steady state for AGS, pruned to the AGS-gold nodes:

num.of.fit.classes = length(unique(df.new$fit.unique.new))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df.new$fit.unique.new, k = num.of.fit.classes)
fit.class.id = res$cluster
df.new = cbind(df.new, fit.class.id)

unique.fit.stats.new = desc_statby(df.new, measure.var = "mcc.unique", grps = c("fit.unique.new", "fit.class.id"), ci = 0.95)

ggbarplot(unique.fit.stats.new, x = "fit.unique.new", y = "mean", title = "Fitness Class vs Performance (MCC) - Paradigm only", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats.new$mean) + 0.1),
  label = unique.fit.stats.new$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats.new$fit.unique.new, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 4.5, y = 0.35, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")

# mean -+ 2*se => 95% CI for mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.

AGS-Gold Random analysis

In this section we want to investigate how the random models' performance correlates with their fitness to the AGS-gold steady state, as well as to the PARADIGM steady states (with or without TF), pruned to the 23 nodes of the AGS-gold steady state.


First, we calculate the new fitness values for the random models, for each steady state case (AGS-Gold, PARADIGM only (new), PARADIGM+TF (old)):

# Get the MCC scores of the random models for the AGS
mcc = random.models.mcc.per.cell.line$AGS

# find the unique models
unique.models = rownames(unique(random.models.link.operator))

# remove models with NaN MCC scores
unique.models = unique.models[!unique.models %in% names(which(is.na(mcc)))]

# Prune MCC scores to unique models
mcc.unique = mcc[names(mcc) %in% unique.models]

# Find fitness of the random models to the 3 steady states
# random.models.fitness.to.ags.literature.ss (already calculated for AGS-Gold)
random.models.fitness.to.ags.ss.old = apply(
  random.models.stable.state[,names(ags.ss.old)], 1, 
  get_percentage_of_matches, ags.ss.old)
random.models.fitness.to.ags.ss.new = apply(
  random.models.stable.state[,names(ags.ss.new)], 1, 
  get_percentage_of_matches, ags.ss.new)

fit.gold = random.models.fitness.to.ags.literature.ss
fit.old = random.models.fitness.to.ags.ss.old
fit.new = random.models.fitness.to.ags.ss.new

fit.unique.gold = fit.gold[names(fit.gold) %in% unique.models]
fit.unique.old = fit.old[names(fit.old) %in% unique.models]
fit.unique.new = fit.new[names(fit.new) %in% unique.models]

df.gold.random = as.data.frame(cbind(fit.unique.gold, mcc.unique))
df.old.random = as.data.frame(cbind(fit.unique.old, mcc.unique))
df.new.random = as.data.frame(cbind(fit.unique.new, mcc.unique))

Then, we check to see if there is any correlation between fitness and performance (MCC) using general scatter plots:

ggscatter(df.gold.random, x = "fit.unique.gold", y = "mcc.unique", 
          title = "AGS-Gold - Random: Fitness vs Performance - Kendall's tau",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")

ggscatter(df.old.random, x = "fit.unique.old", y = "mcc.unique", 
          title = "AGS (Paradigm+TF) - Random: Fitness vs Performance - Kendall's tau",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")

ggscatter(df.new.random, x = "fit.unique.new", y = "mcc.unique", 
          title = "AGS (Paradigm) - Random: Fitness vs Performance - Kendall's tau",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")

There is a very small correlation between fitness and performance both for the fitness to the AGS-Gold steady state and for the fitness calculated on the same nodes but with the values from PARADIGM including TF activity.


To better visualize the correlation relationship (if any), we group the models into distinct fitness classes and show the average MCC per class with the corresponding 95% CI.

First for the AGS-Gold steady state:

num.of.fit.classes = length(unique(df.gold.random$fit.unique.gold))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df.gold.random$fit.unique.gold, k = num.of.fit.classes)
fit.class.id = res$cluster
df.gold.random = cbind(df.gold.random, fit.class.id)

unique.fit.stats.gold = desc_statby(df.gold.random, measure.var = "mcc.unique", grps = c("fit.unique.gold", "fit.class.id"), ci = 0.95)

ggbarplot(unique.fit.stats.gold, x = "fit.unique.gold", y = "mean", title = "Fitness Class vs Performance (MCC) - AGS-Gold - Random", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats.gold$mean) + 0.1),
  label = unique.fit.stats.gold$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats.gold$fit.unique.gold, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7,10)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 5.5, y = 0.48, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")

# mean +/- 2*se => ~95% CI for the mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.
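
A quick check of the normal quantile shows why 2*se is used as the approximate 95% half-width in the error bars above (illustrative only):

qnorm(0.975) # ~1.96, so mean +/- 2*se gives a slightly conservative 95% CI for the mean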

Secondly, for the PARADIGM+TF (old) calculated steady state for AGS, pruned to the AGS-gold nodes:

num.of.fit.classes = length(unique(df.old.random$fit.unique.old))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df.old.random$fit.unique.old, k = num.of.fit.classes)
fit.class.id = res$cluster
df.old.random = cbind(df.old.random, fit.class.id)

unique.fit.stats.old = desc_statby(df.old.random, measure.var = "mcc.unique", grps = c("fit.unique.old", "fit.class.id"), ci = 0.95)

ggbarplot(unique.fit.stats.old, x = "fit.unique.old", y = "mean", title = "Fitness Class vs Performance (MCC) - Paradigm+TF - Random", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats.old$mean) + 0.1),
  label = unique.fit.stats.old$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats.old$fit.unique.old, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7,9)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 5.5, y = 0.48, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")

# mean +/- 2*se => ~95% CI for the mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.

Lastly, for the PARADIGM (new) calculated steady state for AGS, pruned to the AGS-gold nodes:

num.of.fit.classes = length(unique(df.new.random$fit.unique.new))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df.new.random$fit.unique.new, k = num.of.fit.classes)
fit.class.id = res$cluster
df.new.random = cbind(df.new.random, fit.class.id)

unique.fit.stats.new = desc_statby(df.new.random, measure.var = "mcc.unique", grps = c("fit.unique.new", "fit.class.id"), ci = 0.95)

ggbarplot(unique.fit.stats.new, x = "fit.unique.new", y = "mean", title = "Fitness Class vs Performance (MCC) - Paradigm - Random", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats.new$mean) + 0.1),
  label = unique.fit.stats.new$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats.new$fit.unique.new, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7,10)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 5.5, y = 0.48, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")

# mean +/- 2*se => ~95% CI for the mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.

R session info

xfun::session_info()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS, RStudio 1.2.5001

Locale:
  LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
  LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
  LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
  LC_PAPER=en_US.UTF-8       LC_NAME=C                 
  LC_ADDRESS=C               LC_TELEPHONE=C            
  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

Package version:
  assertthat_0.2.1     backports_1.1.5      base64enc_0.1-3     
  BH_1.69.0.1          bibtex_0.4.2         circlize_0.4.8      
  Ckmeans.1d.dp_4.3.0  cli_1.1.0            clue_0.3-57         
  cluster_2.1.0        codetools_0.2-16     colorspace_1.4-1    
  compiler_3.6.1       ComplexHeatmap_2.0.0 cowplot_1.0.0       
  crayon_1.3.4         crosstalk_1.0.0      datasets_3.6.1      
  digest_0.6.21        dplyr_0.8.3          DT_0.9              
  ellipsis_0.3.0       emba_0.1.2           evaluate_0.14       
  fansi_0.4.0          fastmap_1.0.1        gbRd_0.4-11         
  GetoptLong_0.1.7     ggplot2_3.2.1        ggpubr_0.2.3        
  ggrepel_0.8.1        ggsci_2.9            ggsignif_0.6.0      
  GlobalOptions_0.1.1  glue_1.3.1           graphics_3.6.1      
  grDevices_3.6.1      grid_3.6.1           gridExtra_2.3       
  gtable_0.3.0         highr_0.8            htmltools_0.4.0     
  htmlwidgets_1.5.1    httpuv_1.5.2         igraph_1.2.4.1      
  jsonlite_1.6         knitr_1.25           labeling_0.3        
  later_1.0.0          lattice_0.20.38      lazyeval_0.2.2      
  lifecycle_0.1.0      magrittr_1.5         markdown_1.1        
  MASS_7.3-51.4        Matrix_1.2.17        methods_3.6.1       
  mgcv_1.8.29          mime_0.7             munsell_0.5.0       
  nlme_3.1.141         nnet_7.3-12          packrat_0.5.0       
  parallel_3.6.1       pillar_1.4.2         pkgconfig_2.0.3     
  plogr_0.2.0          plyr_1.8.4           png_0.1-7           
  polynom_1.4.0        promises_1.1.0       pscl_1.5.2          
  purrr_0.3.2          R6_2.4.0             RColorBrewer_1.1-2  
  Rcpp_1.0.2           Rdpack_0.11-0        reshape2_1.4.3      
  rje_1.10.10          rjson_0.2.20         rlang_0.4.0         
  rmarkdown_1.16       rstudioapi_0.10      scales_1.0.0        
  shape_1.4.4          shiny_1.4.0          sourcetools_0.1.7   
  splines_3.6.1        stats_3.6.1          stringi_1.4.3       
  stringr_1.4.0        tibble_2.1.3         tidyr_1.0.0         
  tidyselect_0.2.5     tinytex_0.16         tools_3.6.1         
  usefun_0.4.3         utf8_1.1.4           utils_3.6.1         
  vctrs_0.2.0          viridisLite_0.3.0    visNetwork_2.0.8    
  withr_2.1.2          xfun_0.10            xtable_1.8-4        
  yaml_2.2.0           zeallot_0.1.0       
---
title: "Cascade Performance vs Fitness Model Analysis"
author: "[John Zobolas](https://github.com/bblodfon)"
date: "Last updated: `r format(Sys.time(), '%d %B, %Y')`"
output:
  html_document:
    css: style.css
    theme: united
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: true
    number_sections: false
    code_folding: hide
    code_download: true
---

```{r Render command, eval=FALSE, include=FALSE}
#rmarkdown::render(input = "./performance_vs_fitness.Rmd", output_format = "html_document", output_dir = "../../docs/cascade/cell-lines-2500/")
```

## Intro {-}

The purpose of this analysis is to find a correlation between the boolean models
**fitness to a steady state activity profile** and their **performance** in terms of the
number of *True Positive* (TP) synergies predicted and/or the overall *MCC score*
(Matthews Correlation Coefficient score). We want to show that **a closer fitness 
to the steady state suggests more predictive models**, corroborating thus our proof of concept of using an ensemble-based approach where models are trained towards a 
specific steady state signaling pattern for drug combination predictions.

The boolean model datasets we will use are in total $9$: one for each cell line 
of interest (8 cell lines) where the models were **fitted to a specific steady state** in each 
case and one for the so-called **random models** which were generated *randomly* in 
the sense that were fitted only to a proliferation state (simulations were done using 
the DrugLogics software modules `Gitsbe` and `Drabme`).

Each boolean model dataset constitues of:

- The **model predictions** file which has for each model the prediction for 
each drug combination tested (*0* = no synergy predicted, *1* = synergy 
predicted, *NA* = couldn't find stable states in either the drug combination 
inhibited model or in any of the two single-drug inhibited models)
- The **models stable state** (one per model). A **fitness score**
for each model can easily be calculated then by matching the model's stable 
state (which is something inherent in the boolean's model structure, a unique 
fixpoint attractor) with the steady state of interest, node per node.
A **higher fitness score** would mean a better match of a model's 
stable state to the cell line derived steady state (a perfect match would result 
in a fitness of 1).
- The **models link operators** which is a representation of the boolean equations
of each model. Each boolean equation is in the form: **Target *= (Activator OR Activator OR...) AND NOT (Inhibitor OR Inhibitor OR...)** and the difference between the models can be found in the 
*link operator* (*1* = 'OR NOT', *0* = 'AND NOT', or absent) which has been 
mutated (changed) through the genetic algorithm in `Gitsbe`. Note that the equations that do 
not have link operators are *the same for every model* and are thus discarded.
- The **observed synergies** file which lists the drug combinations that were 
observed as synergistic for each cell line.
- The **steady state** file which lists the network nodes (protein, gene, complexes
names, etc) and their activity value (0 or 1, representing an inhibited or active
node respectively). 
This input is provided per cell line and not for the random models since they are just trained to a profileration state.

## Input {-}

Loading libraries:
```{r Load libraries, message = FALSE}
library(DT)
library(ggpubr)
library(emba)
library(usefun)
library(nnet)
library(pscl)
library(ComplexHeatmap)
library(circlize)
library(dplyr)
library(tibble)
library(Ckmeans.1d.dp)
library(RColorBrewer)
```

First we load the cell-specific input data:
```{r Cell-specific Input, cache=TRUE}
# Cell Lines
cell.lines = c("A498", "AGS", "DU145", "colo205", "SW620", "SF295", "UACC62", "MDA-MB-468")

cell.line.dirs = sapply(cell.lines, function(cell.line) {
  paste0(getwd(), "/", cell.line)
})

# Model predictions
model.predictions.files = sapply(cell.line.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/model_predictions")
})

model.predictions.per.cell.line = lapply(model.predictions.files, 
  function(file) {
    get_model_predictions(file)
  }
)

# Observed synergies
observed.synergies.files = sapply(cell.line.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/observed_synergies")
})

observed.synergies.per.cell.line = lapply(observed.synergies.files, 
  function(file) {
    get_observed_synergies(file)
  }
)

# Models Stable State (1 per model)
models.stable.state.files = sapply(cell.line.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/models_stable_state")
})

models.stable.state.per.cell.line = lapply(models.stable.state.files,
  function(file) {
    as.matrix(read.table(file, check.names = FALSE))
  }
)

# Models Link Operators
models.link.operator.files = sapply(cell.line.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/models_link_operator")
})

models.link.operators.per.cell.line = lapply(models.link.operator.files,
  function(file) {
    as.matrix(read.table(file, check.names = FALSE))
  }
)

# the node names used in our analysis
node.names = colnames(models.stable.state.per.cell.line[[1]])

# Steady States
steady.state.files = sapply(cell.line.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/steady_state")
})

steady.state.per.cell.line = lapply(steady.state.files,
  function(file) {
    ss.df = read.table(file, sep = "\t", stringsAsFactors = FALSE)
    steady.state = ss.df[,2]
    names(steady.state) = ss.df[,1]
    
    # change value to NA for nodes for which there was no activity found (dash)
    steady.state[steady.state == "-"] = NA
    
    # keep only the nodes that are included in the analysis
    steady.state = prune_and_reorder_vector(steady.state, node.names)
    
    # check
    stopifnot(names(steady.state) %in% node.names)
    
    # return an integer vector since the activity values are binarized (0,1)
    return(sapply(steady.state, as.integer))
  }
)
```

The random model input data:
```{r Random model Input}
random.dir = paste0(getwd(), "/random")
random.model.predictions = get_model_predictions(paste0(random.dir, "/model_predictions"))

random.models.stable.state = as.matrix(
  read.table(file = paste0(random.dir, "/models_stable_state"), check.names = FALSE)
)

random.models.link.operator =
  as.matrix(read.table(file = paste0(random.dir, "/models_link_operator"), check.names = FALSE))
```

## Model Analysis {-}

In order to find the number of true positive predicted synergies, MCC scores and fitness scores for each of the models in each of the 9 datasets, we use functions from the [emba](https://github.com/bblodfon/emba) R package.

### Cell-specific {-}

We find the MCC, TP and fitness values for each model per cell line (note
that each model's stable state in a specific cell line is matched against the 
steady state from that cell line):
```{r Cell-specific Models TP + MCC + fitness per cell line, cache=TRUE}
models.tp.per.cell.line = list()
models.mcc.per.cell.line = list()
models.fitness.per.cell.line = list()

for (cell.line in cell.lines) {
  model.predictions = model.predictions.per.cell.line[[cell.line]]
  observed.synergies = observed.synergies.per.cell.line[[cell.line]]
  number.of.drug.comb.tested = ncol(model.predictions.per.cell.line[[cell.line]])
  
  # Split model.predictions to positive (observed) and negative (non-observed) results
  observed.model.predictions =
    get_observed_model_predictions(model.predictions, observed.synergies)
  unobserved.model.predictions =
    get_unobserved_model_predictions(model.predictions, observed.synergies)
  
  # Count the predictions of the observed synergies per model (TP)
  models.tp.per.cell.line[[cell.line]] = calculate_models_synergies_tp(observed.model.predictions)
  
  # Calculate Matthews Correlation Coefficient (MCC) for every model
  models.mcc.per.cell.line[[cell.line]] = 
    calculate_models_mcc(observed.model.predictions,
                         unobserved.model.predictions,
                         number.of.drug.comb.tested)
  
  # Fitness per model contrasted to steady state from cell line
  steady.state = steady.state.per.cell.line[[cell.line]]
  models.stable.state = models.stable.state.per.cell.line[[cell.line]]
  models.fitness.per.cell.line[[cell.line]] = apply(
    models.stable.state[,names(steady.state)], 1, get_percentage_of_matches, steady.state
  )
}
```

### Random Models {-}

We find the MCC, TP and fitness values for each random model per cell line (note
that each random model's stable state is matched against the 
steady state from that cell line, and that the random models' stable state data
does not change per cell line, i.e. the same `random.models.stable.state` object is used):
```{r Random Models TP + MCC + fitness per cell line, cache=TRUE}
random.models.mcc.per.cell.line = list()
random.models.tp.per.cell.line = list()
random.models.fitness.per.cell.line = list()

for (cell.line in cell.lines) {
  observed.synergies = observed.synergies.per.cell.line[[cell.line]]
  number.of.drug.comb.tested = ncol(random.model.predictions)
  
  # Split model.predictions to positive (observed) and negative (non-observed) results
  observed.model.predictions =
    get_observed_model_predictions(random.model.predictions, observed.synergies)
  unobserved.model.predictions =
    get_unobserved_model_predictions(random.model.predictions, observed.synergies)
  
   # Count the predictions of the observed synergies per model (TP)
  random.models.tp.per.cell.line[[cell.line]] = 
    calculate_models_synergies_tp(observed.model.predictions)
  
  # Calculate Matthews Correlation Coefficient (MCC) for every model
  random.models.mcc.per.cell.line[[cell.line]] = 
    calculate_models_mcc(observed.model.predictions,
                         unobserved.model.predictions,
                         number.of.drug.comb.tested)
  
  # Fitness per model contrasted to steady state from cell line 
  steady.state = steady.state.per.cell.line[[cell.line]]
  random.models.fitness.per.cell.line[[cell.line]] = apply(
    random.models.stable.state[,names(steady.state)], 1, 
    get_percentage_of_matches, steady.state
  )
}
```

## Choose best dataset {-}

We now want to **find the best dataset & cell line** for our subsequent analysis - that is, to show the performance vs fitness correlation. 
The argument here is that we want to choose a boolean dataset that has a **large enough fitness value range** along with a large **TP and/or MCC value range**, since with smaller value ranges it would be harder to distinguish the estimated distributions of the fitness scores belonging to different performance classes (i.e. groups of models classified by either the number of TPs or the MCC score).

The next two tables will help us determine exactly which cell line and dataset
to use:
```{r Cell-specific Models TP + MCC + fitness per cell line stats}
cell.specific.model.data = matrix(data = NA, nrow = length(cell.lines), ncol = 11)
rownames(cell.specific.model.data) = cell.lines
colnames(cell.specific.model.data) = c("fitness range", "Min fitness", 
  "Max fitness", "Mean fitness", "Median fitness", "MCC range", "Min MCC", 
  "Max MCC", "Mean MCC", "Median MCC", "Max TPR")

for (cell.line in cell.lines) {
  models.fitness = models.fitness.per.cell.line[[cell.line]]
  models.mcc = models.mcc.per.cell.line[[cell.line]]
  models.tp = models.tp.per.cell.line[[cell.line]]
  
  fit.summary = unclass(summary(models.fitness))
  mcc.summary = unclass(summary(models.mcc))
  max.tpr = max(models.tp) / length(observed.synergies.per.cell.line[[cell.line]])
  
  cell.specific.model.data[cell.line, "fitness range"] = fit.summary["Max."] - fit.summary["Min."]
  cell.specific.model.data[cell.line, "Min fitness"] = fit.summary["Min."]
  cell.specific.model.data[cell.line, "Max fitness"] = fit.summary["Max."]
  cell.specific.model.data[cell.line, "Mean fitness"] = fit.summary["Mean"]
  cell.specific.model.data[cell.line, "Median fitness"] = fit.summary["Median"]
  
  cell.specific.model.data[cell.line, "MCC range"] = mcc.summary["Max."] - mcc.summary["Min."]
  cell.specific.model.data[cell.line, "Min MCC"] = mcc.summary["Min."]
  cell.specific.model.data[cell.line, "Max MCC"] = mcc.summary["Max."]
  cell.specific.model.data[cell.line, "Mean MCC"] = mcc.summary["Mean"]
  cell.specific.model.data[cell.line, "Median MCC"] = mcc.summary["Median"]
  
  cell.specific.model.data[cell.line, "Max TPR"] = max.tpr
}

# color columns
fit.breaks = quantile(cell.specific.model.data[,"fitness range"], probs = seq(.05, .95, .05), na.rm = TRUE)
fit.colors = round(seq(255, 40, length.out = length(fit.breaks) + 1), 0) %>%
  {paste0("rgb(255,", ., ",", ., ")")} # red
mcc.breaks = quantile(cell.specific.model.data[,"MCC range"], probs = seq(.05, .95, .05), na.rm = TRUE)
mcc.colors = round(seq(255, 40, length.out = length(mcc.breaks) + 1), 0) %>%
  {paste0("rgb(", ., ",255,", ., ")")} # green

caption.title = "Table 1: Fitness, MCC scores and TP (True positives) for the Cell-specific model predictions across 8 Cell Lines"
datatable(data = cell.specific.model.data, 
          options = list(dom = "t"), # just show the table
          caption = htmltools::tags$caption(caption.title, style="color:#dd4814; font-size: 18px")) %>% 
  formatRound(1:11, digits = 3) %>%
  formatStyle(columns = c("fitness range"), backgroundColor = styleInterval(fit.breaks, fit.colors)) %>%
  formatStyle(columns = c("MCC range"), backgroundColor = styleInterval(mcc.breaks, mcc.colors))
```

```{r Random Models TP + MCC + fitness per cell line stats}
random.model.data = matrix(data = NA, nrow = length(cell.lines), ncol = 11)
rownames(random.model.data) = cell.lines
colnames(random.model.data) = c("fitness range", "Min fitness", 
  "Max fitness", "Mean fitness", "Median fitness", "MCC range", "Min MCC", 
  "Max MCC", "Mean MCC", "Median MCC", "Max TPR")

for (cell.line in cell.lines) {
  models.fitness = random.models.fitness.per.cell.line[[cell.line]]
  models.mcc = random.models.mcc.per.cell.line[[cell.line]]
  models.tp = random.models.tp.per.cell.line[[cell.line]]
  
  fit.summary = unclass(summary(models.fitness))
  mcc.summary = unclass(summary(models.mcc))
  max.tpr = max(models.tp) / length(observed.synergies.per.cell.line[[cell.line]])
  
  random.model.data[cell.line, "fitness range"] = fit.summary["Max."] - fit.summary["Min."]
  random.model.data[cell.line, "Min fitness"] = fit.summary["Min."]
  random.model.data[cell.line, "Max fitness"] = fit.summary["Max."]
  random.model.data[cell.line, "Mean fitness"] = fit.summary["Mean"]
  random.model.data[cell.line, "Median fitness"] = fit.summary["Median"]
  
  random.model.data[cell.line, "MCC range"] = mcc.summary["Max."] - mcc.summary["Min."]
  random.model.data[cell.line, "Min MCC"] = mcc.summary["Min."]
  random.model.data[cell.line, "Max MCC"] = mcc.summary["Max."]
  random.model.data[cell.line, "Mean MCC"] = mcc.summary["Mean"]
  random.model.data[cell.line, "Median MCC"] = mcc.summary["Median"]
  
  random.model.data[cell.line, "Max TPR"] = max.tpr
}

# color columns
fit.breaks = quantile(random.model.data[,"fitness range"], probs = seq(.05, .95, .05), na.rm = TRUE)
fit.colors = round(seq(255, 40, length.out = length(fit.breaks) + 1), 0) %>%
  {paste0("rgb(255,", ., ",", ., ")")} # red
mcc.breaks = quantile(random.model.data[,"MCC range"], probs = seq(.05, .95, .05), na.rm = TRUE)
mcc.colors = round(seq(255, 40, length.out = length(mcc.breaks) + 1), 0) %>%
  {paste0("rgb(", ., ",255,", ., ")")} # green

caption.title = "Table 2: Fitness, MCC scores and TP (True positives) for the random model predictions across 8 Cell Lines"
datatable(data = random.model.data, 
          options = list(dom = "t"), # just show the table
          caption = htmltools::tags$caption(caption.title, style="color:#dd4814; font-size: 18px")) %>% 
  formatRound(1:11, digits = 3) %>%
  formatStyle(columns = c("fitness range"), backgroundColor = styleInterval(fit.breaks, fit.colors)) %>%
  formatStyle(columns = c("MCC range"), backgroundColor = styleInterval(mcc.breaks, mcc.colors))
```

The below box plots compare the MCC values and fitness scores across all cell lines between
the cell-specific models (trained to steady state) and the random ones (trained to proliferation):

```{r Combine data into one data frame}
num.of.models = nrow(random.models.stable.state)
data.list = list()

for (cell.line in cell.lines) {
  cell.line.vec = as.data.frame(rep(cell.line, num.of.models), stringsAsFactors = FALSE)
  models.mcc.cell.specific = remove_rownames(as.data.frame(models.mcc.per.cell.line[[cell.line]]))
  models.mcc.random        = remove_rownames(as.data.frame(random.models.mcc.per.cell.line[[cell.line]]))
  
  models.fitness.cell.specific = remove_rownames(as.data.frame(models.fitness.per.cell.line[[cell.line]]))
  models.fitness.random        = remove_rownames(as.data.frame(random.models.fitness.per.cell.line[[cell.line]]))
  
  data.list[[cell.line]] = bind_cols(cell.line.vec, models.mcc.cell.specific, 
    models.mcc.random, models.fitness.cell.specific, models.fitness.random)
}

data = bind_rows(data.list)
colnames(data) = c("cell.line", "MCC cell-specific", "MCC random", 
                   "fitness cell-specific", "fitness random")
```

```{r Boxplots: MCC and fitness values between cell-specific and random models, fig.width=9, warning=FALSE, cache=TRUE}
# Note the cell-specific models have NaN MCC values
ggboxplot(data, x = "cell.line", y = c("MCC cell-specific", "MCC random"),
          palette = brewer.pal(3, "Set1"), merge = "asis",
          xlab = "Cell Lines", ylab = "MCC values", add = "point", add.params = list(size = 0.5))
ggboxplot(data, x = "cell.line", y = c("fitness cell-specific", "fitness random"),
          palette = brewer.pal(3, "Set1"), merge = "asis",
          xlab = "Cell Lines", ylab = "Fitness values", add = "point", add.params = list(size = 0.5))
```

So, from the two tables and the two box plots above we conclude that:

- In general, the **random models offer a larger range of 
fitness values** when their stable states are matched against the steady state 
of each particular cell line. 
This is something we expected since these models weren't fitted to a 
specific cell-line steady state (meaning that they weren't chosen from the 
`Gitsbe` module as the 3 best from each simulation that match that steady state 
as best as possible) but rather to a more generic state of proliferation. 
Thus, they represent a set of models with larger variation in terms of 
structure (boolean model equations) compared to the cell-specific generated ones.
- **The cell-specific models always have a larger maximum and median fitness** 
compared to the random ones for each respective cell line.
- In most of the cell lines, **the cell-specific models show equal or lower 
performance (smaller MCC values) than the random ones**.

All in all, we will use the **random models data**, contrasted to the steady 
state of the `AGS` cell line (see Table 2). This dataset has the largest 
MCC range and Max TPR value (and the third largest fitness range) among the 
entries of the two tables above.

```{r Save best dataset}
best.cell.line = "AGS"

fit = random.models.fitness.per.cell.line[[best.cell.line]]
tp  = random.models.tp.per.cell.line[[best.cell.line]]
mcc = random.models.mcc.per.cell.line[[best.cell.line]]
```

## Statistical Analysis {-}

### Data preprocessing {-}

Firstly, we filter the data by finding the **unique models** - those that have 
strictly different boolean equations. Then, we take a random sample out of these 
(fixing the seed for reproducibility purposes):
```{r Sample best dataset}
# For reproducibility
set.seed(10)
sample.size = 1000

# find the unique models
unique.models = rownames(unique(random.models.link.operator))

# remove models with NaN MCC scores
unique.models = unique.models[!unique.models %in% names(which(is.na(mcc)))]

# take a sample
unique.models.sample = sample(unique.models, size = sample.size)

fit.unique = fit[names(fit) %in% unique.models.sample]
tp.unique  = tp[names(tp) %in% unique.models.sample]
mcc.unique = mcc[names(mcc) %in% unique.models.sample]

df = as.data.frame(cbind(fit.unique, mcc.unique, tp.unique))
```

Note that the **fitness and MCC score are continuous variables while the number 
of true positives (TP) is discrete**.

### Correlation Plots {-}

Then, we check whether our data is normally distributed (using the Shapiro-Wilk
test for normality and the Q-Q plots):
```{r Normality testing, warning=FALSE, comment="", cache=TRUE}
ggqqplot(data = df, x = "fit.unique", ylab = "Fitness")
ggqqplot(data = df, x = "mcc.unique", ylab = "MCC")

shapiro.test(x = fit.unique)
shapiro.test(x = mcc.unique)
```

From the above results we observe that both the fitness and MCC scores are clearly
not normally distributed (with statistical significance).
Thus, we will use **non-parametric correlation scores, namely the Spearman and Kendall rank-based correlation tests**,
to check the correlation between the two continuous variables (models fitness 
values and their corresponding MCC score):
```{r Rank correlation plots: Fitness vs MCC, warning=FALSE}
ggscatter(df, x = "mcc.unique", y = "fit.unique", 
          title = "Fitness vs Performance (MCC) - Spearman Correlation",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "spearman", label.x.npc = 0.7, label.y.npc = 0.1),
          xlab = "MCC scores", ylab = "Fitness Values")
ggscatter(df, x = "mcc.unique", y = "fit.unique", 
          title = "Fitness vs Performance (MCC) - Kendall Correlation",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall", label.x.npc = 0.7, label.y.npc = 0.1),
          xlab = "MCC scores", ylab = "Fitness Values")
```
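
For reference, the same rank-based correlation coefficients can also be obtained directly 
with `cor.test()` - a quick cross-check using the `df` object from above (`exact = FALSE` 
skips the exact p-value computation, which is not available in the presence of ties):
```{r Rank correlation cross-check, warning=FALSE, comment=""}
cor.test(df$fit.unique, df$mcc.unique, method = "spearman", exact = FALSE)
cor.test(df$fit.unique, df$mcc.unique, method = "kendall")
```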

From the correlation plots above, we observe a **weak/small positive correlation between
performance and fitness to the steady state**.

To assess the **correlation between the number of TP predictions of the models 
(discrete variable) and the models fitness (continuous variable)**, we construct a 
predictor of the categorical variable from the continuous variable: if the 
resulting classifier has a **high degree of fit** we can conclude the two variables 
share a relationship and are indeed correlated. Since there are more than 2 TP
classes (values) in the dataset, we will use **Multinomial Logistic Regression**
and fit log-linear models via neural networks:

```{r Multinomial Logistic Regression: TP ~ Fitness, comment=""}
model.classifier = multinom(data = df, formula = tp.unique ~ fit.unique)
pseudo.r2.measures = pR2(model.classifier)
pseudo.r2.measures["McFadden"]
```

```{r Test prediction accuracy on same dataset, eval=FALSE, include=FALSE}
library(caret)

predictions = predict(object = model.classifier, newdata = df$tp.unique, type = "class")
postResample(pred = predictions, obs = factor(df$tp.unique))
```

To measure the goodness-of-fit of our model, there are several measures proposed 
for logistic regression. We focus on McFadden's pseudo-$R^2$ measure 
for our classifier, for which a value of $0.2-0.4$ would indicate an excellent fit [[source](https://stats.stackexchange.com/questions/82105/mcfaddens-pseudo-r2-interpretation)].
Since our value is lower, we conclude that there is only a **weak/small positive 
correlation between the number of TPs and the fitness score of the models**.
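
As a quick cross-check (a minimal sketch, assuming the `df` object and the formula used in the 
chunk above), McFadden's pseudo-$R^2$ can also be computed directly from the log-likelihoods of 
the fitted model and an intercept-only (null) model:
```{r McFadden pseudo-R2 cross-check, comment=""}
# McFadden's pseudo-R^2 = 1 - logLik(fitted model) / logLik(null model)
full.model = multinom(formula = tp.unique ~ fit.unique, data = df, trace = FALSE)
null.model = multinom(formula = tp.unique ~ 1, data = df, trace = FALSE)
1 - as.numeric(logLik(full.model)) / as.numeric(logLik(null.model))
```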

Next, we proceed with a more elaborate analysis, where the models are split into different 
performance classes (TP- or MCC-derived) and the individual groups are statistically 
compared with regard to their fitness scores.

### TP-class vs fitness {-}

First, some box and density plots to see **where (and by how much) the fitness values 
of models belonging to different TP classes differ**:
```{r TP-classification: Plots}
# Box plots
ggboxplot(df, x = "tp.unique", y = "fit.unique", color = "tp.unique",
          palette = usefun:::colors.100[1:length(unique(tp.unique))],
          xlab = "True Positives (TP)", ylab = "Fitness values")

# Density Plots
densities = list()
for (tp.num in sort(unique(tp.unique))) {
  x = df %>%
    filter(tp.unique == tp.num) %>%
    select_at(.vars = c("fit.unique"))
  den = density(x$fit.unique)
  densities[[paste0(tp.num, " (", den$n, ")")]] = den
}

make_multiple_density_plot(densities, legend.title = "TP classes (#models)",
        title = "Density Estimation", x.axis.label = "Fitness score")
```

As we can see from the above plots, **there doesn't seem to be any significant difference 
between the fitness values of the different TP classes**.
We can support this statistically with the **Kruskal-Wallis Rank Sum Test** 
(note that the p-value is larger than $0.001$):

```{r TP-classification: Kruskal Test, comment=""}
# Hypothesis testing: Are the location parameters of the distribution of x the same in each group?
kruskal.test(x = df[,"fit.unique"], g = df[, "tp.unique"])
```

To demonstrate that almost all pairs of groups are indistinguishable, **we perform pairwise
Wilcoxon rank sum tests and draw a heatmap of each test's p-value (note 
that all but one are not statistically significant)**:

```{r TP-classification: Pairwise Wilcox Tests, warning=FALSE}
res = pairwise.wilcox.test(x = df[,"fit.unique"], g = df[, "tp.unique"], p.adjust.method = "BH")
p.value.mat = res$p.value
```

```{r TP-classification: p-value Heatmap of Pairwise Wilcox Tests, warning=FALSE, fig.width=9, dpi=300}
col_fun = colorRamp2(breaks = c(0, 0.05, 0.5, 1), c("green", "white", "orange", "red"))
ht = Heatmap(matrix = p.value.mat, cluster_rows = FALSE, cluster_columns = FALSE,
  na_col = "white", name = "p-value", col = col_fun, column_names_rot = 0,
  row_title = "TP", row_title_rot = 0, row_names_side = "left",
  column_title = "P-values (Pairwise Wilcoxon tests) - TP-classified fitnesses", column_title_gp = gpar(fontsize = 20),
  cell_fun = function(j, i, x, y, width, height, fill) {
    if (!is.na(p.value.mat[i,j]))
        grid.text(sprintf("%.6f", p.value.mat[i, j]), x, y, gp = gpar(fontsize = 16))
})
draw(ht)
```

Note that the **number of true positive predictions is a one-dimensional metric**: by 
excluding the TNs, FPs and FNs we get a less informative (and arguably misleading) 
picture of the models' performance classification. 
That's why we will proceed with the more balanced MCC score for classifying the models' fitnesses 
into different performance groups.
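
As a reminder of why the MCC is the more balanced choice, it combines all four confusion-matrix 
counts into a single score. The helper below is purely illustrative (it is not part of the `emba` 
workflow, which already computes the MCC for us):
```{r MCC formula sketch, comment=""}
# Illustrative helper: MCC from the four confusion-matrix counts
mcc_score = function(tp, tn, fp, fn) {
  num = tp * tn - fp * fn
  den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (den == 0) return(NaN)
  num / den
}

# e.g. a model with few predicted synergies and many true negatives
mcc_score(tp = 3, tn = 140, fp = 5, fn = 2)
```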

### MCC-class vs fitness {-}

Firstly, we perform a *univariate k-means clustering* to 
**split the models' MCC values into different classes** and plot the data histogram.
The **number of MCC classes** to split the data into can be chosen arbitrarily, so
we chose the same number as the TP classes:

```{r MCC-classification: Find the clusters}
num.of.mcc.classes = 5
mcc.class.ids = 1:num.of.mcc.classes

# find the clusters
res = Ckmeans.1d.dp(x = df[,"mcc.unique"], k = num.of.mcc.classes)
mcc.class.id = res$cluster
df = cbind(df, mcc.class.id)

plot_mcc_classes_hist(df[,"mcc.unique"], df[,"mcc.class.id"],
                      num.of.mcc.classes, mcc.class.ids)
```

Then we show some box and density plots to see **where (and by how much) the fitness 
values of models belonging to different MCC classes differ**:
```{r MCC-classification: Plots}
# Box plots
ggboxplot(df, x = "mcc.class.id", y = "fit.unique", color = "mcc.class.id",
          palette = usefun:::colors.100[1:num.of.mcc.classes],
          xlab = "MCC class", ylab = "Fitness values")

# Density Plots
densities = list()
for (id in mcc.class.ids) {
  x = df %>%
    filter(mcc.class.id == id) %>%
    select_at(.vars = c("fit.unique"))
  den = density(x$fit.unique)
  densities[[paste0(id, " (", res$size[id], ")")]] = den
}

make_multiple_density_plot(densities, legend.title = "MCC classes (#models)",
        title = "Density Estimation", x.axis.label = "Fitness score")
```

Again we observe that **there doesn’t seem to be any significant difference 
between the fitness values of the different MCC classes**. The next statistical
tests and heatmap show the same:

```{r MCC-classification: Kruskal Test, comment=""}
# Hypothesis testing: Are the location parameters of the distribution of x the same in each group?
kruskal.test(x = df[,"fit.unique"], g = df[, "mcc.class.id"])
```

```{r MCC-classification: Pairwise Wilcox Tests, warning=FALSE}
pair.res = pairwise.wilcox.test(x = df[,"fit.unique"], g = df[, "mcc.class.id"], p.adjust.method = "BH")
p.value.mat = pair.res$p.value
```

```{r MCC-classification: p-value Heatmap of Pairwise Wilcox Tests, warning=FALSE, fig.width=9, dpi=300}
col_fun = colorRamp2(breaks = c(0, 0.05, 0.5, 1), c("green", "white", "orange", "red"))
ht = Heatmap(matrix = p.value.mat, cluster_rows = FALSE, cluster_columns = FALSE,
  na_col = "white", name = "p-value", col = col_fun, column_names_rot = 0,
  row_title = "MCC class id", row_title_rot = 90, row_names_side = "left",
  column_title = "P-values (Pairwise Wilcoxon tests) - MCC-classified fitnesses", column_title_gp = gpar(fontsize = 20),
  cell_fun = function(j, i, x, y, width, height, fill) {
    if (!is.na(p.value.mat[i,j]))
        grid.text(sprintf("%.6f", p.value.mat[i, j]), x, y, gp = gpar(fontsize = 20))
})
draw(ht)
```

## Extended Analysis {-}

In this section, we get new steady states to test the fitness of our model datasets against.
Note that the old ones used were calculated by using both the PARADIGM tool and tools to infer transcription factor activity (in the CLSS workflow). The new ones will use only PARADIGM.
See [README.md](https://github.com/bblodfon/gitsbe-model-analysis/blob/master/cascade/cell-lines-2500/steady-states/README.md) for more info on how to generate the new steady states.

The idea here is that the absence of correlation between fitness and performance shown above has to do with **wrongly inferred steady states**.
So, even though the models have not been trained to these new steady states, their stable states remain the same and new fitness scores can be calculated against them. We can then re-assess the correlation between the models' performance and their corresponding fitness in each dataset.

### Input {-}

First, we get the new steady states:
```{r New Steady states}
steady.state.per.cell.line.new = list()
for (cell.line in cell.lines) {
  file = paste0(getwd(), "/steady-states/", cell.line, "_steady_state")
  ss.df = read.table(file, sep = "\t", stringsAsFactors = FALSE)
  
  steady.state = ss.df[,2]
  names(steady.state) = ss.df[,1]
  
  # change value to NA for nodes for which there was no activity found (dash)
  steady.state[steady.state == "-"] = NA
  
  # keep only the nodes that are included in the analysis
  steady.state = prune_and_reorder_vector(steady.state, node.names)
    
  # check
  stopifnot(names(steady.state) %in% node.names)
  
  # return an integer vector since the activity values are binarized (0,1)
  steady.state.per.cell.line.new[[cell.line]] = sapply(steady.state, as.integer)
}

# Which nodes are not in the steady state vector? (genes: xxx_g)
#node.names[which((node.names %in% names(steady.state.per.cell.line.new$SW620) == FALSE))]
```

We check whether the **new steady states are different from before** by computing the percentage of common node states:

```{r Comparing steady states between old and new method, results='asis'}
for (cell.line in cell.lines) {
  percentage.match = get_percentage_of_matches(steady.state.per.cell.line.new[[cell.line]], 
    steady.state.per.cell.line[[cell.line]])
  expres = paste0(cell.line, ": ", specify_decimal(percentage.match, digits.to.keep = 2))
  if (cell.line == cell.lines[1])
    pretty_print_string(string = expres)
  else
    pretty_print_string(string = expres, with.gt = FALSE)
  print_empty_line(html.output = TRUE)
}
```

<div class="green-box">
So, we see that for each cell line roughly half of the node states in the new steady state vector differ from the previously calculated one - an agreement level we would expect between random vectors (see the quick sanity check below)!
</div>
</br>
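
As a rough sanity check of the "pretty much random" remark above (purely illustrative, not part 
of the analysis pipeline), the expected agreement between two independent random binary vectors 
is around 50%:
```{r Random agreement sanity check, comment=""}
set.seed(42)
# average proportion of matching positions between two independent random 0/1
# vectors (the vector length does not matter much - the average stays close to 0.5)
mean(replicate(1000, {
  a = sample(0:1, size = 50, replace = TRUE)
  b = sample(0:1, size = 50, replace = TRUE)
  mean(a == b)
}))
```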

We also get the *gold/curation-based* **steady state for the AGS cell line** which was one of the datasets given as supplementary material in the paper [Discovery of Drug Synergies in Gastric Cancer Cells Predicted by Logical Modeling](https://doi.org/10.1371/journal.pcbi.1004426) - see Table S4. Here is a version that is applicable to the node set in the Cascade 2.0 topology (that is used in all our simulations) with the addition of the steady state values for `Prosurvival` and `Antisurvival` nodes:

```{r AGS literature steady state}
ags.literature.file = paste0(getwd(), "/steady-states/AGS_literature_steady_state")
ss.df = read.table(ags.literature.file, sep = "\t", stringsAsFactors = FALSE)
ags.literature.ss = ss.df[,2]
names(ags.literature.ss) = ss.df[,1]
stopifnot(names(ags.literature.ss) %in% node.names)
```

Comparing, on their common nodes, the AGS steady state vectors obtained with the 2 previous methods against the *literature-curated* profile, in order to see **how different the computationally predicted states for the AGS cell line** are from the gold standard, we find that:

```{r AGS gold vs 2 CLSS-calculated AGS steady state vectors, results='asis'}
ags.ss.old = prune_and_reorder_vector(steady.state.per.cell.line$AGS, names(ags.literature.ss))
ags.ss.new = prune_and_reorder_vector(steady.state.per.cell.line.new$AGS, names(ags.literature.ss))

pretty_print_string(paste0("Match between the 2 AGS steady state vectors pruned to the literature-based nodes: ", specify_decimal(get_percentage_of_matches(ags.ss.old, ags.ss.new), digits.to.keep = 2)))
pretty_print_string(paste0("Match between the old AGS steady state vector and the literature-based one: ", specify_decimal(get_percentage_of_matches(ags.ss.old, ags.literature.ss), digits.to.keep = 2)))
pretty_print_string(paste0("Match between the new AGS steady state vector and the literature-based one: ", specify_decimal(get_percentage_of_matches(ags.ss.new, ags.literature.ss), digits.to.keep = 2)))
```

<div class="orange-box">
So, it seems that the 2 computational methods are very different even when compared on the node subset that constitutes the gold standard. Both of them appear close to random when compared to the true (curated) steady state.
</div>
</br>

### Models fitness {-}

Given the new steady states, we re-calculate the models fitness for both the cell-specific models and the random ones:
```{r Cell-specific and Random Models new fitness per cell line}
models.fitness.per.cell.line.new = list()
random.models.fitness.per.cell.line.new = list()

for (cell.line in cell.lines) {
  steady.state = steady.state.per.cell.line.new[[cell.line]]
  models.stable.state = models.stable.state.per.cell.line[[cell.line]]
  
  # Fitness per model contrasted to steady state from cell line
  models.fitness.per.cell.line.new[[cell.line]] = apply(
    models.stable.state[,names(steady.state)], 1, get_percentage_of_matches, steady.state
  )
  
  random.models.fitness.per.cell.line.new[[cell.line]] = apply(
    random.models.stable.state[,names(steady.state)], 1, 
    get_percentage_of_matches, steady.state
  )
}
```

We calculate the fitness of the AGS-trained models and the random ones to the **literature-curated steady state for the AGS cell line**:
```{r AGS-specific and Random Models fitness to literature-curated AGS steady state}
ags.models.stable.state = models.stable.state.per.cell.line$AGS

models.fitness.to.ags.literature.ss = apply(
  ags.models.stable.state[,names(ags.literature.ss)], 1, 
  get_percentage_of_matches, ags.literature.ss)
random.models.fitness.to.ags.literature.ss = apply(
  random.models.stable.state[,names(ags.literature.ss)], 1, 
  get_percentage_of_matches, ags.literature.ss)
```

### Choose best dataset {-}

Following the same procedure as before, we try to find the best dataset to do the correlation analysis on. Such a dataset will have models that result in a **large MCC + fitness range** when contrasted to the new steady states.

The next two tables will help us determine exactly which cell line and dataset
to use:
```{r Cell-specific Models TP + MCC + fitness per cell line stats with new steady states}
cell.specific.model.data = matrix(data = NA, nrow = length(cell.lines) + 1, ncol = 11)
rownames(cell.specific.model.data) = c(cell.lines, "AGS-gold")
colnames(cell.specific.model.data) = c("fitness range", "Min fitness", 
  "Max fitness", "Mean fitness", "Median fitness", "MCC range", "Min MCC", 
  "Max MCC", "Mean MCC", "Median MCC", "Max TPR")

for (cell.line in c(cell.lines, "AGS-gold")) {
  if (cell.line == "AGS-gold") {
    models.fitness = models.fitness.to.ags.literature.ss
    models.mcc = models.mcc.per.cell.line$AGS
    models.tp = models.tp.per.cell.line$AGS
  } else {
    models.fitness = models.fitness.per.cell.line.new[[cell.line]]
    models.mcc = models.mcc.per.cell.line[[cell.line]]
    models.tp = models.tp.per.cell.line[[cell.line]]
  }
  
  fit.summary = unclass(summary(models.fitness))
  mcc.summary = unclass(summary(models.mcc))
  if (cell.line == "AGS-gold") {
    max.tpr = max(models.tp) / length(observed.synergies.per.cell.line$AGS)
  } else {
    max.tpr = max(models.tp) / length(observed.synergies.per.cell.line[[cell.line]])
  }
  
  cell.specific.model.data[cell.line, "fitness range"] = fit.summary["Max."] - fit.summary["Min."]
  cell.specific.model.data[cell.line, "Min fitness"] = fit.summary["Min."]
  cell.specific.model.data[cell.line, "Max fitness"] = fit.summary["Max."]
  cell.specific.model.data[cell.line, "Mean fitness"] = fit.summary["Mean"]
  cell.specific.model.data[cell.line, "Median fitness"] = fit.summary["Median"]
  
  cell.specific.model.data[cell.line, "MCC range"] = mcc.summary["Max."] - mcc.summary["Min."]
  cell.specific.model.data[cell.line, "Min MCC"] = mcc.summary["Min."]
  cell.specific.model.data[cell.line, "Max MCC"] = mcc.summary["Max."]
  cell.specific.model.data[cell.line, "Mean MCC"] = mcc.summary["Mean"]
  cell.specific.model.data[cell.line, "Median MCC"] = mcc.summary["Median"]
  
  cell.specific.model.data[cell.line, "Max TPR"] = max.tpr
}

# color columns
fit.breaks = quantile(cell.specific.model.data[,"fitness range"], probs = seq(.05, .95, .05), na.rm = TRUE)
fit.colors = round(seq(255, 40, length.out = length(fit.breaks) + 1), 0) %>%
  {paste0("rgb(255,", ., ",", ., ")")} # red
mcc.breaks = quantile(cell.specific.model.data[,"MCC range"], probs = seq(.05, .95, .05), na.rm = TRUE)
mcc.colors = round(seq(255, 40, length.out = length(mcc.breaks) + 1), 0) %>%
  {paste0("rgb(", ., ",255,", ., ")")} # green

caption.title = "Table 3: Fitness, MCC scores and TP (True positives) for the Cell-specific model predictions across 8 Cell Lines"
datatable(data = cell.specific.model.data, 
          options = list(dom = "t"), # just show the table
          caption = htmltools::tags$caption(caption.title, style="color:#dd4814; font-size: 18px")) %>% 
  formatRound(1:11, digits = 3) %>%
  formatStyle(columns = c("fitness range"), backgroundColor = styleInterval(fit.breaks, fit.colors)) %>%
  formatStyle(columns = c("MCC range"), backgroundColor = styleInterval(mcc.breaks, mcc.colors))
```

```{r Random Models TP + MCC + fitness per cell line stats with new steady states}
random.model.data = matrix(data = NA, nrow = length(cell.lines) + 1, ncol = 11)
rownames(random.model.data) = c(cell.lines, "AGS-gold")
colnames(random.model.data) = c("fitness range", "Min fitness", 
  "Max fitness", "Mean fitness", "Median fitness", "MCC range", "Min MCC", 
  "Max MCC", "Mean MCC", "Median MCC", "Max TPR")

for (cell.line in c(cell.lines, "AGS-gold")) {
  if (cell.line == "AGS-gold") {
    models.fitness = random.models.fitness.to.ags.literature.ss
    models.mcc = random.models.mcc.per.cell.line$AGS
    models.tp = random.models.tp.per.cell.line$AGS
  } else {
    models.fitness = random.models.fitness.per.cell.line.new[[cell.line]]
    models.mcc = random.models.mcc.per.cell.line[[cell.line]]
    models.tp = random.models.tp.per.cell.line[[cell.line]]
  }
  
  fit.summary = unclass(summary(models.fitness))
  mcc.summary = unclass(summary(models.mcc))
  if (cell.line == "AGS-gold") {
    max.tpr = max(models.tp) / length(observed.synergies.per.cell.line$AGS)
  } else {
    max.tpr = max(models.tp) / length(observed.synergies.per.cell.line[[cell.line]])
  }
  
  random.model.data[cell.line, "fitness range"] = fit.summary["Max."] - fit.summary["Min."]
  random.model.data[cell.line, "Min fitness"] = fit.summary["Min."]
  random.model.data[cell.line, "Max fitness"] = fit.summary["Max."]
  random.model.data[cell.line, "Mean fitness"] = fit.summary["Mean"]
  random.model.data[cell.line, "Median fitness"] = fit.summary["Median"]
  
  random.model.data[cell.line, "MCC range"] = mcc.summary["Max."] - mcc.summary["Min."]
  random.model.data[cell.line, "Min MCC"] = mcc.summary["Min."]
  random.model.data[cell.line, "Max MCC"] = mcc.summary["Max."]
  random.model.data[cell.line, "Mean MCC"] = mcc.summary["Mean"]
  random.model.data[cell.line, "Median MCC"] = mcc.summary["Median"]
  
  random.model.data[cell.line, "Max TPR"] = max.tpr
}

# color columns
fit.breaks = quantile(random.model.data[,"fitness range"], probs = seq(.05, .95, .05), na.rm = TRUE)
fit.colors = round(seq(255, 40, length.out = length(fit.breaks) + 1), 0) %>%
  {paste0("rgb(255,", ., ",", ., ")")} # red
mcc.breaks = quantile(random.model.data[,"MCC range"], probs = seq(.05, .95, .05), na.rm = TRUE)
mcc.colors = round(seq(255, 40, length.out = length(mcc.breaks) + 1), 0) %>%
  {paste0("rgb(", ., ",255,", ., ")")} # green

caption.title = "Table 4: Fitness, MCC scores and TP (True positives) for the random model predictions across 8 Cell Lines"
datatable(data = random.model.data, 
          options = list(dom = "t"), # just show the table
          caption = htmltools::tags$caption(caption.title, style="color:#dd4814; font-size: 18px")) %>% 
  formatRound(1:11, digits = 3) %>%
  formatStyle(columns = c("fitness range"), backgroundColor = styleInterval(fit.breaks, fit.colors)) %>%
  formatStyle(columns = c("MCC range"), backgroundColor = styleInterval(mcc.breaks, mcc.colors))
```

The below box plots compare the new fitness scores across all cell lines:
```{r Combine fitness data into one data frame}
num.of.models = nrow(random.models.stable.state)
data.list = list()

for (cell.line in c(cell.lines, "AGS-gold")) {
  cell.line.vec = as.data.frame(rep(cell.line, num.of.models), stringsAsFactors = FALSE)
  
  if (cell.line == "AGS-gold") {
    models.fitness.cell.specific = remove_rownames(as.data.frame(models.fitness.to.ags.literature.ss))
    models.fitness.random        = remove_rownames(as.data.frame(random.models.fitness.to.ags.literature.ss))
  } else {
    models.fitness.cell.specific = remove_rownames(as.data.frame(models.fitness.per.cell.line.new[[cell.line]]))
    models.fitness.random        = remove_rownames(as.data.frame(random.models.fitness.per.cell.line.new[[cell.line]]))
  }
  
  colnames(models.fitness.cell.specific) = NULL
  colnames(models.fitness.random) = NULL
  
  data.list[[cell.line]] = 
    bind_cols(cell.line.vec, models.fitness.cell.specific, models.fitness.random)
}

data = bind_rows(data.list)
colnames(data) = c("cell.line", "fitness cell-specific", "fitness random")
```

```{r Boxplots: Fitness values between cell-specific and random models for the new steady states, fig.width=10, cache=TRUE}
ggboxplot(data, x = "cell.line", y = c("fitness cell-specific", "fitness random"),
          palette = brewer.pal(3, "Set1"), merge = "asis",
          xlab = "Cell Lines", ylab = "Fitness values", add = "point", add.params = list(size = 0.5))
```

<div class="green-box">
We observe from Tables 3 and 4 and the boxplots above that all models show a wide range of fitness values, since no models were *trained* to the new steady states (in effect, all models behave like randomly generated ones with respect to these states). Also, **the fitness of the AGS-specific and random models to the literature-curated steady state** has the largest possible fitness range and a small number of unique fitness values among the models (a total of $13$).
</div>
</br>

<div class="orange-box">
We checked the following datasets and found **no statistically significant correlation** between fitness scores and performance metric (TP or MCC):

- `SF295`, `DU145` with cell-specific models (from Table 3)
- `A498`, `UACC62`, `DU145`, `AGS` with random models (from Table 4)
</div>
</br>

**Next, we investigate the AGS-gold dataset** (the row with the highest fitness and MCC range in Table 3). In this dataset, the performance (MCC score) of the cell-specific AGS models is associated with the models' fitness to the literature-curated AGS steady state.

### AGS-gold Statistical Analysis {-}

Firstly, we:

- Filter the data by finding the **unique models** - those that have strictly different boolean equations
- Remove the unique models with NaN MCC scores
- Merge the fitness scores and MCC values in a common data.frame object

```{r Save best datasets (with new steady states)}
best.cell.line = "AGS"

# AGS-Gold cell-specific data
fit = models.fitness.to.ags.literature.ss
tp  = models.tp.per.cell.line[[best.cell.line]]
mcc = models.mcc.per.cell.line[[best.cell.line]]

unique.models = rownames(unique(models.link.operators.per.cell.line[[best.cell.line]]))
unique.models = unique.models[!unique.models %in% names(which(is.na(mcc)))]

fit.unique = fit[names(fit) %in% unique.models]
tp.unique  = tp[names(tp) %in% unique.models]
mcc.unique = mcc[names(mcc) %in% unique.models]

df = as.data.frame(cbind(fit.unique, mcc.unique, tp.unique))
```

Then, we check whether our data is normally distributed (using the Shapiro-Wilk
test for normality and the Q-Q plots):
```{r Normality testing 2, warning=FALSE, comment="", cache=TRUE}
ggqqplot(data = df, x = "fit.unique", ylab = "Fitness")
ggqqplot(data = df, x = "mcc.unique", ylab = "MCC")

shapiro.test(x = sample(x = df$fit.unique, size = 1000))
shapiro.test(x = sample(x = df$mcc.unique, size = 1000))
```

From the above results we observe that both the fitness and MCC scores in both datasets are clearly not normally distributed (with statistical significance).
Thus, we will use **non-parametric correlation scores, namely the Spearman and Kendall rank-based correlation tests**, to check the correlation between the two continuous variables (models fitness values and their corresponding MCC score):
```{r Rank correlation plots: Fitness to curated AGS ss vs MCC, cache=TRUE}
ggscatter(df, x = "fit.unique", y = "mcc.unique", 
          title = "AGS-Gold (Cell-specific): Fitness vs Performance - Spearman Correlation",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "spearman"),
          xlab = "Fitness Values", ylab = "MCC scores")
ggscatter(df, x = "fit.unique", y = "mcc.unique", 
          title = "AGS-Gold (Cell-specific): Fitness vs Performance - Kendall Correlation",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")
```

#### Fitness vs TP/MCC-class {-}

Split the models to MCC classes:
```{r MCC-classification: Find the clusters 2, cache=TRUE}
num.of.mcc.classes = 5
mcc.class.ids = 1:num.of.mcc.classes

# find the clusters
res = Ckmeans.1d.dp(x = df$mcc.unique, k = num.of.mcc.classes)
mcc.class.id = res$cluster
df = cbind(df, mcc.class.id)

plot_mcc_classes_hist(df$mcc.unique, df$mcc.class.id, 
  num.of.mcc.classes, mcc.class.ids)
```

Box plots to show the difference of the different TP/MCC-classes in terms of fitness score:
```{r Fitness vs TP or MCC-class, warning=FALSE, cache=TRUE}
ggboxplot(df, x = "mcc.class.id", y = "fit.unique", color = "mcc.class.id",
  palette = usefun:::colors.100[1:num.of.mcc.classes],
  xlab = "MCC class", ylab = "Fitness values", 
  add = c("mean", "mean_sd"), error.plot = "errorbar")

ggboxplot(df, x = "tp.unique", y = "fit.unique", color = "tp.unique",
  palette = usefun:::colors.100[1:length(unique(tp.unique))],
  xlab = "TP class", ylab = "Fitness values", 
  add = c("mean", "mean_sd"), remove = 5)
```

#### Fitness-class vs mean MCC {-}

Since the gold-standard AGS steady state has only 23 nodes, our models generate a small number of unique fitness scores to that steady state (total of `r length(unique(df$fit.unique))` unique fitness classes). So, **instead of grouping the models per performance class (TP or MCC) we will group them by fitness class and show the average MCC value in each class**:

```{r Fitness Class vs mean MCC, cache=TRUE}
num.of.fit.classes = length(unique(df$fit.unique))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df$fit.unique, k = num.of.fit.classes)
fit.class.id = res$cluster
df = cbind(df, fit.class.id)

unique.fit.stats = desc_statby(df, measure.var = "mcc.unique", grps = c("fit.unique", "fit.class.id"), ci = 0.95)

ggscatter(unique.fit.stats, x = "fit.unique", y = "mean", 
          title = "Fitness Class vs Performance (MCC) - Kendall Correlation",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall", label.x.npc = 0.7, label.y.npc = 0.1),
          xlab = "Unique Fitness values", ylab = "Mean MCC")
  
ggbarplot(unique.fit.stats, x = "fit.unique", y = "mean", title = "Fitness Class vs Performance (MCC)", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats$mean) + 0.1),
  label = unique.fit.stats$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats$fit.unique, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7,10,13)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 7.5, y = 0.5, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")
# mean +/- 2*se => ~95% CI for the mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.
```

<div class="green-box">
There is evident statistical correlation between the models fitness to a curation-based steady state and their performance (measured by MCC score). **Models with higher fitness to the steady state tend to have higher MCC score on average**.
</div>
</br>

### AGS-PARADIGM(+TF) analysis {-}

<div class="blue-box">
What if we take the **AGS steady state calculated with the PARADIGM tool** (with or without the Transcription Factor activity) and **prune it to the nodes that constitute the AGS-Gold steady state**: will there still be a correlation between fitness and performance?
</div>
</br>

The AGS-Gold nodes are:
```{r AGS-Gold nodes, results='asis'}
pretty_print_vector_names(ags.literature.ss)
```

We first compute **the fitness of the models to the PARADIGM-derived AGS steady states (with and without TF activity included in the calculation), pruned to the AGS-Gold nodes**:
```{r Models fitness to AGS new and old steady states - pruned to AGS literature nodes}
ags.models.stable.state = models.stable.state.per.cell.line$AGS

models.fitness.to.ags.ss.old = apply(
  ags.models.stable.state[,names(ags.ss.old)], 1, 
  get_percentage_of_matches, ags.ss.old)
models.fitness.to.ags.ss.new = apply(
  ags.models.stable.state[,names(ags.ss.new)], 1, 
  get_percentage_of_matches, ags.ss.new)

fit.old = models.fitness.to.ags.ss.old
fit.new = models.fitness.to.ags.ss.new
fit.unique.old = fit.old[names(fit.old) %in% unique.models]
fit.unique.new = fit.new[names(fit.new) %in% unique.models]

df.old = as.data.frame(cbind(fit.unique.old, mcc.unique, tp.unique))
df.new = as.data.frame(cbind(fit.unique.new, mcc.unique, tp.unique))
```
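
In essence, each model's fitness here is the fraction of (pruned) steady-state nodes whose activity value matches the model's stable state. A minimal sketch of that calculation with hypothetical node values (illustrative only, not a re-implementation of `get_percentage_of_matches`):

```{r fitness as fraction of matching nodes sketch, eval=FALSE}
# hypothetical 4-node example: a model's stable state vs the (pruned) steady state
model.stable.state = c(ERK = 1, AKT = 0, TP53 = 1, MYC = 1)
steady.state       = c(ERK = 1, AKT = 1, TP53 = 1, MYC = 1)

# fitness = fraction of nodes with the same activity value
fitness = sum(model.stable.state[names(steady.state)] == steady.state) /
  length(steady.state)
fitness # 0.75 (3 out of 4 nodes match)
```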

Then, we check whether there is any correlation between fitness and performance (MCC) using scatter plots:
```{r Rank correlation plots: Fitness to Paradigm-calculated ss (with and without TF) - pruned to AGS-gold nodes vs MCC, cache=TRUE}
ggscatter(df.old, x = "fit.unique.old", y = "mcc.unique", 
          title = "AGS (Paradigm+TF) - Cell-specific: Fitness vs Performance - Kendall's tau",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")
ggscatter(df.new, x = "fit.unique.new", y = "mcc.unique", 
          title = "AGS (Paradigm) - Cell-specific: Fitness vs Performance - Kendall's tau",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")
```
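
As a numeric cross-check of the coefficients printed on the plots (a sketch, assuming the `df.old` and `df.new` data frames constructed above), Kendall's \(\tau\) and its p-value can also be obtained directly with `cor.test()`:

```{r kendall cross-check sketch, eval=FALSE}
# Kendall rank correlation between fitness and MCC for both pruned steady states
cor.test(df.old$fit.unique.old, df.old$mcc.unique, method = "kendall")
cor.test(df.new$fit.unique.new, df.new$mcc.unique, method = "kendall")
```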

<div class="orange-box">
There is almost no correlation between fitness and performance when the fitness of the models is calculated on the **same nodes as in the AGS-Gold steady state**, but **with the activity values as calculated by PARADIGM, with or without TF activity**. This supports the conclusion that the best models are those that fit the *true* (curated) steady state and not the wrongly assessed ones.
</div>
</br>

To better visualize the absence of correlation, **we group the models into distinct fitness classes and show the average MCC per class with the corresponding 95% CI**.

First, for the **PARADIGM+TF (old)** calculated steady state for AGS, pruned to the AGS-Gold nodes:

```{r Fitness Class vs mean MCC (Paradigm+TF - pruned to AGS-gold), warning=FALSE, cache=TRUE}
num.of.fit.classes = length(unique(df.old$fit.unique.old))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df.old$fit.unique.old, k = num.of.fit.classes)
fit.class.id = res$cluster
df.old = cbind(df.old, fit.class.id)

unique.fit.stats.old = desc_statby(df.old, measure.var = "mcc.unique", grps = c("fit.unique.old", "fit.class.id"), ci = 0.95)
  
ggbarplot(unique.fit.stats.old, x = "fit.unique.old", y = "mean", title = "Fitness Class vs Performance (MCC) - Paradigm + TF", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats.old$mean) + 0.1),
  label = unique.fit.stats.old$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats.old$fit.unique.old, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 4.5, y = 0.38, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")
# mean +/- 2*se => ~95% CI for the mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.
```

And secondly, for the **PARADIGM only (new)** calculated steady state for AGS, pruned to the AGS-Gold nodes:

```{r Fitness Class vs mean MCC (Paradigm only - pruned to AGS-gold), warning=FALSE, cache=TRUE}
num.of.fit.classes = length(unique(df.new$fit.unique.new))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df.new$fit.unique.new, k = num.of.fit.classes)
fit.class.id = res$cluster
df.new = cbind(df.new, fit.class.id)

unique.fit.stats.new = desc_statby(df.new, measure.var = "mcc.unique", grps = c("fit.unique.new", "fit.class.id"), ci = 0.95)

ggbarplot(unique.fit.stats.new, x = "fit.unique.new", y = "mean", title = "Fitness Class vs Performance (MCC) - Paradigm only", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats.new$mean) + 0.1),
  label = unique.fit.stats.new$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats.new$fit.unique.new, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 4.5, y = 0.35, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")
# mean +/- 2*se => ~95% CI for the mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.
```

### AGS-Gold Random analysis {-}

<div class="blue-box">
In this section we investigate how the **random models' performance** correlates with their **fitness to the AGS-Gold steady state**, as well as to the **PARADIGM-derived steady states (with or without TF)**, pruned to the 23 nodes of the AGS-Gold steady state.
</div>
</br>

First, we calculate the fitness values of the random models for each steady state case (AGS-Gold, PARADIGM only (new), PARADIGM+TF (old)):
```{r Fitness and MCC scores for Random Models, cache=TRUE}
# Get the MCC scores of the random models for the AGS
mcc = random.models.mcc.per.cell.line$AGS

# find the unique models
unique.models = rownames(unique(random.models.link.operator))

# remove models with NaN MCC scores
unique.models = unique.models[!unique.models %in% names(which(is.na(mcc)))]

# Prune MCC scores to unique models
mcc.unique = mcc[names(mcc) %in% unique.models]

# Find fitness of the random models to the 3 steady states
# random.models.fitness.to.ags.literature.ss (already calculated for AGS-Gold)
random.models.fitness.to.ags.ss.old = apply(
  random.models.stable.state[,names(ags.ss.old)], 1, 
  get_percentage_of_matches, ags.ss.old)
random.models.fitness.to.ags.ss.new = apply(
  random.models.stable.state[,names(ags.ss.new)], 1, 
  get_percentage_of_matches, ags.ss.new)

fit.gold = random.models.fitness.to.ags.literature.ss
fit.old = random.models.fitness.to.ags.ss.old
fit.new = random.models.fitness.to.ags.ss.new

fit.unique.gold = fit.gold[names(fit.gold) %in% unique.models]
fit.unique.old = fit.old[names(fit.old) %in% unique.models]
fit.unique.new = fit.new[names(fit.new) %in% unique.models]

df.gold.random = as.data.frame(cbind(fit.unique.gold, mcc.unique))
df.old.random = as.data.frame(cbind(fit.unique.old, mcc.unique))
df.new.random = as.data.frame(cbind(fit.unique.new, mcc.unique))
```
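
Since `cbind()` pairs the vectors by position rather than by name, a quick sanity check (a sketch over the objects defined above) can guard against accidental row misalignment before building the data frames:

```{r random models name alignment check, eval=FALSE}
# all fitness vectors and the MCC vector should refer to the same models, in the same order
stopifnot(
  identical(names(fit.unique.gold), names(mcc.unique)),
  identical(names(fit.unique.old),  names(mcc.unique)),
  identical(names(fit.unique.new),  names(mcc.unique))
)
```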

Then, we check whether there is any correlation between fitness and performance (MCC) using scatter plots:
```{r Rank correlation plots for Random Models, cache=TRUE}
ggscatter(df.gold.random, x = "fit.unique.gold", y = "mcc.unique", 
          title = "AGS-Gold - Random: Fitness vs Performance - Kendall's tau",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")
ggscatter(df.old.random, x = "fit.unique.old", y = "mcc.unique", 
          title = "AGS (Paradigm+TF) - Random: Fitness vs Performance - Kendall's tau",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")
ggscatter(df.new.random, x = "fit.unique.new", y = "mcc.unique", 
          title = "AGS (Paradigm) - Random: Fitness vs Performance - Kendall's tau",
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.coeff.args = list(method = "kendall"),
          xlab = "Fitness Values", ylab = "MCC scores")
```

<div class="orange-box">
There is only a very small correlation between fitness and performance, both in the case of fitness to the **AGS-Gold steady state** and in the case of fitness calculated on the **same nodes as in the AGS-Gold steady state** but **with the activity values as calculated by PARADIGM including TF activity**.
</div>
</br>

To better visualize the relationship (if there is any), **we group the models into distinct fitness classes and show the average MCC per class with the corresponding 95% CI**.

First, for the **AGS-Gold** steady state:

```{r Fitness Class vs mean MCC for Random Models (AGS-Gold), warning=FALSE, cache=TRUE}
num.of.fit.classes = length(unique(df.gold.random$fit.unique.gold))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df.gold.random$fit.unique.gold, k = num.of.fit.classes)
fit.class.id = res$cluster
df.gold.random = cbind(df.gold.random, fit.class.id)

unique.fit.stats.gold = desc_statby(df.gold.random, measure.var = "mcc.unique", grps = c("fit.unique.gold", "fit.class.id"), ci = 0.95)

ggbarplot(unique.fit.stats.gold, x = "fit.unique.gold", y = "mean", title = "Fitness Class vs Performance (MCC) - AGS-Gold - Random", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats.gold$mean) + 0.1),
  label = unique.fit.stats.gold$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats.gold$fit.unique.gold, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7,10)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 5.5, y = 0.48, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")
# mean +/- 2*se => ~95% CI for the mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.
```

Secondly, for the **PARADIGM+TF (old)** calculated steady state for AGS, pruned to the AGS-Gold nodes:

```{r Fitness Class vs mean MCC for Random Models (Paradigm+TF - pruned to AGS-gold), warning=FALSE, cache=TRUE}
num.of.fit.classes = length(unique(df.old.random$fit.unique.old))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df.old.random$fit.unique.old, k = num.of.fit.classes)
fit.class.id = res$cluster
df.old.random = cbind(df.old.random, fit.class.id)

unique.fit.stats.old = desc_statby(df.old.random, measure.var = "mcc.unique", grps = c("fit.unique.old", "fit.class.id"), ci = 0.95)

ggbarplot(unique.fit.stats.old, x = "fit.unique.old", y = "mean", title = "Fitness Class vs Performance (MCC) - Paradigm+TF - Random", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats.old$mean) + 0.1),
  label = unique.fit.stats.old$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats.old$fit.unique.old, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7,9)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 5.5, y = 0.48, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")
# mean +/- 2*se => ~95% CI for the mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.
```

Lastly, for the **PARADIGM (new)** calculated steady state for AGS, pruned to the AGS-Gold nodes:

```{r Fitness Class vs mean MCC for Random models (Paradigm - pruned to AGS-gold), warning=FALSE, cache=TRUE}
num.of.fit.classes = length(unique(df.new.random$fit.unique.new))
fit.class.ids = 1:num.of.fit.classes

# find the fitness classes
res = Ckmeans.1d.dp(x = df.new.random$fit.unique.new, k = num.of.fit.classes)
fit.class.id = res$cluster
df.new.random = cbind(df.new.random, fit.class.id)

unique.fit.stats.new = desc_statby(df.new.random, measure.var = "mcc.unique", grps = c("fit.unique.new", "fit.class.id"), ci = 0.95)

ggbarplot(unique.fit.stats.new, x = "fit.unique.new", y = "mean", title = "Fitness Class vs Performance (MCC) - Paradigm - Random", fill = "fit.class.id",
  xlab = "Fitness Score", ylab = "Mean MCC", ylim = c(0, max(unique.fit.stats.new$mean) + 0.1),
  label = unique.fit.stats.new$length, lab.vjust = -2) + 
  scale_x_discrete(labels = round(unique.fit.stats.new$fit.unique.new, digits = 2)) + 
  scale_fill_gradient(low = "lightblue", high="red", name = "Fitness Class", guide = "legend", breaks = c(1,4,7,10)) +
  geom_errorbar(aes(ymin=mean-2*se, ymax=mean+2*se), width = 0.2, position = position_dodge(0.9)) +
  annotate("text", x = 5.5, y = 0.48, label = "Number of models per fitness class are displayed above 95% CI bars for the mean MCC")
# mean +/- 2*se => ~95% CI for the mean. Use when you are interested in the precision of the means or in comparing and testing differences between means.
```

## R session info {-}

```{r session info, comment=""}
xfun::session_info()
```
