Parameterization vs Performance
Best ROC and PRC
In this section we will compare the normalized combined predictors (\(calibrated + \beta \times random, \beta=-1\)) across all 3 model parameterizations/mutations we tested in this report for CASCADE 2.0: link operator mutations, topology mutations and both. We use the normalization parameter \(\beta=-1\) for all combined predictors, as it was observed throughout the report that it approximately maximizes the performance of all Bliss-assessed, ensemble-wise combined synergy predictors in terms of ROC and PR AUC. The results are from the \(150\) simulation runs (\(450\) models).
Why call \(\beta\) a normalization parameter?
What matters for the calculation of the ROC and PR points is the ranking of the synergy scores. Thus if we bring the predictor’s synergy scores to the exponential space, a value of \(-1\) for \(\beta\) translates to a simple fold-change normalization technique:
\(calibrated + \beta \times random \overset{\beta = -1}{=} calibrated - random \xrightarrow[\text{same ranking}]{e(x) \text{ monotonous}}\) \(e^{(calibrated - random)}=e^{calibrated}/e^{random}\).
# Link operator mutations results (`best_score2` has the results for β = -1, `best_score1` for β = -1.6)
= get_roc_stats(df = pred_ew_bliss, pred_col = "best_score2", label_col = "observed")
roc_link_res = pr.curve(scores.class0 = pred_ew_bliss %>% pull(best_score2) %>% (function(x) {-x}),
pr_link_res weights.class0 = pred_ew_bliss %>% pull(observed), curve = TRUE, rand.compute = TRUE)
# Topology mutations results
= get_roc_stats(df = pred_topo_ew_bliss, pred_col = "best_score", label_col = "observed")
roc_topo_res = pr.curve(scores.class0 = pred_topo_ew_bliss %>% pull(best_score) %>% (function(x) {-x}),
pr_topo_res weights.class0 = pred_topo_ew_bliss %>% pull(observed), curve = TRUE)
# Both Link Operator and Topology mutations results
= get_roc_stats(df = pred_topolink_ew_bliss, pred_col = "best_score", label_col = "observed")
roc_topolink_res = pr.curve(scores.class0 = pred_topolink_ew_bliss %>% pull(best_score) %>% (function(x) {-x}),
pr_topolink_res weights.class0 = pred_topolink_ew_bliss %>% pull(observed), curve = TRUE)
# Plot best ROCs
plot(x = roc_link_res$roc_stats$FPR, y = roc_link_res$roc_stats$TPR,
type = 'l', lwd = 3, col = my_palette[1], main = TeX('ROC curves (Ensemble-wise), $calibrated + \\beta \\times random$'),
xlab = 'False Positive Rate (FPR)', ylab = 'True Positive Rate (TPR)')
lines(x = roc_topo_res$roc_stats$FPR, y = roc_topo_res$roc_stats$TPR,
lwd = 2, col = my_palette[2])
lines(x = roc_topolink_res$roc_stats$FPR, y = roc_topolink_res$roc_stats$TPR,
lwd = 2.3, col = my_palette[3])
legend('bottomright', title = TeX('AUC ($\\beta$ = -1): Mutations'),
col = c(my_palette[1:3]), pch = 19, cex = 1.5,
legend = c(paste(round(roc_link_res$AUC, digits = 2), 'Link Operator'),
paste(round(roc_topo_res$AUC, digits = 2), 'Topology'),
paste(round(roc_topolink_res$AUC, digits = 2), 'Both')))
grid(lwd = 0.5)
abline(a = 0, b = 1, col = 'lightgrey', lty = 'dotdash', lwd = 1.2)
# Plot best PRCs
plot(pr_link_res, main = TeX('PR curves (Ensemble-wise), $calibrated + \\beta \\times random$'),
auc.main = FALSE, color = my_palette[1], rand.plot = TRUE, lwd = 3)
plot(pr_topo_res, add = TRUE, color = my_palette[2], lwd = 2)
plot(pr_topolink_res, add = TRUE, color = my_palette[3], lwd = 2.3)
legend('topright', title = TeX('AUC ($\\beta$ = -1): Mutations'),
col = c(my_palette[1:3]), pch = 19, cex = 1.3,
legend = c(paste(round(pr_link_res$auc.davis.goadrich, digits = 2), 'Link Operator'),
paste(round(pr_topo_res$auc.davis.goadrich, digits = 2), 'Topology'),
paste(round(pr_topolink_res$auc.davis.goadrich, digits = 2), ' Both')))
grid(lwd = 0.5)
We observe that if we had used the results for the link operator only combined predictor with \(\beta_{best}=-1.6\) as was demonstrated here, we would have an AUC-ROC of \(0.85\) and AUC-PR of \(0.27\), which are pretty close to the results we see above for \(\beta=-1\), using both link and topology mutations.
Overall, this suggests that to parameterize our boolean models using topology mutations can increase the performance of our proposed synergy prediction approach much more than using either link operator (balance) mutations alone or combined with topology parameterization.
Note that the difference in terms of ROC AUC is not significant compared to the difference of PR AUC scores and since the dataset we test our models on is fairly imbalanced, we base our conclusion on the information from the PR plots (Saito and Rehmsmeier 2015).
Bootstrap Simulations
Now we would like to statistically verify the previous conclusion (that topology is superior to the other two parameterization schemes and produces better predictive models for our dataset) and so we will run a bootstrap analysis.
Simply put, we generate 3 large pools of calibrated to steady state models (\(4500\) models each). Each pool corresponds to the 3 parameterization schemes (i.e. it has models with either topology mutations only, link-operator mutations only, or models with both mutations). Then, we take several model samples from each pool (\(25\) samples, each sample containing \(300\) models) and run the drug response simulations for these calibrated model ensembles to get their predictions. Normalizing each calibrated simulation prediction output to the corresponding random (proliferative) model predictions (using a \(\beta=-1\) as above), results in different ROC and PR AUCs for each parameterization scheme and chosen bootstrapped sample.
See more details on reproducing the results on section Parameterization Bootstrap.
Load the already-stored result:
= readRDS(file = "data/res_param_boot_aucs.rds") res
Compare all 3 schemes
# define group comparisons for statistics
= list(c("link-only","topology-only"), c("link-only","topo-and-link"),
my_comparisons c("topology-only","topo-and-link"))
# ROC AUCs
ggboxplot(data = res, x = "param", y = "roc_auc",
fill = "param", add = "jitter", palette = "Set1",
xlab = "Parameterization", ylab = "ROC AUC",
title = "Parameterization vs Performance (ROC)") +
stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format") +
theme(plot.title = element_text(hjust = 0.5)) +
theme(legend.position = "none")
# PR AUCs
ggboxplot(data = res, x = "param", y = "pr_auc",
fill = "param", add = "jitter", palette = "Set1",
xlab = "Parameterization", ylab = "PR AUC",
title = "Parameterization vs Performance (Precision-Recall)") +
stat_compare_means(comparisons = my_comparisons, method = "wilcox.test", label = "p.format") +
theme(plot.title = element_text(hjust = 0.5)) +
theme(legend.position = "none")
The topology mutations generate the best performing models in terms of PR AUC with statistical significance compared to the other two groups of model ensembles using different parameterization. In terms of ROC AUC performance we also note the larger variance of the topology mutated models.
Compare Topology vs Link-operator Parameterization
# load the data
= readRDS(file = "data/res_param_boot_aucs.rds")
res
# filter data
= res %>%
res filter(param != "topo-and-link")
= list(c("link-only","topology-only"))
param_comp
= res %>%
stat_test_roc ::wilcox_test(formula = roc_auc ~ param, comparisons = param_comp) %>%
rstatix::add_significance("p")
rstatix
# ROC AUCs
ggboxplot(res, x = "param", y = "roc_auc", fill = "param", palette = "Set1",
add = "jitter", xlab = "", ylab = "ROC AUC") +
scale_x_discrete(breaks = c("link-only","topology-only"),
labels = c("Parameterization Mutations", "Topology Mutations")) +
::stat_pvalue_manual(stat_test_roc, label = "p = {p} ({p.signif})", y.position = c(1)) +
ggpubrylim(c(0.2,1)) +
theme_classic(base_size = 14) +
theme(plot.title = element_text(hjust = 0.5), axis.text = element_text(size = 16)) +
geom_hline(yintercept = 0.5, linetype = 'dashed', color = "red") +
geom_text(aes(x = 2.1, y = 0.45, label="Random Predictions (AUC = 0.5)", size = 5)) +
theme(legend.position = "none")
= res %>%
stat_test_pr ::wilcox_test(formula = pr_auc ~ param, comparisons = param_comp) %>%
rstatix::add_significance("p") %>%
rstatix::add_y_position()
rstatix
# PR AUCs
ggboxplot(res, x = "param", y = "pr_auc", fill = "param", palette = "Set1",
add = "jitter", xlab = "", ylab = "Precision-Recall AUC") +
scale_x_discrete(breaks = c("link-only","topology-only"),
labels = c("Parameterization Mutations", "Topology Mutations")) +
::stat_pvalue_manual(stat_test_pr, label = "p = {p} ({p.signif})") +
ggpubrtheme_classic(base_size = 14) +
theme(plot.title = element_text(hjust = 0.5), axis.text = element_text(size = 16)) +
geom_hline(yintercept = 6/153, linetype = 'dashed', color = "red") +
geom_text(aes(x = 1.9, y = 0.08, label="Random Predictions (AUC = 0.04)"), size = 5) +
theme(legend.position = "none")