The purpose of this analysis is to find synergy biomarkers using the emba R package in the cascade boolean model datasets (Cascade is the name of the topology used). Particularly, we will investigate \(4\) synergies: AK-PD
, PD-PI
, PD-G2
, PI-D1
and try to find causal mechanisms that might explain why and where (pathways) these synergies manifest.
Note that AK
is an AKT inhibitor and PI
is a PI3K inhibitor and they both target the PI3K/AKT/mTOR pathway. The combination of such an inhibitor with a MEK inhibitor - PD
- that targets the MAPK/ERK pathway, has proven to be more effective than the single drug treatment during clinical trials with patients that had advanced colorectal carcinoma (source). The G2
drug is a PDPK1 inhibitor and the D1
drug is an inhibitor of RSK
isoforms (RPS6KA1, RPS6KA3, RPS6KA2, RPS6KA6).
The boolean model datasets are in total \(9\): one for each cell line of interest (8 cell lines) where the models were fitted to a specific steady state in each case and one for the so-called random models which were generated randomly in the sense that were fitted only to a proliferation state (simulations were done using the DrugLogics software modules Gitsbe
and Drabme
).
The results will focus on the activity state biomarkers and not the link operator ones (since we didn’t know how to best interpret them - though an effort has been made in the AK-PD analysis).
ERK_f
overexpression (cell-specific models and random models)SRC
inhibitionERK_f
overexpression (random models)PTEN
inhibition - in cell-specific models (with ERK_f
overexpression)SOS1
inhibition in all cell lines except A498
and SW620
ERK_f
overexpression (random models)PDPK1
overexpressionPTEN
inhibition (see here and here)PIK3CA
and PTEN
share an OR NOT
relationship as regulators of PDPK1
SOS1
inhibition in all cell lines except A498
and SW620
ERK_f
overexpressionCDC42
inhibitionLoading libraries:
library(emba)
library(usefun)
library(ComplexHeatmap)
library(circlize)
library(dplyr)
library(tibble)
library(DT)
library(ggpubr)
library(Ckmeans.1d.dp)
We load the cell-specific input data:
# Cell Lines
cell.lines = c("A498", "AGS", "DU145", "colo205", "SW620", "SF295", "UACC62", "MDA-MB-468")
cell.line.dirs = sapply(cell.lines, function(cell.line) {
paste0(getwd(), "/", cell.line)
})
# Model predictions
model.predictions.files = sapply(cell.line.dirs, function(cell.line.dir) {
paste0(cell.line.dir, "/model_predictions")
})
model.predictions.per.cell.line = lapply(model.predictions.files, function(file) {
get_model_predictions(file)
})
# Observed synergies
observed.synergies.files = sapply(cell.line.dirs, function(cell.line.dir) {
paste0(cell.line.dir, "/observed_synergies")
})
observed.synergies.per.cell.line = lapply(observed.synergies.files, function(file) {
get_observed_synergies(file)
})
# Models Stable State (1 per model)
models.stable.state.files = sapply(cell.line.dirs, function(cell.line.dir) {
paste0(cell.line.dir, "/models_stable_state")
})
models.stable.state.per.cell.line = lapply(models.stable.state.files, function(file) {
as.matrix(read.table(file, check.names = FALSE))
})
# Models Link Operators
models.link.operator.files = sapply(cell.line.dirs, function(cell.line.dir) {
paste0(cell.line.dir, "/models_link_operator")
})
models.link.operators.per.cell.line = lapply(models.link.operator.files, function(file) {
as.matrix(read.table(file, check.names = FALSE))
})
The random model input data:
random.dir = paste0(getwd(), "/random")
random.model.predictions = get_model_predictions(paste0(random.dir, "/model_predictions"))
random.models.stable.state = as.matrix(
read.table(file = paste0(random.dir, "/models_stable_state"), check.names = FALSE)
)
random.models.link.operator =
as.matrix(read.table(file = paste0(random.dir, "/models_link_operator"), check.names = FALSE))
# the node names used in our analysis
node.names = colnames(random.models.stable.state)
# the tested drug combinations
drug.combos = colnames(random.model.predictions)
Using the generic function biomarker_synergy_analysis
from the emba R package, we can find synergy biomarkers i.e. nodes whose activity and boolean equation parameterization (link operator) affect the manifestation of synergies in the boolean models. Models are classified based on whether they predict or not each of the predicted synergies found in each boolean dataset.
First we run the analysis on the cell-specific boolean model datasets (note that every input to the biomarker_synergy_analysis
function changes per cell line):
cell.specific.synergy.analysis.res = list()
for (cell.line in cell.lines) {
cell.specific.synergy.analysis.res[[cell.line]] =
biomarker_synergy_analysis(model.predictions.per.cell.line[[cell.line]],
models.stable.state.per.cell.line[[cell.line]],
models.link.operators.per.cell.line[[cell.line]],
observed.synergies.per.cell.line[[cell.line]], threshold = 0.7)
}
Next we run the analysis on the random boolean model datasets (note that the only input to the biomarker_synergy_analysis
function that changes is the observed synergies per cell line - the rest is the same data from the random model dataset):
# Synergy Biomarkers for cell proliferation models
random.synergy.analysis.res = list()
for (cell.line in cell.lines) {
random.synergy.analysis.res[[cell.line]] =
biomarker_synergy_analysis(random.model.predictions,
random.models.stable.state, random.models.link.operator,
observed.synergies.per.cell.line[[cell.line]], threshold = 0.7)
}
The results from all the previous code chunks have already been executed and the result data is saved in the cascade_synergy_biomarkers.RData
file. We load all the objects:
#save.image(file = "cascade_synergy_biomarkers.RData")
load(file = "cascade_synergy_biomarkers.RData")
Each of the cell lines studied has a different set of observed synergies (drug combinations that were found synergistic across all the 153 tested ones). In this section, we will visualize the cell lines’ observed synergies and mark the synergies that were also predicted by the cell-specific models and the random-generated ones (in one cell line at least). First, we get the biomarkers for these synergies from each cell line:
total.predicted.synergies.cell.specific =
unique(unlist(sapply(cell.specific.synergy.analysis.res, function(x) { x$predicted.synergies})))
total.predicted.synergies.cell.specific.num = length(total.predicted.synergies.cell.specific)
The same for the random models:
total.predicted.synergies.random =
unique(unlist(sapply(random.synergy.analysis.res, function(x) { x$predicted.synergies})))
total.predicted.synergies.random.num = length(total.predicted.synergies.random)
Then, we get the observed synergies from each cell line in a data.frame
:
observed.synergies.res = get_observed_synergies_per_cell_line(cell.line.dirs, drug.combos)
# remove drug combinations which are not observed in any of the cell lines
observed.synergies.res = prune_columns_from_df(observed.synergies.res, value = 0)
total.observed.synergies = colnames(observed.synergies.res)
total.observed.synergies.num = length(total.observed.synergies)
Lastly, we visualize the observed and predicted synergies for all cell lines in one heatmap:
# color the cell-specific predicted synergies
predicted.synergies.colors = rep("black", total.observed.synergies.num)
names(predicted.synergies.colors) = total.observed.synergies
common.predicted.synergies = intersect(total.predicted.synergies.cell.specific,
total.predicted.synergies.random)
cell.specific.only.predicted.synergies =
total.predicted.synergies.cell.specific[!total.predicted.synergies.cell.specific %in% total.predicted.synergies.random]
random.only.predicted.synergies =
total.predicted.synergies.random[!total.predicted.synergies.random %in% total.predicted.synergies.cell.specific]
predicted.synergies.colors[total.observed.synergies %in%
common.predicted.synergies] = "blue"
predicted.synergies.colors[total.observed.synergies %in%
cell.specific.only.predicted.synergies] = "orange"
predicted.synergies.colors[total.observed.synergies %in%
random.only.predicted.synergies] = "purple"
# define a coloring
obs.synergies.col.fun = colorRamp2(c(0, 1), c("red", "green"))
observed.synergies.heatmap =
Heatmap(matrix = as.matrix(observed.synergies.res),
col = obs.synergies.col.fun,
column_title = "Observed synergies per cell line",
column_title_gp = gpar(fontsize = 20),
column_names_gp = gpar(col = predicted.synergies.colors),
row_title = "Cell Lines", row_title_side = "left",
row_dend_side = "right", row_names_side = "left",
rect_gp = gpar(col = "black", lwd = 0.3),
heatmap_legend_param = list(at = c(1, 0), labels = c("YES", "NO"),
color_bar = "discrete", title = "Observed", direction = "vertical"))
lgd = Legend(at = c("Cell-specific", "Random", "Both"), title = "Predicted",
legend_gp = gpar(fill = c("orange", "purple", "blue")))
draw(observed.synergies.heatmap, heatmap_legend_list = list(lgd),
heatmap_legend_side = "right")
AK-BI
, PI-D1
)AK-G4
and 5Z-D1
are observed synergies that only the cell-specific models could predict, whereas the G2-P5
and PI-D4
are observed synergies that only the random models could predict. This shows us that a complimentary approach is needed when searching for biomarkers as the two different kind of models (trained to a specific activity state profile vs trained to proliferation) although they share common true positives regarding the synergies they predict, there are also synergies only a specific class of models could predict.AK-PD
, PD-PI
and PD-G2
were observed in the A498
cell line and predicted by both the cell-specific and random models.As observed above, the AK-PD
synergy was predicted by both the cell specific and random models in the A498
cell line.
We get the average state and link operator differences per network node for the A498
cell line from the cell-specific models:
AK.PD.avg.state.diff.cell.specific = cell.specific.synergy.analysis.res$A498$diff.state.synergies.mat["AK-PD", ]
AK.PD.avg.link.diff.cell.specific = cell.specific.synergy.analysis.res$A498$diff.link.synergies.mat["AK-PD", ]
We build the network from the topology file:
topology.file = paste0(getwd(), "/topology")
net = construct_network(topology.file = topology.file, models.dir = paste0(getwd(), "/AGS/models"))
# a static layout for plotting the same network always
coordinates.file = paste0(getwd(), "/network_xy_coordinates")
nice.layout = as.matrix(read.table(coordinates.file))
We will now visualize the nodes average state differences in a network graph. Note that the good models are those that predicted the AK-PD
drug combination to be synergistic and were contrasted to those that predicted it to be antagonistic (bad models). The number of models in each category were:
model.predictions = model.predictions.per.cell.line[["A498"]]
models.stable.state = models.stable.state.per.cell.line[["A498"]]
drug.comb = "AK-PD"
good.models.num = sum(model.predictions[, drug.comb] == 1 & !is.na(model.predictions[, drug.comb]))
# unique good models
# models.link.operator = models.link.operators.per.cell.line$A498
# nrow(unique(models.link.operator[model.predictions[, drug.comb] == 1 & !is.na(model.predictions[, drug.comb]), ]))
bad.models.num = sum(model.predictions[, drug.comb] == 0 & !is.na(model.predictions[, drug.comb]))
pretty_print_string(paste0("Number of 'good' models (AK-PD synergistic) in the A498 cell line: ", good.models.num))
Number of ‘good’ models (AK-PD synergistic) in the A498 cell line: 26
pretty_print_string(paste0("Number of 'bad' models (AK-PD antagonistic) in the A498 cell line: ", bad.models.num))
Number of ‘bad’ models (AK-PD antagonistic) in the A498 cell line: 2672
plot_avg_state_diff_graph(net, diff = AK.PD.avg.state.diff.cell.specific,
layout = nice.layout, title = "AK-PD activity state biomarkers (Cell specific models - A498)")
Thus, we can identify the active state biomarkers:
AK.PD.active.biomarkers = AK.PD.avg.state.diff.cell.specific[AK.PD.avg.state.diff.cell.specific > 0.7]
pretty_print_vector_names(AK.PD.active.biomarkers)
1 node: ERK_f
So, the AK-PD
synergy manifests in cancer cell models that have the ERK_f
family logical node in an active state. The MAPK-ERK signaling pathway has been studied a lot and has been found to be overexpressed/have increased activity in cancers and as such cancer treatments that include the inhibition of that pathway are found to be most beneficial.
Paper evidence for ERK
overexpression in cancer (there are many):
The inhibited state biomarkers are:
AK.PD.inhibited.biomarkers = AK.PD.avg.state.diff.cell.specific[AK.PD.avg.state.diff.cell.specific < -0.7]
pretty_print_vector_names(AK.PD.inhibited.biomarkers)
2 nodes: PTPN11, GAB_f
We will demonstrate that the PTPN11
and GAB_f
inhibited biomarkers are a direct consequence of the overexpression of ERK_f
:
If we check the logical equations related to the above biomarkers we see that:
pretty_print_string("ERK_f *= ( MEK_f ) AND/OR NOT ( ( DUSP6 ) or PPP1CA )")
ERK_f *= ( MEK_f ) AND/OR NOT ( ( DUSP6 ) or PPP1CA )
pretty_print_string("GAB_f *= ( GRB2 ) AND/OR NOT ( ERK_f )")
GAB_f *= ( GRB2 ) AND/OR NOT ( ERK_f )
pretty_print_string("PTPN11 *= ( GAB_f )")
PTPN11 *= ( GAB_f )
So, pretty much if the GAB_f
node is more inhibited in the models that predicted the AK-PD
synergy (good models), then PTPN11
is also as well. Also the average activity difference of the GRB2
node is -0.3434189, which makes the GAB_f
node more inhibited in the good models since it’s activity is mostly dependent on the ERK_f
node, which is mostly overexpressed in the good models. All in all, the overexpression of ERK_f
is what causes the two other inhibited biomarkers.
We visualize the nodes average link operator differences in a network graph:
plot_avg_link_operator_diff_graph(net, diff = AK.PD.avg.link.diff.cell.specific,
layout = nice.layout, title = "AK-PD link operator biomarkers (Cell specific models - A498)")
So, the models that predicted the AK-PD
synergy, had the OR NOT as a link operator in the boolean equation that has the ERK_f
node as target, i.e. ERK_f *= ( MEK_f ) OR NOT ( ( DUSP6 ) or PPP1CA )
instead of ERK_f *= ( MEK_f ) AND NOT ( ( DUSP6 ) or PPP1CA )
. The difference in the result of the logical equation can be seen in the next two truth tables where the use of the OR NOT link operator makes the end truth value more flexible in the sense that a lot more more TRUE values are possible, since the activating regulator MEK_f
(which is the PD
drug’s target) has more weight in the outcome:
If you think it in reverse terms, in cancer models in which the ERK_f
equation is mostly based on an AND NOT
link operator logic and have the ERK_F
node as inactive (also from the equation DUSP6 *= ( ( mTORC1_c ) or ERK_f )
, DUSP6
will also be inactive), the MEK_f
node will also be mostly inactive (\(2\) out of \(3\) cases in the table above: 1st, 2nd and 4th row) - it can be seen as a node that does not play an important role in the activity outcome of ERK_f - and so by inhibiting it further with the PD
drug will not bring any significant benefit for cancer treatment.
Thus, in cancer boolean models where the ERK_f
node is overexpressed and the MEK_f
logical node is it’s most crucial regulator, inhibiting both the MAPK/ERK pathway (drug PD
) and the AKT pathway (drug AK
) is a synergistic combination for cancer treatment.
It will be interesting to find all the possible synergy sets and subsets that include the AK-PD
as the extra synergy. Using these synergy sets, models that predict a set of synergies will be contrasted to models that predicted the same set with the addition of the extra AK-PD
synergy. Thus we could find synergy biomarkers that allow already good predicting models to predict the additional synergy of interest. This investigation will allow us thus to refine the activity state and link operator biomarkers we found above.
We first construct two matrices: in the first, each row is a set vs subset average activity difference vector of the network nodes, while on the second each row is a set vs subset average link operator difference vector of the network nodes:
model.predictions = model.predictions.per.cell.line$A498
models.stable.state = models.stable.state.per.cell.line$A498
models.link.operator = models.link.operators.per.cell.line$A498
res = get_synergy_comparison_sets(cell.specific.synergy.analysis.res$A498$synergy.subset.stats)
AK.PD.res = res %>% filter(synergies == "AK-PD")
diff.state.list = list()
diff.link.list = list()
for (i in 1:nrow(AK.PD.res)) {
synergy.set = AK.PD.res[i, 2]
synergy.subset = AK.PD.res[i, 3]
synergy.set.str = unlist(strsplit(x = synergy.set, split = ","))
synergy.subset.str = unlist(strsplit(x = synergy.subset, split = ","))
# count models
synergy.set.models.num = count_models_that_predict_synergies(synergy.set.str, model.predictions)
synergy.subset.models.num = count_models_that_predict_synergies(synergy.subset.str, model.predictions)
# if too small number of models, skip the diff vector
if ((synergy.set.models.num <= 3) | (synergy.set.models.num <= 3))
next
# get the diff
diff.state.ak.pd = get_avg_activity_diff_based_on_synergy_set_cmp(synergy.set.str, synergy.subset.str, model.predictions, models.stable.state)
diff.link.ak.pd = get_avg_link_operator_diff_based_on_synergy_set_cmp(synergy.set.str, synergy.subset.str, model.predictions, models.link.operator)
diff.state.list[[paste0(synergy.set, " vs ", synergy.subset)]] = diff.state.ak.pd
diff.link.list[[paste0(synergy.set, " vs ", synergy.subset)]] = diff.link.ak.pd
}
diff.state.mat = do.call(rbind, diff.state.list)
diff.link.mat = do.call(rbind, diff.link.list)
caption.title.1 = "Table 1: Average activity difference values across all synergy set comparisons (AK-PD)"
datatable(data = diff.state.mat[, c("SRC", "CSK", "MEK_f", "STAT1", "PTPN6")], options = list(
searching = FALSE, pageLength = 5, lengthMenu = c(5, 10)),
caption = htmltools::tags$caption(caption.title.1, style="color:#dd4814; font-size: 18px")) %>%
formatRound(1:ncol(diff.state.mat), digits = 3)
caption.title.2 = "Table 2: Average link operator difference values across all synergy set comparisons (AK-PD)"
datatable(data = diff.link.mat[, c("SRC", "RAC_f", "MEK_f", "STAT1", "PTEN")], options = list(
searching = FALSE, pageLength = 5, lengthMenu = c(5, 10)),
caption = htmltools::tags$caption(caption.title.2, style="color:#dd4814; font-size: 18px")) %>%
formatRound(1:ncol(diff.link.mat), digits = 3)
Using the matrix from Table 1 (where we show just \(5\) nodes), we count per network node the number of times that the node’s average activity difference value has surpassed a specified threshold - i.e. the number of times it has been found as important (a biomarker) across all the synergy set comparisons (so the more the better):
threshold = 0.8
biomarker.state.mat = binarize_to_thres(mat = diff.state.mat, threshold)
biomarker.state.counts = colSums(biomarker.state.mat)
pretty_print_vector_names_and_values(table(biomarker.state.counts))
0: 141, 1: 1, 11: 2
The above means that there were \(141\) nodes that were zero times found as activity state biomarkers across all synergy set comparisons, one that was found once and \(2\) that were found \(11\) times (out of a total of 19). These nodes are:
pretty_print_vector_names(biomarker.state.counts[biomarker.state.counts == 11])
2 nodes: SRC, PTPN6
If we visualize the average activity state difference across all synergy comparison sets from Table 1 we can see that the previous \(2\) nodes are more pronounced:
plot_avg_state_diff_graph(net, diff = colMeans(diff.state.mat), layout = nice.layout,
title = "Average activity state diff across all synergy subsets (AK-PD)")
We also visualize the median activity difference per node from Table 1 using a dotchart
, where we have excluded the nodes that have an absolute median activity difference of \(0.05\) or less:
df = as.data.frame(apply(diff.state.mat, 2, median))
colnames(df) = "median.state.diff"
df = df %>%
rownames_to_column(var = "nodes") %>%
filter(median.state.diff > 0.05 | median.state.diff < -0.05)
ggdotchart(df, x = "nodes", y = "median.state.diff",
title = "Median activity state difference across all synergy comparison sets (AK-PD, A498 cell line)",
label = "nodes", font.label = list(size = 11, color = "blue"),
label.select = list(criteria = "`y` >= 0.7 | `y` <= -0.7"),
repel = TRUE, label.rectangle = TRUE,
xlab = "Nodes", ylab = "Median Activity State",
ylim = c(-1,1), add = "segments") +
font("x.text", size = 10) +
font("title", size = 11) +
geom_hline(yintercept = -0.7, linetype = "dashed", color = "red") +
geom_hline(aes(yintercept = 0.7, linetype = "0.7"), color = "red") +
scale_linetype_manual(name = "Threshold", values = 2,
guide = guide_legend(override.aes = list(color = "red"))) +
theme(legend.position = c(0.2,0.7))
Note the boolean equation: PTPN6 *= SRC
. This means that SRC
is the only inhibited node of interest
We will now check if the above observations regarding the activity of the nodes are true using the MCLP dataset as a reference:
mclp.data = read.table(file = "MCLP-v1.1-Level4.tsv", header = TRUE, stringsAsFactors = FALSE)
cell.lines.in.mclp = c("A498", "AGS", "DU145", "COLO205", "SW620", "SF295", "UACC62", "MDAMB468")
We check the phosphorylation value of the SRC_pY527
across all cell lines:
src.data = mclp.data %>% select(Sample_Name, SRC_pY527) %>% na.omit()
# find the activity classes (3 classes: low activity, no activity, high activity)
res = Ckmeans.1d.dp(x = src.data$SRC, k = 3)
activity = as.factor(res$cluster)
levels(activity) = c("low", "medium", "high")
src.data = cbind(src.data, activity)
ggdotchart(data = src.data, x = "Sample_Name", y = "SRC_pY527",
title = "SRC_pY527 signaling across all cell lines in MCLP Dataset",
color = "activity", palette = c("red", "grey", "green"),
label = "Sample_Name", label.select = cell.lines.in.mclp, repel = TRUE,
add = "segments", label.rectangle = TRUE,
xlab = "Cell Lines", ylab = "SRC_pY527 Signaling",
add.params = list(color = "activity", palette = c("red", "grey", "green"))) +
theme(axis.text.x = element_blank(), axis.ticks = element_blank())
It has been shown in various studies (e.g. in this paper) that the phosphorylation sites of SRC
include the inhibiting phosphotyrosine 527 site, which overrides the Y416
phosphorylation (which is also tested in the MCLP dataset). Since for A498
cell line we observe one of the largest signaling measurements compared to all the cell lines for the SRC_pY527
site, this means that SRC
should be in an inhibiting state, which is exactly what we found from our analysis above.
We also check for the phosphorylation value of the MAPK_pT202Y204
(for the ERK_f
activation) across all cell lines:
erk.data = mclp.data %>% select(Sample_Name, MAPK_pT202Y204) %>% na.omit()
# find the activity classes (3 classes: low expression, no expression, high expression)
res = Ckmeans.1d.dp(x = erk.data$MAPK_pT202Y204, k = 3)
activity = as.factor(res$cluster)
levels(activity) = c("low", "medium", "high")
erk.data = cbind(erk.data, activity)
ggdotchart(data = erk.data, x = "Sample_Name", y = "MAPK_pT202Y204",
title = "MAPK_pT202Y204 signaling across all cell lines in MCLP Dataset",
color = "activity", palette = c("red", "grey", "green"),
label = "Sample_Name", label.select = cell.lines.in.mclp, repel = TRUE, xlab = "Cell Lines",
add = "segments", label.rectangle = TRUE, ylab = "MAPK_pT202Y204 Signaling",
add.params = list(color = "activity", palette = c("red", "grey", "green"))) +
theme(axis.text.x = element_blank(), axis.ticks = element_blank())
One of the activating phosphorylation sites of ERK
is the ERK1 T202/Y204. From the above figure we see a weak signaling of this phoshosite for the A498
cell line which does not fully correlate well with the results from our analysis above (ERK_f
overexpression). Though, in a later section we show that ERK_f
is found active in all models from all cell lines that predict the AK-PD
synergy (for example even in UACC62
cell line which has a high phosphorylation signal in the above figure).
Using the matrix from Table 2, we count per network node the number of times that the node’s average link operator difference value has surpassed a specified threshold - i.e. the number of times it has been found as important (a biomarker) across all the synergy set comparisons (so the more the better):
biomarker.link.mat = binarize_to_thres(mat = diff.link.mat, thres = 0.8)
biomarker.link.counts = colSums(biomarker.link.mat)
pretty_print_vector_names_and_values(table(biomarker.link.counts))
0: 44, 1: 1, 2: 1, 6: 5, 11: 1
The above means that there were \(44\) nodes that were zero times found as link operator biomarkers across all synergy set comparisons, one that was found once, one that was found twice, \(5\) that were found 6 times and \(1\) that was found 11 times. The last two categories of nodes are:
pretty_print_vector_names(biomarker.link.counts[biomarker.link.counts == 6])
5 nodes: TGFBR2, mTORC1_c, TSC_f, CDC42, RHOA
pretty_print_vector_names(biomarker.link.counts[biomarker.link.counts == 11])
1 node: SRC
If we visualize the average link operator difference across all of the synergy comparison sets from Table 2, we can see the nodes mentioned above:
plot_avg_link_operator_diff_graph(net, diff = colMeans(diff.link.mat), layout = nice.layout,
title = "Average link operator diff across all synergy subsets (AK-PD)")
Some notes/observations on the structure of the boolean equations of the nodes found above:
The boolean equation of the SRC
node is: SRC *= ( RTPK_f ) AND/OR NOT ( CSK )
and it’s mean link operator difference value across all synergy set comparisons is -0.5721936 which means that on average the logic operator that binds its two regulators is the AND NOT
and thus it’s more difficult to have it as activated (1 case out of 4 in boolean logic). This correlates with the fact that it was found as mostly inhibited in the analysis above. Also note the equations:
CSK *= ( PRKACA )
PRKACA *= ( ( NFKB_f ) or FOS )
FOS *= ( ( ( ERK_f ) or RSK_f ) or SRF )
So, when ERK_f
is overexpressed, CSK
becomes active, which means that the prevalence of the AND NOT
link operator makes the activity of the SRC
node dependent only on it’s inhibitor CSK
(since it’s active) and not on the activity of RTPK_f
node.
mTORC1_c *= ( ( RHEB ) or RSK_f ) AND/OR NOT ( AKT1S1 )
AKT1S1 *= not ( AKT_f )
mTORC1_c
’s mean link operator difference value across all synergy set comparisons is 0.699427 which means that on average the logic operator that binds its two regulators is the OR NOT
. This gives the node the structural flexibility to become active when the AK
drug is used: AK
inhibits AKT_f
node, making thus AKT1S1
active, which in turn makes the mTORC1_c
equation like this: mTORC1_c *= ( ( RHEB ) or RSK_f ) AND/OR 0
. So, an AND
link operator would result always in an inhibited mTORC1_c
whereas an OR
link operator would give the possibility for the mTORC1_c
to be active in case one of its activators are active.
We conclude that cancer models in which the AK-PD
drug combination is synergistic, tend to also have the nodes SRC
and PTPN6
in an inhibited state and the SRC
node depends on both it’s regulators, RTPK_f
and CSK
. The link operators of the mTORC1_c
and TSC_f
equations also seem to have an important role in the manifestation of this synergy.
We get the average state and link operator differences per network node for the A498
cell line from the random models:
AK.PD.avg.state.diff.random = random.synergy.analysis.res$A498$diff.state.synergies.mat["AK-PD", ]
AK.PD.avg.link.diff.random = random.synergy.analysis.res$A498$diff.link.synergies.mat["AK-PD", ]
We will now visualize the nodes average state differences in a network graph. Note that the good models are those that predicted the AK-PD
drug combination to be synergistic and were contrasted to those that predicted it to be antagonistic (bad models). The number of models in each category were:
drug.comb = "AK-PD"
good.models.num = sum(random.model.predictions[, drug.comb] == 1 & !is.na(random.model.predictions[, drug.comb]))
# unique good models
# nrow(unique(random.models.link.operator[random.model.predictions[, drug.comb] == 1 & !is.na(random.model.predictions[, drug.comb]),]))
bad.models.num = sum(random.model.predictions[, drug.comb] == 0 & !is.na(random.model.predictions[, drug.comb]))
pretty_print_string(paste0("Number of 'good' random models (AK-PD synergistic) in the A498 cell line: ", good.models.num))
Number of ‘good’ random models (AK-PD synergistic) in the A498 cell line: 107
pretty_print_string(paste0("Number of 'bad' random models (AK-PD antagonistic) in the A498 cell line: ", bad.models.num))
Number of ‘bad’ random models (AK-PD antagonistic) in the A498 cell line: 3632
plot_avg_state_diff_graph(net, diff = AK.PD.avg.state.diff.random,
layout = nice.layout, title = "AK-PD activity state biomarkers (Random models - A498)")
Thus, we can identify the active state biomarkers (note that no inhibited biomarkers at the specified threshold difference level were found):
AK.PD.active.biomarkers = AK.PD.avg.state.diff.random[AK.PD.avg.state.diff.random > 0.8]
pretty_print_vector_names(AK.PD.active.biomarkers)
1 node: ERK_f
The random models that predicted the AK-PD
synergy also show an overexpression of the ERK_f
node.
We visualize the nodes average link operator differences in a network graph:
plot_avg_link_operator_diff_graph(net, diff = AK.PD.avg.link.diff.random,
layout = nice.layout, title = "AK-PD link operator biomarkers (Random models - A498)")
The importance of the OR NOT link operator in the boolean equation of ERK_f
is again proven to be crucial for the manifestation of the AK-PD
synergy, along with the link operators of the equations of the MEK_f
, PTEN
and PDPK1
nodes.
We perform the same kind of analysis as with the cell-specific models: models that predict a set of synergies will be contrasted to models that predicted the same set with the addition of the extra AK-PD
synergy, allowing us thus to refine the activity state and link operator biomarkers we found above from the random models.
We first construct two matrices: in the first, each row is a set vs subset average activity difference vector of the network nodes, while on the second each row is a set vs subset average link operator difference vector of the network nodes:
res = get_synergy_comparison_sets(random.synergy.analysis.res$A498$synergy.subset.stats)
AK.PD.res = res %>% filter(synergies == "AK-PD")
diff.state.list.random = list()
diff.link.list.random = list()
for (i in 1:nrow(AK.PD.res)) {
synergy.set = AK.PD.res[i, 2]
synergy.subset = AK.PD.res[i, 3]
synergy.set.str = unlist(strsplit(x = synergy.set, split = ","))
synergy.subset.str = unlist(strsplit(x = synergy.subset, split = ","))
# count models
synergy.set.models.num = count_models_that_predict_synergies(synergy.set.str, random.model.predictions)
synergy.subset.models.num = count_models_that_predict_synergies(synergy.subset.str, random.model.predictions)
# print(paste0(synergy.set.models.num, " ", synergy.subset.models.num))
# if too small number of models, skip the diff vector
if ((synergy.set.models.num <= 3) | (synergy.set.models.num <= 3))
next
# get the diff
diff.state.ak.pd = get_avg_activity_diff_based_on_synergy_set_cmp(synergy.set.str, synergy.subset.str, random.model.predictions, random.models.stable.state)
diff.link.ak.pd = get_avg_link_operator_diff_based_on_synergy_set_cmp(synergy.set.str, synergy.subset.str, random.model.predictions, random.models.link.operator)
diff.state.list.random[[paste0(synergy.set, " vs ", synergy.subset)]] = diff.state.ak.pd
diff.link.list.random[[paste0(synergy.set, " vs ", synergy.subset)]] = diff.link.ak.pd
}
diff.state.mat.random = do.call(rbind, diff.state.list.random)
diff.link.mat.random = do.call(rbind, diff.link.list.random)
caption.title.3 = "Table 3: Average activity difference values across all synergy set comparisons (AK-PD)"
datatable(data = diff.state.mat.random[, c("SRC", "CSK", "MEK_f", "STAT1", "PTPN6")], options = list(
searching = FALSE, pageLength = 5, lengthMenu = c(5, 10)),
caption = htmltools::tags$caption(caption.title.3, style="color:#dd4814; font-size: 18px")) %>%
formatRound(1:ncol(diff.state.mat.random), digits = 3)
caption.title.4 = "Table 4: Average link operator difference values across all synergy set comparisons (AK-PD)"
datatable(data = diff.link.mat.random[, c("SRC", "RAC_f", "MEK_f", "STAT1", "PTEN")], options = list(
searching = FALSE, pageLength = 5, lengthMenu = c(5, 10)),
caption = htmltools::tags$caption(caption.title.4, style="color:#dd4814; font-size: 18px")) %>%
formatRound(1:ncol(diff.link.mat.random), digits = 3)
Using the matrix from Table 3 (where we show just \(5\) nodes), we count per network node the number of times that the node’s average activity difference value has surpassed a specified threshold - i.e. the number of times it has been found as important (a biomarker) across all the synergy set comparisons (so the more the better):
biomarker.state.mat.random = binarize_to_thres(mat = diff.state.mat.random, thres = 0.7)
biomarker.state.counts.random = colSums(biomarker.state.mat.random)
pretty_print_vector_names_and_values(table(biomarker.state.counts.random))
0: 139, 1: 1, 2: 1, 4: 1, 6: 2
So, there are \(2\) nodes that were found as activity state biomarkers \(6\) times across all synergy set comparisons. These nodes are:
pretty_print_vector_names(biomarker.state.counts.random[biomarker.state.counts.random == 6])
2 nodes: PDPK1, PRKCD
But the total number of comparisons was 49 so we could argue that this is not a statistically significant result, which can be seen clearly in the next graph where we visualize the average activity state difference across all synergy comparison sets from Table 3:
plot_avg_state_diff_graph(net, diff = colMeans(diff.state.mat.random), layout = nice.layout,
title = "Average activity state diff across all synergy subsets (AK-PD)")
Using the matrix from Table 4, we count per network node the number of times that the node’s average link operator difference value has surpassed a specified threshold - i.e. the number of times it has been found as important (a biomarker) across all the synergy set comparisons (so the more the better):
biomarker.link.mat.random = binarize_to_thres(mat = diff.link.mat.random, thres = 0.7)
biomarker.link.counts.random = colSums(biomarker.link.mat.random)
pretty_print_vector_names_and_values(table(biomarker.link.counts.random))
0: 50, 1: 1, 7: 1
So, there was a node that was found \(7\) times (out of a total of 49, so not statistically significant) as a link operator biomarkers across all synergy set comparisons:
pretty_print_vector_names(biomarker.link.counts.random[biomarker.link.counts.random == 7])
1 node: mTORC1_c
We visualize the average link operator difference across all synergy comparison sets from Table 4:
plot_avg_link_operator_diff_graph(net, diff = colMeans(diff.link.mat.random), layout = nice.layout,
title = "Average link operator diff across all synergy subsets (AK-PD)")
Though the AK-PD
synergy was observed and predicted in the A498 model dataset, we investigate its biomarkers in the other cell lines where it was predicted as a False Positive (FP) synergy (predicted by the models but not observed in the experiments). Thus, we can still see if there are any common (activity state and link operator) biomarkers for AK-PD
across the all the cell-specific models by contrasting in each cell line the models that predicted AK-PD
(if there were any) vs the models that did not:
drug.comb = "AK-PD"
ak.pd.diff.state.list = list()
ak.pd.diff.link.list = list()
for (cell.line in cell.lines) {
if (cell.line == "A498")
next
model.predictions = model.predictions.per.cell.line[[cell.line]]
ak.pd.diff.state.list[[cell.line]] = get_avg_activity_diff_based_on_specific_synergy_prediction(
model.predictions, models.stable.state = models.stable.state.per.cell.line[[cell.line]], drug.comb)
ak.pd.diff.link.list[[cell.line]] = get_avg_link_operator_diff_based_on_specific_synergy_prediction(
model.predictions, models.link.operator = models.link.operators.per.cell.line[[cell.line]], drug.comb)
# See the #models that predicted vs #models those that did not
# print(paste0(sum(model.predictions[,drug.comb], na.rm = T), " vs ", sum(!model.predictions[,drug.comb], na.rm = T)))
}
ak.pd.diff.state.mat = do.call(rbind, ak.pd.diff.state.list)
ak.pd.diff.link.mat = do.call(rbind, ak.pd.diff.link.list)
ERK_f
was the one node that was found as an activity state & link operator AK-PD
biomarker in all cell lines
ak.pd.biomarker.state.mat = binarize_to_thres(mat = ak.pd.diff.state.mat, thres = 0.7)
ak.pd.biomarker.link.mat = binarize_to_thres(mat = ak.pd.diff.link.mat, thres = 0.7)
ak.pd.biomarker.state.mat.counts = colSums(ak.pd.biomarker.state.mat)
ak.pd.biomarker.link.mat.counts = colSums(ak.pd.biomarker.link.mat)
#pretty_print_vector_names_and_values(table(ak.pd.biomarker.state.mat.counts))
#pretty_print_vector_names_and_values(table(ak.pd.biomarker.link.mat.counts))
pretty_print_vector_names(ak.pd.biomarker.state.mat.counts[ak.pd.biomarker.state.mat.counts == 7])
1 node: ERK_f
pretty_print_vector_names(ak.pd.biomarker.link.mat.counts[ak.pd.biomarker.link.mat.counts == 6])
1 node: ERK_f
As observed in the heatmap above, the PD-PI
synergy was predicted by both the cell specific and random models in the A498
cell line.
PD.PI.avg.state.diff.cell.specific = cell.specific.synergy.analysis.res$A498$diff.state.synergies.mat["PD-PI",]
#PD.PI.avg.link.diff.cell.specific = cell.specific.synergy.analysis.res$A498$diff.link.synergies.mat["PD-PI", ]
We will now visualize the nodes average state differences in a network graph. Note that the good models are those that predicted the PD-PI
drug combination to be synergistic and were contrasted to those that predicted it to be antagonistic (bad models). The number of models in each category were:
model.predictions = model.predictions.per.cell.line[["A498"]]
models.stable.state = models.stable.state.per.cell.line[["A498"]]
drug.comb = "PD-PI"
good.models.num = sum(model.predictions[, drug.comb] == 1 & !is.na(model.predictions[, drug.comb]))
# unique good models
# models.link.operator = models.link.operators.per.cell.line$A498
# nrow(unique(models.link.operator[model.predictions[, drug.comb] == 1 & !is.na(model.predictions[, drug.comb]), ]))
bad.models.num = sum(model.predictions[, drug.comb] == 0 & !is.na(model.predictions[, drug.comb]))
pretty_print_string(paste0("Number of 'good' models (PD-PI synergistic) in the A498 cell line: ", good.models.num))
Number of ‘good’ models (PD-PI synergistic) in the A498 cell line: 107
pretty_print_string(paste0("Number of 'bad' models (PD-PI antagonistic) in the A498 cell line: ", bad.models.num))
Number of ‘bad’ models (PD-PI antagonistic) in the A498 cell line: 6873
plot_avg_state_diff_graph(net, diff = PD.PI.avg.state.diff.cell.specific,
layout = nice.layout, title = "PD-PI activity state biomarkers (Cell specific models - A498)")
We now visualize the nodes average state differences in a dotchart
where we can easily identify the active and inhibited state biomarkers with the defined thresholds (we have excluded nodes whose absolute average difference value was less than \(0.05\)):
df = as.data.frame(PD.PI.avg.state.diff.cell.specific)
colnames(df) = "avg.state.diff"
df = df %>%
rownames_to_column(var = "nodes") %>%
filter(avg.state.diff > 0.05 | avg.state.diff < -0.05)
ggdotchart(df, x = "nodes", y = "avg.state.diff",
title = "Average activity state difference (Good - bad) - A498 cell line",
label = "nodes", font.label = list(size = 11, color = "blue"),
label.select = list(criteria = "`y` >= 0.7 | `y` <= -0.5"),
repel = TRUE, label.rectangle = TRUE,
xlab = "Nodes", ylab = "Average Activity State",
ylim = c(-1,1), add = "segments") +
font("x.text", size = 8) +
font("title", size = 14) +
geom_hline(yintercept = -0.7, linetype = "dashed", color = "red") +
geom_hline(aes(yintercept = 0.7, linetype = "0.7"), color = "red") +
scale_linetype_manual(name = "Threshold", values = 2,
guide = guide_legend(override.aes = list(color = "red"))) +
theme(legend.position = c(0.2,0.7)) +
annotate("text", x = c(18, 35), y = -0.9, label = c("Good models: PD-PI synergy,", "Bad Models: PD-PI antagonism"))
Note the boolean equation: PPM1A *= ( PTEN )
. This means that PTEN
is the only inhibited node of interest
We now check the PTEN
in the MCLP
dataset:
pten.data = mclp.data %>% select(Sample_Name, PTEN) %>% na.omit()
# find the activity classes (3 classes: low activity, no activity, high activity)
res = Ckmeans.1d.dp(x = pten.data$PTEN, k = 3)
activity = as.factor(res$cluster)
levels(activity) = c("low", "medium", "high")
pten.data = cbind(pten.data, activity)
ggdotchart(data = pten.data, x = "Sample_Name", y = "PTEN",
title = "PTEN signaling across all cell lines in MCLP Dataset",
color = "activity", palette = c("red", "grey", "green"),
label = "Sample_Name", label.select = cell.lines.in.mclp, repel = TRUE,
add = "segments", label.rectangle = TRUE,
xlab = "Cell Lines", ylab = "PTEN Signaling",
add.params = list(color = "activity", palette = c("red", "grey", "green"))) +
theme(axis.text.x = element_blank(), axis.ticks = element_blank())
The PTEN
node is found highly expressed in the A498
cell line in the MCLP
dataset, but for the models to predict the PD-PI
synergy we found that the node needs to be in an inhibited state (see dotchart here). This correlates though with the fact that PTEN
is found inhibited in many cancer types (article). The overexpression of the ERK_f
node is also a result from this analysis.
It will be interesting to find all the possible synergy sets and subsets that include the PD-PI
as the extra synergy. Using these synergy sets, models that predict a set of synergies will be contrasted to models that predicted the same set with the addition of the extra PD-PI
synergy. Thus we could find synergy biomarkers that allow already good predicting models to predict the additional synergy of interest. This investigation will allow us thus to refine the activity state biomarkers we found above.
model.predictions = model.predictions.per.cell.line$A498
models.stable.state = models.stable.state.per.cell.line$A498
models.link.operator = models.link.operators.per.cell.line$A498
res = get_synergy_comparison_sets(cell.specific.synergy.analysis.res$A498$synergy.subset.stats)
PD.PI.res = res %>% filter(synergies == "PD-PI")
diff.state.list = list()
for (i in 1:nrow(PD.PI.res)) {
synergy.set = PD.PI.res[i, 2]
synergy.subset = PD.PI.res[i, 3]
synergy.set.str = unlist(strsplit(x = synergy.set, split = ","))
synergy.subset.str = unlist(strsplit(x = synergy.subset, split = ","))
# count models
synergy.set.models.num = count_models_that_predict_synergies(synergy.set.str, model.predictions)
synergy.subset.models.num = count_models_that_predict_synergies(synergy.subset.str, model.predictions)
# if too small number of models, skip the diff vector
# if ((synergy.set.models.num <= 3) | (synergy.set.models.num <= 3))
# next
# get the diff
diff.state.PD.PI = get_avg_activity_diff_based_on_synergy_set_cmp(synergy.set.str, synergy.subset.str, model.predictions, models.stable.state)
diff.state.list[[paste0(synergy.set, " vs ", synergy.subset)]] = diff.state.PD.PI
}
diff.state.mat = do.call(rbind, diff.state.list)
caption.title.5 = "Table 5: Average activity difference values across all synergy set comparisons (PD-PI)"
datatable(data = diff.state.mat[, c("SRC", "CSK", "MEK_f", "STAT1", "PTPN6")], options = list(
searching = FALSE, pageLength = 5, lengthMenu = c(5, 10)),
caption = htmltools::tags$caption(caption.title.5, style="color:#dd4814; font-size: 18px")) %>%
formatRound(1:ncol(diff.state.mat), digits = 3)
If we visualize the average activity state difference across all synergy comparison sets from Table 5 in a network graph, we see that there are no nodes with significant average state difference:
plot_avg_state_diff_graph(net, diff = colMeans(diff.state.mat), layout = nice.layout,
title = "Average activity state diff across all synergy subsets (PD-PI)")
We also visualize the median activity difference per node from Table 5 using a dotchart
, where we have excluded the nodes that have an absolute median activity difference of \(0.05\) or less:
df = as.data.frame(apply(diff.state.mat, 2, median))
colnames(df) = "median.state.diff"
df = df %>%
rownames_to_column(var = "nodes") %>%
filter(median.state.diff > 0.05 | median.state.diff < -0.05)
ggdotchart(df, x = "nodes", y = "median.state.diff",
title = "Median activity state difference across all synergy comparison sets (PD-PI, A498 cell line)",
label = "nodes", font.label = list(size = 11, color = "blue"),
label.select = list(criteria = "`y` >= 0.7 | `y` <= -0.7"),
repel = TRUE, label.rectangle = TRUE,
xlab = "Nodes", ylab = "Median Activity State",
ylim = c(-1,1), add = "segments") +
font("x.text", size = 10) +
font("title", size = 11) +
geom_hline(yintercept = -0.7, linetype = "dashed", color = "red") +
geom_hline(aes(yintercept = 0.7, linetype = "0.7"), color = "red") +
scale_linetype_manual(name = "Threshold", values = 2,
guide = guide_legend(override.aes = list(color = "red"))) +
theme(legend.position = c(0.2,0.7))
So no significant nodes whose activity plays important role in the manifestation of the PD-PI
synergy were found with the synergy subset analysis method.
We get the average state differences per network node for the A498
cell line from the random models:
PD.PI.avg.state.diff.random = random.synergy.analysis.res$A498$diff.state.synergies.mat["PD-PI",]
We will now visualize the nodes average state differences in a network graph. Note that the good models are those that predicted the PD-PI
drug combination to be synergistic and were contrasted to those that predicted it to be antagonistic (bad models). The number of models in each category were:
drug.comb = "PD-PI"
good.models.num = sum(random.model.predictions[, drug.comb] == 1 & !is.na(random.model.predictions[, drug.comb]))
# unique good models
# nrow(unique(random.models.link.operator[random.model.predictions[, drug.comb] == 1 & !is.na(random.model.predictions[, drug.comb]),]))
bad.models.num = sum(random.model.predictions[, drug.comb] == 0 & !is.na(random.model.predictions[, drug.comb]))
pretty_print_string(paste0("Number of 'good' random models (PD-PI synergistic) in the A498 cell line: ", good.models.num))
Number of ‘good’ random models (PD-PI synergistic) in the A498 cell line: 719
pretty_print_string(paste0("Number of 'bad' random models (PD-PI antagonistic) in the A498 cell line: ", bad.models.num))
Number of ‘bad’ random models (PD-PI antagonistic) in the A498 cell line: 6066
plot_avg_state_diff_graph(net, diff = PD.PI.avg.state.diff.random,
layout = nice.layout, title = "PD-PI activity state biomarkers (Random models - A498)")
We now visualize the nodes average state differences in a dotchart
where we can easily identify the active and inhibited state biomarkers with the defined thresholds (we have excluded nodes whose absolute average difference value was less than \(0.05\)):
df = as.data.frame(PD.PI.avg.state.diff.random)
colnames(df) = "avg.state.diff"
df = df %>%
rownames_to_column(var = "nodes") %>%
filter(avg.state.diff > 0.05 | avg.state.diff < -0.05)
ggdotchart(df, x = "nodes", y = "avg.state.diff",
title = "Average activity state difference for Random Models (Good - bad) - A498 cell line",
label = "nodes", font.label = list(size = 10, color = "blue"),
label.select = list(criteria = "`y` >= 0.7 | `y` <= -0.5"),
repel = TRUE, label.rectangle = TRUE,
xlab = "Nodes", ylab = "Average Activity State",
ylim = c(-1,1), add = "segments") +
font("x.text", size = 8) +
font("title", size = 14) +
geom_hline(yintercept = -0.7, linetype = "dashed", color = "red") +
geom_hline(aes(yintercept = 0.7, linetype = "0.7"), color = "red") +
scale_linetype_manual(name = "Threshold", values = 2,
guide = guide_legend(override.aes = list(color = "red"))) +
theme(legend.position = c(0.2,0.7)) +
annotate("text", x = c(8, 17), y = -0.9, label = c("Good models: PD-PI synergy,", "Bad Models: PD-PI antagonism"))
The overexpression of ERK_f is also a characteristic of the random models that predict the PD-PI
drug combination to be synergistic.
We perform the same kind of analysis as with the cell-specific models: models that predict a set of synergies will be contrasted to models that predicted the same set with the addition of the extra PD-PI
synergy, allowing us thus to refine the activity state biomarkers we found above from the random models.
res = get_synergy_comparison_sets(random.synergy.analysis.res$A498$synergy.subset.stats)
PD.PI.res = res %>% filter(synergies == "PD-PI")
diff.state.list.random = list()
for (i in 1:nrow(PD.PI.res)) {
synergy.set = PD.PI.res[i, 2]
synergy.subset = PD.PI.res[i, 3]
synergy.set.str = unlist(strsplit(x = synergy.set, split = ","))
synergy.subset.str = unlist(strsplit(x = synergy.subset, split = ","))
# count models
synergy.set.models.num = count_models_that_predict_synergies(synergy.set.str, random.model.predictions)
synergy.subset.models.num = count_models_that_predict_synergies(synergy.subset.str, random.model.predictions)
# print(paste0(synergy.set.models.num, " ", synergy.subset.models.num))
# if too small number of drug combinations in the subset, skip the diff vector
#if (length(synergy.subset.str) <= 3) next
# if too small number of models compared
#if ((synergy.set.models.num <= 3) | (synergy.set.models.num <= 3)) next
# get the diff
diff.state.PD.PI = get_avg_activity_diff_based_on_synergy_set_cmp(synergy.set.str, synergy.subset.str, random.model.predictions, random.models.stable.state)
diff.state.list.random[[paste0(synergy.set, " vs ", synergy.subset)]] = diff.state.PD.PI
}
diff.state.mat.random = do.call(rbind, diff.state.list.random)
caption.title.6 = "Table 6: Average link operator difference values across all synergy set comparisons (PD-PI)"
datatable(data = diff.state.mat.random[, c("SRC", "RAC_f", "MEK_f", "STAT1", "PTEN")], options = list(
searching = FALSE, pageLength = 5, lengthMenu = c(5, 10)),
caption = htmltools::tags$caption(caption.title.6, style="color:#dd4814; font-size: 18px")) %>%
formatRound(1:ncol(diff.state.mat.random), digits = 3)
If we visualize the average activity state difference across all synergy comparison sets from Table 6 we see that there are no nodes with significant average state difference:
plot_avg_state_diff_graph(net, diff = colMeans(diff.state.mat.random), layout = nice.layout,
title = "Average activity state diff across all synergy subsets (PD-PI)")
We also visualize the median activity difference per node from Table 6 using a dotchart
, where we have excluded the nodes that have an absolute median activity difference of \(0.05\) or less:
df = as.data.frame(apply(diff.state.mat.random, 2, median))
colnames(df) = "median.state.diff"
df = df %>%
rownames_to_column(var = "nodes") %>%
filter(median.state.diff > 0.05 | median.state.diff < -0.05)
ggdotchart(df, x = "nodes", y = "median.state.diff",
title = "Median activity state difference across all synergy comparison sets (PD-PI, Random Models, A498)",
label = "nodes", font.label = list(size = 11, color = "blue"),
label.select = list(criteria = "`y` >= 0.7 | `y` <= -0.7"),
repel = TRUE, label.rectangle = TRUE,
xlab = "Nodes", ylab = "Median Activity State",
ylim = c(-1,1), add = "segments") +
font("x.text", size = 10) +
font("title", size = 11) +
geom_hline(yintercept = -0.7, linetype = "dashed", color = "red") +
geom_hline(aes(yintercept = 0.7, linetype = "0.7"), color = "red") +
scale_linetype_manual(name = "Threshold", values = 2,
guide = guide_legend(override.aes = list(color = "red"))) +
theme(legend.position = c(0.2,0.7))
So no significant nodes whose activity plays important role in the manifestation of the PD-PI
synergy were found with the synergy subset analysis method for the random models.
Though the PD-PI
synergy was observed and predicted in the A498 model dataset, we investigate its biomarkers in the other cell lines where it was predicted as a False Positive (FP) synergy (predicted by the models but not observed in the experiments). Thus, we can still see if there are any common (activity state and link operator) biomarkers for this synergy across the all the cell-specific models by contrasting in each cell line the models that predicted PD-PI
(if there were any) vs the models that did not:
drug.comb = "PD-PI"
diff.state.list = list()
for (cell.line in cell.lines) {
if (cell.line == "A498") next
model.predictions = model.predictions.per.cell.line[[cell.line]]
models.stable.state = models.stable.state.per.cell.line[[cell.line]]
models.predicted.num = sum(model.predictions[,drug.comb], na.rm = T)
models.not.predicted.num = sum(!model.predictions[,drug.comb], na.rm = T)
# See the #models that predicted vs #models those that did not
# print(paste0(models.predicted.num, " vs ", models.not.predicted.num))
if (models.predicted.num != 0 & models.not.predicted.num != 0)
diff.state.list[[cell.line]] = get_avg_activity_diff_based_on_specific_synergy_prediction(model.predictions, models.stable.state, drug.comb)
}
# combine the data to a single matrix
diff.state.mat = do.call(rbind, diff.state.list)
Now, we identify the biomarkers of interest as those nodes that have surpassed a user-defined threshold (\(0.6\)) across as many as possible cell lines - counting thus the frequency that this has occured. We show those nodes that surpassed the threshold in at least half of the cell lines:
biomarker.state.mat = binarize_to_thres(mat = diff.state.mat, thres = 0.6)
biomarkers.freq = colSums(biomarker.state.mat)/nrow(biomarker.state.mat)
pretty_print_vector_names_and_values(biomarkers.freq[biomarkers.freq > 0.5])
So we identified two biomarkers,ERK_f: 1, SOS1: 0.714285714285714
ERK_f
(known from before) and SOS1
. The average activity difference values for the SOS1
per cell line model datasets were:
breaks.red = quantile(c(-1,0), probs = seq(.05, .95, .05), na.rm = TRUE)
colors.red = sort(round(seq(255, 40, length.out = length(breaks.red) + 1), digits = 0)) %>%
{paste0("rgb(255,", ., ",", ., ")")} # red
datatable(data = as.data.frame(diff.state.mat[,'SOS1']), colnames = "SOS1",
options = list(dom = 't', order = list(list(1, 'asc'))), width = "470px",
caption = htmltools::tags$caption("Avg activity diff values (all cell lines except A498, PD-PI)", style="color:#dd4814; font-size: 18px")) %>%
formatRound(columns = 1, digits = 3) %>%
formatStyle(columns = 1, backgroundColor = styleInterval(breaks.red, colors.red))
Some article evidence against the above (so SOS1
is somewhat expressed in these cancer cell lines):
As observed in the heatmap above, the PD-G2
synergy was predicted by both the cell specific and random models in the A498
cell line.
PD.G2.avg.state.diff.cell.specific = cell.specific.synergy.analysis.res$A498$diff.state.synergies.mat["PD-G2", ]
We will now visualize the nodes average state differences in a network graph. Note that the good models are those that predicted the PD-G2
drug combination to be synergistic and were contrasted to those that predicted it to be antagonistic (bad models). The number of models in each category were:
model.predictions = model.predictions.per.cell.line[["A498"]]
models.stable.state = models.stable.state.per.cell.line[["A498"]]
drug.comb = "PD-G2"
good.models.num = sum(model.predictions[, drug.comb] == 1 & !is.na(model.predictions[, drug.comb]))
# unique good models
#models.link.operator = models.link.operators.per.cell.line$A498
#nrow(unique(models.link.operator[model.predictions[, drug.comb] == 1 & !is.na(model.predictions[, drug.comb]), ]))
bad.models.num = sum(model.predictions[, drug.comb] == 0 & !is.na(model.predictions[, drug.comb]))
pretty_print_string(paste0("Number of 'good' models (PD-G2 synergistic) in the A498 cell line: ", good.models.num))
Number of ‘good’ models (PD-G2 synergistic) in the A498 cell line: 238
pretty_print_string(paste0("Number of 'bad' models (PD-G2 antagonistic) in the A498 cell line: ", bad.models.num))
Number of ‘bad’ models (PD-G2 antagonistic) in the A498 cell line: 6812
plot_avg_state_diff_graph(net, diff = PD.G2.avg.state.diff.cell.specific,
layout = nice.layout, title = "PD-G2 activity state biomarkers (Cell specific models - A498)")
We now visualize the nodes average state differences in a dotchart
where we can easily identify the active and inhibited state biomarkers with the defined thresholds (we have excluded nodes whose absolute average difference value was less than \(0.05\)):
df = as.data.frame(PD.G2.avg.state.diff.cell.specific)
colnames(df) = "avg.state.diff"
df = df %>%
rownames_to_column(var = "nodes") %>%
filter(avg.state.diff > 0.05 | avg.state.diff < -0.05)
ggdotchart(df, x = "nodes", y = "avg.state.diff",
title = "Average activity state difference (Good - bad) - A498 cell line",
label = "nodes", font.label = list(size = 11, color = "blue"),
label.select = list(criteria = "`y` >= 0.9 | `y` <= -0.5"),
repel = TRUE, label.rectangle = TRUE,
xlab = "Nodes", ylab = "Average Activity State",
ylim = c(-1,1), add = "segments") +
font("x.text", size = 8) +
font("title", size = 14) +
geom_hline(yintercept = -0.7, linetype = "dashed", color = "red") +
geom_hline(aes(yintercept = 0.7, linetype = "0.7"), color = "red") +
scale_linetype_manual(name = "Threshold", values = 2,
guide = guide_legend(override.aes = list(color = "red"))) +
theme(legend.position = c(0.2,0.7)) +
annotate("text", x = c(18, 32), y = -0.9, label = c("Good models: PD-G2 synergy,", "Bad Models: PD-G2 antagonism"))
Note again the boolean equation: PPM1A *= ( PTEN )
. This means that PTEN
is the only inhibited node of interest
As in the case of the PD-PI
synergy, inhibition of the PTEN
node and overexpression of the ERK_f
node are the main biomarkers (see dotchart above) that make the models predict the PD-G2
synergy.
It will be interesting to find all the possible synergy sets and subsets that include the PD-G2
as the extra synergy. Using these synergy sets, models that predict a set of synergies will be contrasted to models that predicted the same set with the addition of the extra PD-G2
synergy. Thus we could find synergy biomarkers that allow already good predicting models to predict the additional synergy of interest. This investigation will allow us thus to refine the activity state biomarkers we found above.
model.predictions = model.predictions.per.cell.line$A498
models.stable.state = models.stable.state.per.cell.line$A498
models.link.operator = models.link.operators.per.cell.line$A498
res = get_synergy_comparison_sets(cell.specific.synergy.analysis.res$A498$synergy.subset.stats)
PD.G2.res = res %>% filter(synergies == "PD-G2")
diff.state.list = list()
for (i in 1:nrow(PD.G2.res)) {
synergy.set = PD.G2.res[i, 2]
synergy.subset = PD.G2.res[i, 3]
synergy.set.str = unlist(strsplit(x = synergy.set, split = ","))
synergy.subset.str = unlist(strsplit(x = synergy.subset, split = ","))
# count models
synergy.set.models.num = count_models_that_predict_synergies(synergy.set.str, model.predictions)
synergy.subset.models.num = count_models_that_predict_synergies(synergy.subset.str, model.predictions)
# print(paste0(synergy.set.models.num, " ", synergy.subset.models.num))
# if too small number of models, skip the diff vector
if ((synergy.set.models.num <= 3) | (synergy.set.models.num <= 3))
next
# get the diff
diff.state.PD.G2 = get_avg_activity_diff_based_on_synergy_set_cmp(synergy.set.str, synergy.subset.str, model.predictions, models.stable.state)
diff.state.list[[paste0(synergy.set, " vs ", synergy.subset)]] = diff.state.PD.G2
}
diff.state.mat = do.call(rbind, diff.state.list)
caption.title.7 = "Table 7: Average activity difference values across all synergy set comparisons (PD-G2)"
datatable(data = diff.state.mat[, c("SRC", "CSK", "PTEN", "STAT1", "PTPN6")], options = list(
searching = FALSE, pageLength = 5, lengthMenu = c(5, 10)),
caption = htmltools::tags$caption(caption.title.7, style="color:#dd4814; font-size: 18px")) %>%
formatRound(1:ncol(diff.state.mat), digits = 3)
If we visualize the average activity state difference across all synergy comparison sets from Table 7 in a network graph, we see that there are no nodes with significant average state difference:
plot_avg_state_diff_graph(net, diff = colMeans(diff.state.mat), layout = nice.layout,
title = "Average activity state diff across all synergy subsets (PD-G2)")
We also visualize the median activity difference per node from Table 7 using a dotchart
, where we have excluded the nodes that have an absolute median activity difference of \(0.05\) or less:
df = as.data.frame(apply(diff.state.mat, 2, median))
colnames(df) = "median.state.diff"
df = df %>%
rownames_to_column(var = "nodes") %>%
filter(median.state.diff > 0.05 | median.state.diff < -0.05)
ggdotchart(df, x = "nodes", y = "median.state.diff",
title = "Median activity state difference across all synergy comparison sets (PD-G2, A498 cell line)",
label = "nodes", font.label = list(size = 11, color = "blue"),
label.select = list(criteria = "`y` >= 0.5 | `y` <= -0.5"),
repel = TRUE, label.rectangle = TRUE,
xlab = "Nodes", ylab = "Median Activity State",
ylim = c(-1,1), add = "segments") +
font("x.text", size = 10) +
font("title", size = 11) +
geom_hline(yintercept = -0.7, linetype = "dashed", color = "red") +
geom_hline(aes(yintercept = 0.7, linetype = "0.7"), color = "red") +
scale_linetype_manual(name = "Threshold", values = 2,
guide = guide_legend(override.aes = list(color = "red"))) +
theme(legend.position = c(0.2,0.7))
The above results show the same thing we found also before with the simple model comparison method: ERK_f
overexpression, PTEN
inhibition.
We get the average state differences per network node for the A498
cell line from the random models:
PD.G2.avg.state.diff.random = random.synergy.analysis.res$A498$diff.state.synergies.mat["PD-G2",]
We will now visualize the nodes average state differences in a network graph. Note that the good models are those that predicted the PD-G2
drug combination to be synergistic and were contrasted to those that predicted it to be antagonistic (bad models). The number of models in each category were:
drug.comb = "PD-G2"
good.models.num = sum(random.model.predictions[, drug.comb] == 1 & !is.na(random.model.predictions[, drug.comb]))
# unique good models
# nrow(unique(random.models.link.operator[random.model.predictions[, drug.comb] == 1 & !is.na(random.model.predictions[, drug.comb]),]))
bad.models.num = sum(random.model.predictions[, drug.comb] == 0 & !is.na(random.model.predictions[, drug.comb]))
pretty_print_string(paste0("Number of 'good' random models (PD-G2 synergistic) in the A498 cell line: ", good.models.num))
Number of ‘good’ random models (PD-G2 synergistic) in the A498 cell line: 1272
pretty_print_string(paste0("Number of 'bad' random models (PD-G2 antagonistic) in the A498 cell line: ", bad.models.num))
Number of ‘bad’ random models (PD-G2 antagonistic) in the A498 cell line: 5756
plot_avg_state_diff_graph(net, diff = PD.G2.avg.state.diff.random,
layout = nice.layout, title = "PD-G2 activity state biomarkers (Random models - A498)")
The overexpression of ERK_f is also a characteristic of the random models that predict the PD-G2
drug combination to be synergistic.
We perform the same kind of analysis as with the cell-specific models: models that predict a set of synergies will be contrasted to models that predicted the same set with the addition of the extra PD-G2
synergy, allowing us thus to refine the activity state biomarkers we found above from the random models.
res = get_synergy_comparison_sets(random.synergy.analysis.res$A498$synergy.subset.stats)
PD.G2.res = res %>% filter(synergies == "PD-G2")
diff.state.list.random = list()
for (i in 1:nrow(PD.G2.res)) {
synergy.set = PD.G2.res[i, 2]
synergy.subset = PD.G2.res[i, 3]
synergy.set.str = unlist(strsplit(x = synergy.set, split = ","))
synergy.subset.str = unlist(strsplit(x = synergy.subset, split = ","))
# count models
synergy.set.models.num = count_models_that_predict_synergies(synergy.set.str, random.model.predictions)
synergy.subset.models.num = count_models_that_predict_synergies(synergy.subset.str, random.model.predictions)
# print(paste0(synergy.set.models.num, " ", synergy.subset.models.num))
# if too small number of drug combinations in the subset, skip the diff vector
#if (length(synergy.subset.str) <= 3) next
# if too small number of models compared
if ((synergy.set.models.num <= 3) | (synergy.set.models.num <= 3)) next
# get the diff
diff.state.PD.G2 = get_avg_activity_diff_based_on_synergy_set_cmp(synergy.set.str, synergy.subset.str, random.model.predictions, random.models.stable.state)
diff.state.list.random[[paste0(synergy.set, " vs ", synergy.subset)]] = diff.state.PD.G2
}
diff.state.mat.random = do.call(rbind, diff.state.list.random)
caption.title.8 = "Table 8: Average link operator difference values across all synergy set comparisons (PD-G2)"
datatable(data = diff.state.mat.random[, c("SRC", "RAC_f", "MEK_f", "STAT1", "PTEN")], options = list(
searching = FALSE, pageLength = 5, lengthMenu = c(5, 10)),
caption = htmltools::tags$caption(caption.title.8, style="color:#dd4814; font-size: 18px")) %>%
formatRound(1:ncol(diff.state.mat.random), digits = 3)
If we visualize the average activity state difference across all synergy comparison sets from Table 8 we have some nodes that show significantly different activity in the random models that predict the extra PD-G2
synergy:
plot_avg_state_diff_graph(net, diff = colMeans(diff.state.mat.random), layout = nice.layout,
title = "Average activity state diff across all synergy subsets (PD-G2)")
We also visualize the median activity difference per node from Table 8 using a dotchart
, where we have excluded the nodes that have an absolute median activity difference of \(0.05\) or less:
df = as.data.frame(apply(diff.state.mat.random, 2, median))
colnames(df) = "median.state.diff"
df = df %>%
rownames_to_column(var = "nodes") %>%
filter(median.state.diff > 0.05 | median.state.diff < -0.05)
ggdotchart(df, x = "nodes", y = "median.state.diff",
title = "Median activity state difference across all synergy comparison sets (PD-G2, Random Models, A498)",
label = "nodes", font.label = list(size = 11, color = "blue"),
label.select = list(criteria = "`y` >= 0.7 | `y` <= -0.7"),
repel = TRUE, label.rectangle = TRUE,
xlab = "Nodes", ylab = "Median Activity State",
ylim = c(-1,1), add = "segments") +
font("x.text", size = 10) +
font("title", size = 11) +
geom_hline(yintercept = -0.7, linetype = "dashed", color = "red") +
geom_hline(aes(yintercept = 0.7, linetype = "0.7"), color = "red") +
scale_linetype_manual(name = "Threshold", values = 2,
guide = guide_legend(override.aes = list(color = "red"))) +
theme(legend.position = c(0.2,0.7))
From the above figure we see that the nodes PTEN
, PPM1A
and LIMK2
are more found to be more inhibited in the models that predicted the PD-G2
synergy and the nodes PRKCD
and PDPK1
more activated.
From the boolean equation: LIMK2 *= ROCK1 and/or not PRKCD
, since PRKCD == TRUE
(overexpressed) we reduced it to: LIMK2 *= ROCK1 and/or FALSE
where 3 out of 4 cases here correspond to LIMK2 == FALSE
(inhibited). So, we could argue that most of the times the LIMK2
inhibition is due to the PRKCD
overexpression in the end!
Note the equation: PRKCD *= PDPK1 or CASP3
. This means that the activation of PRKCD
is a direct consequence from PDPK1
’s activation. But PDPK1
is also the target of the G2
drug. This means that models that responded to the PD-G2
combination are the ones that showed PDPK1
overexpression!
Also again from the equation: PPM1A *= PTEN
, we have that PTEN
is the only inhibited node of interest
So, according to this analysis, PTEN
inhibition and PDPK1
overexpression are the main biomarkers that cause the random models to respond synergistically to the PD-G2
drug combination.
We will again use the MCLP-dataset to check for the activity/signaling values of PDPK1
and PIK3CA
(the later because it’s strongly connected with the others based on the following boolean equation in our model: PDPK1 *= PIK3CA and/or not PTEN
).
We check the phosphorylation value of the PDK1_pS241
across all cell lines:
PDPK1.data = mclp.data %>% select(Sample_Name, PDK1_pS241) %>% na.omit()
# find the activity classes (3 classes: low activity, no activity, high activity)
res = Ckmeans.1d.dp(x = PDPK1.data$PDK1_pS241, k = 3)
activity = as.factor(res$cluster)
levels(activity) = c("low", "medium", "high")
PDPK1.data = cbind(PDPK1.data, activity)
ggdotchart(data = PDPK1.data, x = "Sample_Name", y = "PDK1_pS241",
title = "PDK1_pS241 signaling across all cell lines in MCLP Dataset",
color = "activity", palette = c("red", "grey", "green"),
label = "Sample_Name", label.select = cell.lines.in.mclp, repel = TRUE,
add = "segments", label.rectangle = TRUE,
xlab = "Cell Lines", ylab = "PDK1_pS241 Signaling",
add.params = list(color = "activity", palette = c("red", "grey", "green"))) +
theme(axis.text.x = element_blank(), axis.ticks = element_blank())
Since for A498
cell line we observe one of the largest signaling measurements compared to all the cell lines for the PDK1_pS241
site, this means that PDPK1
is overexpressed, which is exactly what we found from our analysis above.
Also, we check the phosphorylation value of the PI3K-p110-alpha
across all cell lines:
PIK3CA.data = mclp.data %>% select(Sample_Name, PI3KP110ALPHA) %>% na.omit()
# find the activity classes (3 classes: low activity, no activity, high activity)
res = Ckmeans.1d.dp(x = PIK3CA.data$PI3KP110ALPHA, k = 3)
activity = as.factor(res$cluster)
levels(activity) = c("low", "medium", "high")
PIK3CA.data = cbind(PIK3CA.data, activity)
ggdotchart(data = PIK3CA.data, x = "Sample_Name", y = "PI3KP110ALPHA",
title = "PI3K-p110-alpha signaling across all cell lines in MCLP Dataset",
color = "activity", palette = c("red", "grey", "green"),
label = "Sample_Name", label.select = cell.lines.in.mclp, repel = TRUE,
add = "segments", label.rectangle = TRUE,
xlab = "Cell Lines", ylab = "PI3K-p110-alpha Signaling",
add.params = list(color = "activity", palette = c("red", "grey", "green"))) +
theme(axis.text.x = element_blank(), axis.ticks = element_blank())
So, the results above hint that PIK3CA
might be inhibited in the A498
cell line, which considering the PTEN
inhibition, PDPK1
overexpression and their boolean equation (which connects all of them together): PDPK1 *= PIK3CA and/or not PTEN
(PDPK1 = TRUE
and PIK3CA = PTEN = FALSE
), we conclude that there must be an or not
link operator between the 2 regulators.
PTEN
inhibition and PDPK1
overexpression play both an important role in the manifestation of the PD-G2
synergy as was found with the synergy subset analysis method for the random models. Also, PIK3CA
has an or not
(link operator) relationship with PTEN
(according to our model’s equation) as regulators of PDPK1
.
Though the PD-G2
synergy was observed and predicted in the A498 model dataset, we investigate its biomarkers in the other cell lines where it was predicted as a False Positive (FP) synergy (predicted by the models but not observed in the experiments). Thus, we can still see if there are any common (activity state and link operator) biomarkers for this synergy across the all the cell-specific models by contrasting in each cell line the models that predicted PD-G2
(if there were any) vs the models that did not:
drug.comb = "PD-G2"
diff.state.list = list()
for (cell.line in cell.lines) {
if (cell.line == "A498") next
model.predictions = model.predictions.per.cell.line[[cell.line]]
models.stable.state = models.stable.state.per.cell.line[[cell.line]]
models.predicted.num = sum(model.predictions[,drug.comb], na.rm = T)
models.not.predicted.num = sum(!model.predictions[,drug.comb], na.rm = T)
# See the #models that predicted vs #models those that did not
# print(paste0(models.predicted.num, " vs ", models.not.predicted.num))
if (models.predicted.num != 0 & models.not.predicted.num != 0)
diff.state.list[[cell.line]] = get_avg_activity_diff_based_on_specific_synergy_prediction(model.predictions, models.stable.state, drug.comb)
}
# combine the data to a single matrix
diff.state.mat = do.call(rbind, diff.state.list)
Now, we identify the biomarkers of interest as those nodes that have surpassed a user-defined threshold (\(0.6\)) across as many as possible cell lines - counting thus the frequency that this has occured. We show those nodes that surpassed the threshold in at least half of the cell lines:
biomarker.state.mat = binarize_to_thres(mat = diff.state.mat, thres = 0.6)
biomarkers.freq = colSums(biomarker.state.mat)/nrow(biomarker.state.mat)
pretty_print_vector_names_and_values(biomarkers.freq[biomarkers.freq > 0.5])
So we identified two biomarkers,ERK_f: 1, SOS1: 0.571428571428571
ERK_f
(known from before) and SOS1
. The average activity difference values for the SOS1
per cell line model datasets were:
datatable(data = as.data.frame(diff.state.mat[,'SOS1']), colnames = "SOS1",
options = list(dom = 't', order = list(list(1, 'asc'))), width = "470px",
caption = htmltools::tags$caption("Avg activity diff values (all cell lines except A498, PD-G2)", style="color:#dd4814; font-size: 18px")) %>%
formatRound(columns = 1, digits = 3) %>%
formatStyle(columns = 1, backgroundColor = styleInterval(breaks.red, colors.red))
I found some articles that I believe show that SOS1
is expressed in some of the above cancer cell lines (in contrast to the results above).
As observed in the heatmap above, the PI-D1
synergy was predicted by both the cell specific and random models in some of the cell lines tested. So, for each seperate model dataset, we are going to contrast the models that predicted the PI-D1
synergy vs the models that did not (if these two sets of models do exist for each cell line) and find the average activity difference per node in each case:
drug.comb = "PI-D1"
diff.state.list = list()
# cell-specific data
for (cell.line in cell.lines) {
model.predictions = model.predictions.per.cell.line[[cell.line]]
models.stable.state = models.stable.state.per.cell.line[[cell.line]]
models.predicted.num = sum(model.predictions[,drug.comb], na.rm = T)
models.not.predicted.num = sum(!model.predictions[,drug.comb], na.rm = T)
# See the #models that predicted vs #models those that did not
# print(paste0(models.predicted.num, " vs ", models.not.predicted.num))
if (models.predicted.num != 0 & models.not.predicted.num != 0)
diff.state.list[[cell.line]] = get_avg_activity_diff_based_on_specific_synergy_prediction(model.predictions, models.stable.state, drug.comb)
}
# random model data
#random.models.predicted.num = sum(random.model.predictions[,drug.comb], na.rm = T)
#random.models.not.predicted.num = sum(!random.model.predictions[,drug.comb], na.rm = T)
diff.state.list[["random"]] = get_avg_activity_diff_based_on_specific_synergy_prediction(random.model.predictions, random.models.stable.state, drug.comb)
# combine the data to a single matrix
diff.state.mat = do.call(rbind, diff.state.list)
Now, we identify the biomarkers of interest as those nodes that have surpassed a user-defined threshold (\(0.6\)) across as many as possible cell lines - counting thus the frequency that this has occured. We show those nodes that surpassed the threshold in at least half of the cell lines:
biomarker.state.mat = binarize_to_thres(mat = diff.state.mat, thres = 0.6)
biomarkers.freq = colSums(biomarker.state.mat)/nrow(biomarker.state.mat)
pretty_print_vector_names_and_values(biomarkers.freq[biomarkers.freq > 0.5])
ERK_f: 1, ILK: 0.666666666666667, CDC42: 0.666666666666667, PAK1: 0.666666666666667
So, ERK_f
was a biomarker for all the cell lines that predicted the PI-D1
synergy and the nodes ILK
, CDC42
and PAK1
in more than half of them. If we see the actual numbers we can clarify their state as well:
breaks.green = quantile(c(0,1), probs = seq(.05, .95, .05), na.rm = TRUE)
colors.green = round(seq(255, 40, length.out = length(breaks.red) + 1), digits = 0) %>%
{paste0("rgb(", ., ",255,", ., ")")} # green
caption.title.9 = "Table 9: Average activity state difference values across all model datasets that predicted PI-D1"
datatable(data = diff.state.mat[,c("ILK","PAK1","CDC42","ERK_f","RAC_f","ARHGAP24","SRC")], options = list(dom = 't'),
caption = htmltools::tags$caption(caption.title.9, style="color:#dd4814; font-size: 18px")) %>%
formatRound(1:11, digits = 3) %>%
formatStyle(columns = c(1:3), backgroundColor = styleInterval(breaks.red, colors.red)) %>%
formatStyle(columns = 4, backgroundColor = styleInterval(breaks.green, colors.green))
If we examine the boolean equations: ILK *= PAK1
, PAK1 *= CDC42 or RAC_f
and CDC42 *= SRC and/or not ARHGAP24
as well as the information from the table 9, then we can clearly see that the only inhibited node of interest is CDC42
among the 3 most frequent inhibited biomarkers.
The overexpression of ERK_f
and the inhibition of CDC42
are the two most common biomarkers that characterize the models that are able to predict the PI-D1
observed synergy.
xfun::session_info()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS
Locale:
LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
LC_PAPER=en_US.UTF-8 LC_NAME=C
LC_ADDRESS=C LC_TELEPHONE=C
LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
Package version:
assertthat_0.2.1 backports_1.1.5 base64enc_0.1-3
BH_1.72.0.3 bibtex_0.4.2.2 circlize_0.4.8
Ckmeans.1d.dp_4.3.0 cli_2.0.1 clue_0.3-57
cluster_2.1.0 colorspace_1.4-1 compiler_3.6.2
ComplexHeatmap_2.0.0 cowplot_1.0.0 crayon_1.3.4
crosstalk_1.0.0 digest_0.6.23 dplyr_0.8.3
DT_0.11 ellipsis_0.3.0 emba_0.1.2
evaluate_0.14 fansi_0.4.1 farver_2.0.3
fastmap_1.0.1 gbRd_0.4-11 GetoptLong_0.1.8
ggplot2_3.2.1 ggpubr_0.2.4 ggrepel_0.8.1
ggsci_2.9 ggsignif_0.6.0 GlobalOptions_0.1.1
glue_1.3.1 graphics_3.6.2 grDevices_3.6.2
grid_3.6.2 gridExtra_2.3 gtable_0.3.0
highr_0.8 htmltools_0.4.0 htmlwidgets_1.5.1
httpuv_1.5.2 igraph_1.2.4.2 jsonlite_1.6
knitr_1.27 labeling_0.3 later_1.0.0
lattice_0.20.38 lazyeval_0.2.2 lifecycle_0.1.0
magrittr_1.5 markdown_1.1 MASS_7.3.51.5
Matrix_1.2.18 methods_3.6.2 mgcv_1.8.31
mime_0.8 munsell_0.5.0 nlme_3.1.143
parallel_3.6.2 pillar_1.4.3 pkgconfig_2.0.3
plogr_0.2.0 plyr_1.8.5 png_0.1-7
polynom_1.4.0 promises_1.1.0 purrr_0.3.3
R6_2.4.1 RColorBrewer_1.1-2 Rcpp_1.0.3
Rdpack_0.11-1 reshape2_1.4.3 rje_1.10.13
rjson_0.2.20 rlang_0.4.4 rmarkdown_2.1
scales_1.1.0 shape_1.4.4 shiny_1.4.0
sourcetools_0.1.7 splines_3.6.2 stats_3.6.2
stringi_1.4.5 stringr_1.4.0 tibble_2.1.3
tidyr_1.0.2 tidyselect_1.0.0 tinytex_0.19
tools_3.6.2 usefun_0.4.3 utf8_1.1.4
utils_3.6.2 vctrs_0.2.2 viridisLite_0.3.0
visNetwork_2.0.9 withr_2.1.2 xfun_0.12
xtable_1.8-4 yaml_2.2.0 zeallot_0.1.0