Reproduce Data & Simulation Results

ROC and PR curves, Fitness Evolution [1]

You can of course change several other parameters in the input files or in the script itself (e.g. the number of simulations to run - see here for a complete list of configuration options). To get the results for the topology mutations for CASCADE 2.0, set the options topology_mutations: 10 and balance_mutations: 0 in the ags_cascade_2.0/config file (the defaults are \(0\) topology mutations and \(3\) link-operator/balance mutations). To get the results using both kinds of mutation, set both the topology_mutations and balance_mutations options to a non-zero value (\(10\) and \(3\) were used in the simulations).

For example, to get the simulation output directories for the Cascade 1.0 Analysis, I just run the run_druglogics_synergy.sh script with the following options defined in the loops inside it (no further configuration changes are needed):

  • cascade_version: 1.0 (which topology to use)
  • train: ss rand (train to the AGS steady state or to a (random) proliferation phenotype)
  • sim_num: 50 (number of simulations)
  • attr_tool: fixpoints (attractor tool, common across the whole report)
  • synergy_method: hsa bliss (synergy calculation method used by drabme)
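As a rough illustration of what these loop options amount to, the sketch below enumerates the combinations listed above. It is not the run_druglogics_synergy.sh script itself (which is a shell script); the variable names are chosen here for illustration only.

```python
from itertools import product

# Option values copied from the list above; the variable names are illustrative.
cascade_versions = ["1.0"]
train_types = ["ss", "rand"]
sim_nums = [50]
attr_tools = ["fixpoints"]
synergy_methods = ["hsa", "bliss"]

# One druglogics-synergy execution per combination of loop values.
runs = list(product(cascade_versions, train_types, sim_nums,
                    attr_tools, synergy_methods))

for run in runs:
    print(run)
```

With these values there are 2 training types times 2 synergy methods, i.e. 4 executions and hence 4 output directories.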

Each subsequent druglogics-synergy execution results in an output directory and the files of interest (which are used to produce the ROC and PR curves in this report and the AUC sensitivity figures) are the modelwise_synergies.tab and the ensemble_synergies.tab respectively. For the fitness evolution figures we used the summary.txt file of the corresponding simulations.

Specifically, the results described above are stored in the compressed Zenodo file sim_res.tar.gz. When uncompressed, the sim_res.tar.gz file yields 2 separate directories, one per topology (CASCADE 1.0 and CASCADE 2.0). The directory with the CASCADE 2.0 related results has 3 subdirectories, corresponding to the different parameterizations that were used in the simulations (link mutations, topology mutations or both). The name of each further subdirectory specifies the training type, simulation number, attractor tool and synergy assessment method.

Fitness vs Performance Methods

Generate the training data samples

Use the gen_training_data.R script to produce the training data samples. In this script we first choose \(11\) numbers that represent the number of nodes that are going to be flipped in the AGS steady state. These numbers range from \(1\) (flip just one node) to \(24\) (flip all nodes, i.e. create a completely reversed steady state). Then, for each such number, we generate \(20\) new partially correct steady states, each one having the same number of randomly chosen flips in the steady state (e.g. \(20\) steady states where randomly chosen sets of \(3\) nodes have been flipped). The two extreme cases are handled differently: the single flip is applied to every node in turn (\(24\) different steady states), and flipping all the nodes generates the unique completely reversed steady state. Thus, in total, \(205\) training data sample files are produced (\(205 = 9 \times 20 + 1 \times 24 + 1 \times 1\)).
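The counting argument above can be checked with a small sketch. This is not the gen_training_data.R script; it is a minimal Python illustration under stated assumptions: the node names, the steady state values and the specific list of \(11\) flip counts are placeholders (the text only says the counts range from \(1\) to \(24\)), and duplicate random samples are not filtered out.

```python
import random

N_NODES = 24
SAMPLES_PER_FLIP_COUNT = 20
# Placeholder values: the text gives 11 flip counts between 1 and 24
# but does not list them, so this particular list is an assumption.
FLIP_COUNTS = [1, 2, 3, 4, 6, 8, 10, 12, 16, 20, 24]


def flip(steady_state, nodes_to_flip):
    """Return a copy of the steady state with the given nodes flipped (0 <-> 1)."""
    return {node: (1 - value if node in nodes_to_flip else value)
            for node, value in steady_state.items()}


def generate_training_samples(steady_state, flip_counts, rng):
    nodes = list(steady_state)
    samples = []
    for n_flips in flip_counts:
        if n_flips == 1:
            # One flip: one sample per node.
            flip_sets = [{node} for node in nodes]
        elif n_flips == len(nodes):
            # All nodes flipped: the unique completely reversed steady state.
            flip_sets = [set(nodes)]
        else:
            # Otherwise: 20 random node subsets of the requested size.
            flip_sets = [set(rng.sample(nodes, n_flips))
                         for _ in range(SAMPLES_PER_FLIP_COUNT)]
        samples.extend(flip(steady_state, s) for s in flip_sets)
    return samples


rng = random.Random(42)
# Hypothetical steady state with made-up node names and values.
steady_state = {f"node_{i}": rng.randint(0, 1) for i in range(1, N_NODES + 1)}
samples = generate_training_samples(steady_state, FLIP_COUNTS, rng)
print(len(samples))  # 24 + 9 * 20 + 1 = 205
```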

The training data files are stored in the Zenodo file training-data-files.tar.gz.

Run model ensembles simulations

To generate the calibrated model ensembles and perform the drug response analysis on them we use the script run_druglogics_synergy_training.sh from the druglogics-synergy repository root (version 1.2.1: git checkout v1.2.1). Note that the training-data-files directory must be placed inside the druglogics-synergy root directory before executing the aforementioned script. The end result we get is the simulation results for each of the training data files (a different directory per training data file).

The following changes need to be applied to the CASCADE 1.0 or 2.0 configuration file (depends on the topology you are using, the files are either druglogics-synergy/ags_cascade_1.0/config or druglogics-synergy/ags_cascade_2.0/config) before executing the script (some are done automatically in the script):

  • If topology mutations are used, disable the link-operator mutations (balance_mutations: 0) and use topology_mutations: 10.
  • Change the number of simulations to \(20\) (link-operator mutations) or \(50\) (topology mutations) for CASCADE 2.0 and to \(50\) for CASCADE 1.0 (default value, link-operator mutations).
  • Change to Bliss synergy method (synergy_method: bliss) no matter the mutations used or topology.
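The configuration changes listed above can also be applied programmatically. The sketch below assumes that the config file stores each option on its own line as `name: value`, as the option names are written in this section; the actual file layout may differ, and the helper function is hypothetical, not part of druglogics-synergy.

```python
import re


def set_config_options(config_text, options):
    """Return config_text with every `name: value` line whose name is in
    `options` rewritten to the new value. Other lines are kept unchanged."""
    lines = []
    for line in config_text.splitlines():
        match = re.match(r"^(\s*)([A-Za-z_]+)(\s*:\s*)(.*)$", line)
        if match and match.group(2) in options:
            indent, name, separator, _old_value = match.groups()
            line = f"{indent}{name}{separator}{options[name]}"
        lines.append(line)
    return "\n".join(lines)


# Toy config fragment using only option names mentioned in this section.
example_config = "topology_mutations: 0\nbalance_mutations: 3\nsynergy_method: hsa"

# Settings for the topology-mutation simulations with the Bliss method.
updated_config = set_config_options(
    example_config,
    {"topology_mutations": 10, "balance_mutations": 0, "synergy_method": "bliss"},
)
print(updated_config)
```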

The results of the CASCADE 2.0 link-operator mutated model simulations are stored in the Zenodo file fit-vs-performance-results-bliss.tar.gz, whereas the results for the CASCADE 2.0 topology mutated models are stored in the fit-vs-performance-results-bliss-topo.tar.gz file. The results of the CASCADE 1.0 link-operator mutated model simulations are stored in the Zenodo file fit-vs-performance-results-bliss-cascade1.tar.gz.

To parse and tidy up the data from the simulations, use the scripts fit_vs_perf_cascade2_lo.R (for the link-operator-based CASCADE 2.0 simulations), fit_vs_perf_cascade2_topo.R (for the topology-mutation-based CASCADE 2.0 simulations) and fit_vs_perf_cascade1_lo.R (for the link-operator-based CASCADE 1.0 simulations).

Also, we used the run_druglogics_synergy.sh script at the root of the druglogics-synergy repo (script configuration for CASCADE 2.0: {2.0, prolif, 150, fixpoints, bliss} and for CASCADE 1.0: {1.0, prolif, 50, fixpoints, bliss}) to get the ensemble results of the random (proliferative) models that we use to normalize the calibrated model performance. The result of this simulation is also part of the results described above (see the section above, also considering the necessary changes applied for the topology mutation-based simulations for CASCADE 2.0) and it is available inside the file sim_res.tar.gz of the Zenodo dataset (also available in the results directory - see Repo results structure).

Random Model Bootstrap

  • Install the druglogics-synergy module and use the version 1.2.1: git checkout v1.2.1 (or use directly the released v1.2.1 package)
  • Run the script run_gitsbe_random.sh inside the ags_cascade_2.0 directory of the druglogics-synergy repository. This creates a results directory which includes a models directory, with a total of \(3000\) gitsbe models which we are going to use for the bootstrapping.
  • Place the models directory inside the ags_cascade_2.0 directory.
  • Execute the bootstrap_models_drabme.sh script inside the druglogics-synergy/ags_cascade_2.0 directory, after setting synergy_method: bliss in the config file. The bootstrap configuration consists of \(20\) batches, each one consisting of a sample of \(100\) randomly selected models from the models directory pool.
  • Use the script random_model_boot.R to tidy the data from the simulations.
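The batch sampling described in the steps above can be pictured as follows. This is a minimal sketch, not the bootstrap_models_drabme.sh script: the model file names are made up, and it assumes that models are drawn without replacement within a batch while different batches may share models (the text only says the models are randomly selected).

```python
import random


def bootstrap_batches(model_pool, batches, batch_size, seed=None):
    """Draw `batches` random samples of `batch_size` models from the pool.

    Assumption: sampling is without replacement within a batch, and each
    batch is drawn independently from the full pool.
    """
    rng = random.Random(seed)
    return [rng.sample(model_pool, batch_size) for _ in range(batches)]


# Hypothetical file names standing in for the 3000 gitsbe models.
model_pool = [f"model_{i}.gitsbe" for i in range(1, 3001)]

# Random model bootstrap configuration from the text: 20 batches of 100 models.
random_batches = bootstrap_batches(model_pool, batches=20, batch_size=100, seed=1)
print(len(random_batches), len(random_batches[0]))
```

Each batch would then be passed to drabme as one model ensemble.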

The results of the simulations are stored in the random_model_bootstrap.tar.gz file of the Zenodo dataset.

Parameterization Bootstrap

  • Install the druglogics-synergy module and use the version 1.2.1: git checkout v1.2.1 (or use directly the released v1.2.1 package)
  • To generate the \(3\) pools of calibrated models (fitting to the AGS steady state) subject to different parameterization schemes, run the script run_gitsbe_param.sh inside the ags_cascade_2.0 directory of the druglogics-synergy repository root. This will generate the following directories, each of which has a models directory (the model pool):
    • gitsbe_link_only_cascade_2.0_ss
    • gitsbe_topology_only_cascade_2.0_ss
    • gitsbe_topo_and_link_cascade_2.0_ss
  • Repeat for each different pool (models directory):
    • Place the models directory inside the ags_cascade_2.0 directory of the druglogics-synergy repository root.
    • Use the bootstrap_models_drabme.sh script with batches=25 and batch_size=300, and set the project variable name (input to eu.druglogics.drabme.Launcher) to one of the following three, depending on the parameterization scheme that was used in the previous step to produce the model pool:
      • --project=link_only_cascade_2.0_ss_bliss_batch_${batch}
      • --project=topology_only_cascade_2.0_ss_bliss_batch_${batch}
      • --project=topo_and_link_cascade_2.0_ss_bliss_batch_${batch}
    • Also set synergy_method: bliss in the config file.
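To make the naming pattern of the project option concrete, the sketch below expands it for every scheme and batch. It only builds strings; the helper name is hypothetical and the assumption that batches are numbered from \(1\) to \(25\) is not confirmed by the text.

```python
SCHEMES = ["link_only", "topology_only", "topo_and_link"]
BATCHES = 25


def project_flag(scheme, batch):
    """Build the --project value for one parameterization scheme and batch."""
    return f"--project={scheme}_cascade_2.0_ss_bliss_batch_{batch}"


# Assumption: batch indices run from 1 to BATCHES.
project_flags = [project_flag(scheme, batch)
                 for scheme in SCHEMES
                 for batch in range(1, BATCHES + 1)]

print(project_flags[0])
print(len(project_flags))
```

With \(3\) schemes and \(25\) batches each, this gives \(75\) drabme projects in total.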

The results of all these simulations are stored in the parameterization-comp.tar.gz Zenodo file. Use the script get_param_comp_boot_data.R to tidy up the simulation data to a nice table format.

When uncompressed, the parameterization-comp.tar.gz file outputs 3 separate directories, one per parameterization scheme. Each separate directory is structured so as to contain the gitsbe simulation results with the model pool inside (result of the script run_gitsbe_param.sh), a boot_res directory (includes the results of the bootstrap_models_drabme.sh script) and lastly the results of the random proliferative model simulations which can be reproduced following the guidelines above.

ERK investigation

We split the link operator model pool (\(4500\) models, see above) into \(2\) pools, one with a total of \(2764\) models that have ERK_f active and one with a total of \(1736\) models that have it inhibited in the corresponding stable states. The two model pools are the two directories named erk_active_pool and erk_inhibited_pool respectively inside the Zenodo file erk_perf_investigation.tar.gz.
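The split can be sketched as a simple partition on the state of one node. The data structure below (a list of dictionaries with a single stable state per model) and the toy models are assumptions made for illustration; the real gitsbe model files have their own format.

```python
def split_pool_by_node(models, node):
    """Partition models by whether `node` is active (1) or inhibited (0)
    in their stable state. Assumes exactly one stable state per model."""
    active = [m for m in models if m["stable_state"][node] == 1]
    inhibited = [m for m in models if m["stable_state"][node] == 0]
    return active, inhibited


# Toy pool with made-up models; only the ERK_f node name comes from the text.
toy_pool = [
    {"name": "model_a", "stable_state": {"ERK_f": 1, "other_node": 0}},
    {"name": "model_b", "stable_state": {"ERK_f": 0, "other_node": 1}},
    {"name": "model_c", "stable_state": {"ERK_f": 1, "other_node": 1}},
]

erk_active_pool, erk_inhibited_pool = split_pool_by_node(toy_pool, "ERK_f")
print([m["name"] for m in erk_active_pool])
print([m["name"] for m in erk_inhibited_pool])

# Sanity check on the pool sizes reported in the text.
assert 2764 + 1736 == 4500
```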

Then:

  • Install the druglogics-synergy module and use the version 1.2.1: git checkout v1.2.1 (or use directly the released v1.2.1 package)
  • Run the script bootstrap_models_drabme_erk_pools.sh inside the ags_cascade_2.0 directory of the druglogics-synergy repository. This will produce the drug combination prediction results for the bootstrapped ensembles of Boolean models from each pool. From each pool, we bootstrapped \(35\) ensembles with \(300\) models each and used the bliss drabme synergy method to calculate the prediction results.
  • Run the script erk_perf_tidy_data.R to calculate the ROC and PR AUC of every bootstrapped ensemble, subject to normalization against the random proliferative model predictions.

CASCADE 1.0 Calibrated Models bootstrap

  • Install the druglogics-synergy module and use the version 1.2.1: git checkout v1.2.1 (or use directly the released v1.2.1 package)
  • Generate one large pool of calibrated models (fitting to the AGS steady state) by following the instructions above, i.e. use the run_druglogics_synergy.sh script at the root of the druglogics-synergy repo with script config: {1.0, ss, 1000, fixpoints, bliss}
  • Use the bootstrap_models_drabme_cascade1.sh script to run the bootstrapped model simulations.
  • Use the get_syn_res_boot_ss_cascade1.R script to tidy up the bootstrap simulation data.

The results from the bootstrap simulations are stored in the ss_cascade1_model_bootstrap.tar.gz file of the Zenodo dataset.

Repo results structure

We have gathered all the necessary output files from the above simulations (mostly relating to ROC, PR curves and AUC sensitivity figures) to the directory results for ease of use in our report. The results directory has 3 main sub-directories:

  1. link-only: results from the link-operator mutated models only (used in the sections Cascade 1.0 Analysis and CASCADE 2.0 Analysis (Link Operator Mutations))
  2. topology-only: results from the topology-mutated models only (used in the section CASCADE 2.0 Analysis (Topology Mutations))
  3. topo-and-link: results where both mutations were applied to the generated Boolean models (used in section CASCADE 2.0 Analysis (Topology and Link Operator Mutations))

In addition, there is a data directory that includes the following:

  • observed_synergies_cascade_1.0: the gold-standard synergies for the CASCADE 1.0 topology (Flobak et al. 2015)
  • observed_synergies_cascade_2.0: the gold-standard synergies for the CASCADE 2.0 topology (Flobak et al. 2019)
  • steadystate, steadystate.rds: the AGS training data for the calibrated models (file + compressed data) - see lo_mutated_models_heatmaps.R script.
  • edge_mat.rds, topo_ss_df.rds: heatmap data for the topology-mutation models - see topo_mutated_models_heatmaps.R script.
  • lo_df.rds, lo_ss_df.rds: heatmap data for the link-operator models - see lo_mutated_models_heatmaps.R script.
  • node_pathway_annotations_cascade2.csv, node_path_tbl.rds: node pathway annotation data for CASCADE 2.0 and compressed data table produced via the node_path_annot_cascade2.R script.
  • cosmic_cancer_gene_census_all_29102020.tsv: Cancer Gene Census COSMIC data downloaded from https://cancer.sanger.ac.uk/census (for academic purposes)
  • cosmic_tbl.rds: a compressed file with a tibble object having the CASCADE 2.0 nodes and their respective COSMIC cancer role annotation (see get_cosmic_data_annot.R script).
  • bootstrap_rand_res.rds: a compressed file with a tibble object having the result data in a tidy format for the analysis related to the Bootstrap Random Model AUC section.
  • res_fit_aucs_cascade1.rds: a compressed file with a tibble object having the result data in a tidy format for the analysis related to the Fitness vs Ensemble Performance section (CASCADE 1.0, link operator mutations).
  • res_fit_aucs.rds: a compressed file with a tibble object having the result data in a tidy format for the analysis related to the Fitness vs Ensemble Performance section (CASCADE 2.0, link operator mutations).
  • res_fit_aucs_topo.rds: a compressed file with a tibble object having the result data in a tidy format for the analysis related to the Fitness vs Ensemble Performance section (CASCADE 2.0, topology mutations).
  • res_param_boot_aucs.rds: a compressed file with a tibble object having the result data in a tidy format for the analysis related to the Bootstrap Simulations section.
  • boot_cascade1_res.rds: a compressed file with a tibble object having the result data from executing the script get_syn_res_boot_ss_cascade1.R, related to the scrambled topologies investigation in CASCADE 1.0.
  • scrambled_topo_res_cascade1.rds: a compressed file with a tibble object having the result data from executing the script get_syn_res_scrambled_topo_cascade1.R, related to the scrambled topologies investigation in CASCADE 1.0.
  • scrambled_topo_res_cascade2.rds: a compressed file with a tibble object having the result data from executing the script get_syn_res_scrambled_topo_cascade2.R, related to the scrambled topologies investigation in CASCADE 2.0.
  • res_erk.rds: a compressed file with a tibble object having the result data from executing the script erk_perf_tidy_data.R, related to the ERK analysis with the link operator mutated models in CASCADE 2.0.
  • tumor_vol_data.csv: the data from the xenograft experiments relating to the PI and 5Z inhibitors.

  1. The AUC sensitivity plots across the report are also included.