Configuration

The configuration file includes options that are common to Gitsbe and Drabme (see General and Attractor Tool options), options that are Gitsbe-specific (see Export and Genetic Algorithm options) and options that are Drabme-specific.

Each configuration option in the file must have the format <parameter>: <value>, where the parameter and the value are tab-separated.
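
As an illustration of this format, the following Python sketch reads such a file into a dictionary. The function name parse_config, the sample values and the handling of blank lines and # comments are assumptions made for the example; they are not taken from the Gitsbe or Drabme code.

```python
def parse_config(text):
    """Parse '<parameter>:<TAB><value>' lines into a dict of strings."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and lines starting with '#' are skipped.
        if not line or line.startswith("#"):
            continue
        name, tab, value = line.partition("\t")
        if not tab or not name.endswith(":"):
            raise ValueError(f"expected '<parameter>:<TAB><value>', got {raw_line!r}")
        options[name[:-1].strip()] = value.strip()
    return options


# Hypothetical sample with some of the General options described below.
sample = (
    "verbosity:\t3\n"
    "delete_tmp_files:\tfalse\n"
    "use_parallel_sim:\ttrue\n"
    "parallel_sim_num:\t4\n"
)
config = parse_config(sample)
print(config["verbosity"])  # 3
```

All values are kept as strings here; converting them to numbers or booleans is left to the caller.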

General

  • verbosity. Allowed values: \(0\)-\(3\) (\(0\) = nothing, \(3\) = everything).

    This option controls logging: both Gitsbe and Drabme create a log directory in which various logging messages are written to files.

  • delete_tmp_files. Logical (true or false).

    Gitsbe and Drabme create <name>_tmp directories (one each, <name> is either gitsbe or drabme) which are used to store the logical model files that are created throughout the simulations. If this option is true, a FileDeleter object is enabled that monitors the temporary directories and deletes the logical model files after they are used (e.g. when their attractors are calculated). After the simulations are finished, the <name>_tmp directories get deleted as well.

  • compress_log_and_tmp_files. Logical (true or false).

    Use this option to archive the files inside the log directory as well as the <name>_tmp directories. The output format is .tar.gz. This option is usually used when the verbosity is \(3\) and the number of simulations (simulations parameter) is high (e.g. \(>100\)).

  • use_parallel_sim. Logical (true or false).

    Specifies whether the simulations run in parallel, thus utilizing all the machine’s cores.

  • parallel_sim_num. Allowed values: \(>1\).

    The number of simulations to execute in parallel if the previous option (use_parallel_sim) is true. A good value is the number of the machine’s cores, but we advise reducing it if too many parallel simulations cause issues.
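
A starting value for parallel_sim_num can be derived from the core count reported by the operating system. The helper below is a hypothetical sketch, not part of the pipeline; it also caps the value at the number of simulations, which is an assumption about what is useful rather than a documented rule.

```python
import os


def suggest_parallel_sim_num(simulations, cores=None):
    """Suggest a value no larger than the core count or the number of simulations."""
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(simulations, cores))


print(suggest_parallel_sim_num(simulations=50, cores=8))  # 8
```

If the suggestion is 1, it is simpler to set use_parallel_sim to false, since the allowed values of parallel_sim_num are \(>1\).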

Attractor Tool

  • attractor_tool: tool to use for the calculation of attractors.

    Allowed values: bnet_reduction, bnet_reduction_reduced, biolqm_stable_states, biolqm_trapspaces, mpbn_trapspaces.

    The first two options use the BNReduction tool (Veliz-Cuba et al. 2014), the next two use the BioLQM Java library (Naldi 2018), and the last one uses the Most Permissive Boolean Networks framework (Paulevé et al. 2020). Follow the respective documentation to install and enable the two BNReduction-based versions and the MPBN Python script (BioLQM is included by default).

    The bnet_reduction and biolqm_stable_states options calculate all the fixpoints of the boolean models. The bnet_reduction_reduced option works only if the model has one fixpoint attractor (or none). Note, though, that there can be models that have one fixpoint which the reduced BNReduction version is not able to find. Its advantage is that it is much faster for larger networks, and when self-contained network models are used it gets most of the results right (self-contained models usually don’t have many fixpoints).

    The biolqm_trapspaces and mpbn_trapspaces options calculate the terminal trapspaces (see respective BioLQM documentation and MPBN documentation). This kind of trapspace is also called minimal (Klarner, Bockmayr, and Siebert 2015). The mpbn_trapspaces option is generally much faster, but for smaller networks (\(<100\) nodes) it’s best to use BioLQM’s implementation since there is an I/O call overhead when using the MPBN Python script.
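
To make the notion of a fixpoint concrete, the sketch below enumerates all states of a three-node network (A *= B or C, B *= A, C *= not A, the same network used in the gitsbe format example of the Export section) and keeps those that every update rule leaves unchanged. This exhaustive search is for illustration only: it is exponential in the number of nodes and is not how BNReduction, BioLQM or MPBN work.

```python
from itertools import product

# Update rules of the example network, one function per node.
rules = {
    "A": lambda s: s["B"] or s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: not s["A"],
}


def fixpoints(rules):
    """Return every state (as a 0/1 string) that all update rules leave unchanged."""
    nodes = list(rules)
    found = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if all(rules[node](state) == state[node] for node in nodes):
            found.append("".join(str(int(state[node])) for node in nodes))
    return found


print(fixpoints(rules))  # ['110']
```

The single fixpoint 110 (A = 1, B = 1, C = 0) is the stablestate listed in that example.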

Export

Options for trimming the initial network file:

  • remove_output_nodes. Logical (true or false).

    Recursively removes from the model the nodes that have no outgoing edges.

  • remove_input_nodes. Logical (true or false).

    Recursively removes from the model the nodes that have no incoming edges.
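
The recursive trimming described by these two options can be sketched as follows. This is a minimal illustration under the assumption that trimming repeats until no further node qualifies; trim_network and the example edges are hypothetical and not taken from the Gitsbe code.

```python
def trim_network(nodes, edges, remove_outputs=False, remove_inputs=False):
    """Repeatedly drop nodes without outgoing and/or incoming edges.

    `edges` is a collection of (source, target) pairs.
    """
    nodes, edges = set(nodes), set(edges)
    while True:
        sources = {source for source, _ in edges}
        targets = {target for _, target in edges}
        doomed = set()
        if remove_outputs:
            doomed |= nodes - sources  # nodes with no outgoing edges
        if remove_inputs:
            doomed |= nodes - targets  # nodes with no incoming edges
        if not doomed:
            return nodes, edges
        nodes -= doomed
        edges = {(s, t) for s, t in edges if s not in doomed and t not in doomed}


edges = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "D")}
kept_nodes, kept_edges = trim_network("ABCD", edges, remove_outputs=True)
print(sorted(kept_nodes))  # ['A', 'B']
```

In this example D is removed first, which leaves C without outgoing edges, so C is removed in the next pass. The feedback loop between A and B survives.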

Options for exporting the initial network file to different formats:

  • export_to_gitsbe. Logical (true or false)

    An example of a simple network in gitsbe format:

    modelname: test_model
    fitness: 0.82
    stablestate: 110
    equation: A *= B or C
    equation: B *= A
    equation: C *= !A
    mapping: A = x1
    mapping: B = x2
    mapping: C = x3

    The gitsbe format includes the following information:

    • Model’s name
    • Model’s fitness score (gained via fitting to the training data)
    • Model’s attractors, if they are calculated (stablestate or trapspace - one per line)
    • The boolean equations, in BooleanNet format (Albert et al. 2008)
    • A mapping between node names and variables (mainly used with the BNReduction tool)

  • export_to_sif. Logical (true or false)

    Cytoscape’s simple interaction format (SIF). Example of a topology with three activating interactions and one inhibiting interaction:

    B -> A
    C -> A
    A -> B
    A -| C

  • export_to_ginml. Logical (true or false)

    GINsim’s XML-based GINML format. This export is enabled via the BioLQM library (Naldi 2018).

  • export_to_sbml_qual. Logical (true or false)

    SBML-qual is an extension of the Systems Biology Markup Language (SBML) Level 3 standard, designed for the representation of multivalued qualitative models of biological networks. This export is enabled via the BioLQM library (Naldi 2018).

  • export_to_boolnet. Logical (true or false)

    The format of the BoolNet R package (Müssel, Hopfensitz, and Kestler 2010). This export is also enabled via the BioLQM library (Naldi 2018). Example:

    targets, factors
    A, B|C
    B, A
    C, !A
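
The sif and boolnet examples above describe the same three-node network. The sketch below shows how one could be derived from the other, assuming that a node is active when at least one activator is present and no inhibitor is. That rule, the function name sif_to_boolnet and the omission of nodes without regulators are assumptions of this example. The actual exports are produced by the pipeline and the BioLQM library, not by this code.

```python
def sif_to_boolnet(sif_lines):
    """Turn 'source -> target' / 'source -| target' lines into BoolNet rule lines."""
    activators, inhibitors = {}, {}
    for line in sif_lines:
        source, arrow, target = line.split()
        activators.setdefault(target, [])
        inhibitors.setdefault(target, [])
        if arrow == "->":
            activators[target].append(source)
        elif arrow == "-|":
            inhibitors[target].append(source)
        else:
            raise ValueError(f"unknown interaction type: {arrow!r}")

    def group(names, wrap):
        expr = "|".join(names)
        return f"({expr})" if wrap and len(names) > 1 else expr

    rules = ["targets, factors"]
    for target in activators:  # dicts keep insertion order
        acts, inhs = activators[target], inhibitors[target]
        parts = []
        if acts:
            parts.append(group(acts, wrap=bool(inhs)))
        if inhs:
            parts.append("!" + group(inhs, wrap=True))
        factors = " & ".join(parts)  # assumed rule: activators AND NOT inhibitors
        rules.append(f"{target}, {factors}")
    return rules


sif = ["B -> A", "C -> A", "A -> B", "A -| C"]
print("\n".join(sif_to_boolnet(sif)))
```

For the four interactions above this prints the same four lines as the boolnet example.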

We also provide export options for the best models generated via the genetic algorithm of Gitsbe for each simulation. Note that these models are automatically saved in gitsbe format (by default) inside the models directory for input to Drabme. The export formats are the same as the last three described above:

  • best_models_export_to_ginml. Logical (true or false)
  • best_models_export_to_sbml_qual. Logical (true or false)
  • best_models_export_to_boolnet. Logical (true or false)

Genetic Algorithm

The following options are used to initialize, configure and calibrate the genetic algorithm of Gitsbe:

  • simulations. Allowed values: \(\ge 1\).

    Number of simulations (evolutions) to run. Each simulation is based on a different seed, so it’s guaranteed to be different from the others and also reproducible. The seed determines the random choices that are made throughout the evolution process of fitting the models to the training data.

  • models_saved. Allowed values: \(\ge 1\).

    Number of models to save per simulation. These are the highest-fitness models of the last generation, and they are saved in a models directory that Drabme can use.

  • fitness_threshold. Allowed values: \([0,1]\).

    Fitness threshold for saving models per simulation: if a best model does not have a fitness score larger than the fitness_threshold value, it is not saved.

  • generations. Allowed values: \(\ge 1\).

    Number of generations per simulation. Note that the actual number of generations might be less if the target_fitness value is surpassed.

  • target_fitness. Allowed values: \((0,1]\).

    Target fitness threshold to stop evolution. If any of the best models in a generation achieves fitness higher than the target_fitness value, the corresponding simulation is stopped.

  • population. Allowed values: \(\ge 1\).

    Number of models per generation.

  • selection. Allowed values: \(\ge 1\).

    Number of models selected for the next generation (during selection phase). These are the models that had the best fitness scores among all the models of their generation.

  • crossovers. Allowed values: \(\ge 1\).

    At the end of each generation, the best models (those with the highest fitness) are selected and exchange equations with each other to produce the models of the new generation (crossover phase). The number of crossovers defines how many splitting points are used to divide the equations between the two best parent models that give birth to a child model. For example, if crossovers = 1, we randomly select one splitting point: all the equations up to that point (an index into the list of equations) are copied from the first parent, while the rest of the equations are copied from the other parent. The child model is thus a mix of the equations of the two parent models, and the higher the number of crossovers, the more complex that mix becomes. If the number of crossovers is larger than the number of equations, we take equations alternately from each parent and give them to the child model.

  • balance_mutations

  • random_mutations

  • shuffle_mutations

  • topology_mutations

    Allowed values (for each of the four options above): \(\ge 0\). After the models of each generation are created, the mutation phase takes place, during which a number of possible mutations are introduced to the models. The four kinds of mutations that can be applied to a randomly chosen default equation of a logical model are:

  • Balance: and not <=> or not. Also called link operator mutations.
  • Random: (A or B) <=> (A and B). A change of boolean operator which denotes a different relationship between the entities (e.g. family nodes vs complex nodes).
  • Shuffle: (A or (B and C)) <=> (B or (A and C)). Also called priority mutations.
  • Topology: (A or B) <=> (B). Involves the addition and/or removal of regulation edges.
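
The crossover phase and the simplest of the mutations above (balance) can be sketched in Python as follows. This is an illustration of the description in this section, not the Gitsbe implementation: the function names and example equations are hypothetical, both parents are assumed to hold the same number of equations in the same order, and the alternating rule is also applied when the number of crossovers equals the number of equations.

```python
import random


def crossover(parent1, parent2, crossovers, rng):
    """Build a child equation list by switching parents at random split points."""
    n = len(parent1)
    if crossovers >= n:
        # Too many split points: take equations alternately from each parent.
        return [(parent1 if i % 2 == 0 else parent2)[i] for i in range(n)]
    # Distinct split points; equations before the first one come from parent1.
    splits = sorted(rng.sample(range(1, n), crossovers))
    parents, current, child = (parent1, parent2), 0, []
    for i in range(n):
        if splits and i == splits[0]:
            splits.pop(0)
            current = 1 - current  # switch to the other parent
        child.append(parents[current][i])
    return child


def balance_mutation(equation):
    """Toggle the link operator of an equation: 'and not' <=> 'or not'."""
    if " and not " in equation:
        return equation.replace(" and not ", " or not ", 1)
    return equation.replace(" or not ", " and not ", 1)


first = ["A *= B or C", "B *= A", "C *= not A", "D *= (A) and not (C)"]
second = ["A *= B and C", "B *= A or D", "C *= not D", "D *= (A) or not (C)"]
child = crossover(first, second, crossovers=1, rng=random.Random(42))
print(child)
print(balance_mutation("D *= (A) and not (C)"))  # D *= (A) or not (C)
```

With crossovers=1 the child starts with equations of the first parent and ends with equations of the second. Passing a seeded random.Random makes the result reproducible, in the same spirit as the per-simulation seed.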

These mutations can be applied at the initial stage and after it. A simulation starts at the initial stage and comes out of it when the worst of the best models in a generation has a non-zero fitness score (usually this means that there is a model with an attractor).

What differs between the initial stage and the rest of the simulation is how many of these mutations are applied. For that purpose, we can configure multiplication factors that are used to boost the above mutations during the initial stage (bootstrap_*_factor options) and lessen them after it is over (*_factor options). Note that all values are \(\ge 0\).

  • bootstrap_mutations_factor
  • bootstrap_shuffle_factor
  • bootstrap_topology_mutations_factor
  • mutations_factor
  • shuffle_factor
  • topology_mutations_factor

We usually use a large number of mutations in the initial stage (high bootstrap_*_factor, e.g. \(1000\)) to ensure that a large variation in parameterization can be explored. After the initial stage, a value of \(1\) for the *_factor options ensures that exactly the number of mutations specified in the *_mutations options is applied.

Note that the bootstrap_mutations_factor and mutations_factor are used to multiply both the random and balance mutation options.
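
The way the factors combine with the *_mutations options can be sketched as a simple multiplication. The function effective_mutations and the example values are hypothetical, and any rounding or capping that the real implementation may perform is not modelled.

```python
def effective_mutations(config, initial_stage):
    """Return the number of mutations of each kind after applying the stage's factors."""
    prefix = "bootstrap_" if initial_stage else ""
    factors = {
        # Balance and random mutations share the same factor.
        "balance": config[prefix + "mutations_factor"],
        "random": config[prefix + "mutations_factor"],
        "shuffle": config[prefix + "shuffle_factor"],
        "topology": config[prefix + "topology_mutations_factor"],
    }
    return {kind: config[kind + "_mutations"] * factor for kind, factor in factors.items()}


# Hypothetical option values, for illustration only.
config = {
    "balance_mutations": 5,
    "random_mutations": 0,
    "shuffle_mutations": 0,
    "topology_mutations": 10,
    "bootstrap_mutations_factor": 1000,
    "bootstrap_shuffle_factor": 0,
    "bootstrap_topology_mutations_factor": 1000,
    "mutations_factor": 1,
    "shuffle_factor": 1,
    "topology_mutations_factor": 1,
}
print(effective_mutations(config, initial_stage=True))
print(effective_mutations(config, initial_stage=False))
```

With these values, 5000 balance and 10000 topology mutations are requested during the initial stage, and 5 and 10 respectively afterwards.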