Configuration
The configuration file includes options that are common between Gitsbe and Drabme (see General and Attractor Tool options), those that are Gitsbe-specific (see Export and Genetic Algorithm options) and those that are Drabme-specific.
The format of each configuration option in the file must be: <parameter>: <value> (tab-separated)
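As a rough illustration of this format, the sketch below reads such lines into a dictionary. It is not the parser used by Gitsbe or Drabme: the function name `parse_config` is hypothetical, and both the handling of blank lines and `#` comments and the exact placement of the tab (here, between the colon and the value) are assumptions.

```python
def parse_config(text):
    """Parse '<parameter>:<TAB><value>' lines into a dict of strings."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption of this sketch: blank lines and '#' comments are skipped.
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected '<parameter>: <value>', got {raw_line!r}")
        options[name.strip()] = value.strip()
    return options


example = "verbosity:\t3\ndelete_tmp_files:\tfalse\nparallel_sim_num:\t4\n"
config = parse_config(example)
```

All values are kept as strings here; converting them to integers or booleans is left to the caller.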
General
verbosity. Allowed values: \(0\)-\(3\) (\(0\) = nothing, \(3\) = everything). This option is used for logging purposes, since both Gitsbe and Drabme create a log directory where various logging messages are written to files.
delete_tmp_files. Logical (true or false). Gitsbe and Drabme create <name>_tmp directories (one each; <name> is either gitsbe or drabme), which are used to store the logical model files that are created throughout the simulations. If this option is true, a FileDeleter object is enabled that monitors the temporary directories and deletes the logical model files after they are used (e.g. when their attractors are calculated). After the simulations are finished, the <name>_tmp directories get deleted as well.
compress_log_and_tmp_files. Logical (true or false). Use this option to archive the files inside the log directory as well as the <name>_tmp directories. The output format is .tar.gz. This option is usually used when the verbosity is \(3\) and the number of simulations (simulations parameter) is high (e.g. \(>100\)).
use_parallel_sim. Logical (true or false). States whether the simulations will run in parallel, thus utilizing all of the machine’s cores.
parallel_sim_num. Allowed values: \(>1\). The number of simulations to execute in parallel if the previous option (use_parallel_sim) is true. A good value for this option is the number of the machine’s cores, but we advise reducing it if too many parallel simulations cause issues.
Attractor Tool
attractor_tool: tool to use for the calculation of attractors. Allowed values: bnet_reduction, bnet_reduction_reduced, biolqm_stable_states, biolqm_trapspaces, mpbn_trapspaces. The first two options use the BNReduction tool (Veliz-Cuba et al. 2014), the next two the BioLQM Java library (Naldi 2018) and the last one the Most Permissive Boolean Networks framework (Paulevé et al. 2020). Follow the respective documentation to install and enable the two BNReduction-based versions and the MPBN Python script (BioLQM is included by default).
The bnet_reduction and biolqm_stable_states options calculate all the fixpoints of the boolean models. The bnet_reduction_reduced option works only if the model has one fixpoint attractor (or none). Note, though, that there can be models that have one fixpoint which the reduced BNReduction version is not able to find. Its advantage is that it is much faster for larger networks, and when self-contained network models are used it gets most of the results right (self-contained models usually don’t have many fixpoints).
The biolqm_trapspaces and mpbn_trapspaces options calculate the terminal trapspaces (see the respective BioLQM documentation and MPBN documentation). This kind of trapspace is also called minimal (Klarner, Bockmayr, and Siebert 2015). The mpbn_trapspaces option is a lot faster in general, but for smaller networks (\(<100\) nodes) it’s best to use BioLQM’s implementation, since there is an I/O call overhead when using the MPBN Python script.
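The rule of thumb for choosing between the two trapspace options can be written down as a small helper. This is a sketch only: the function names are hypothetical, and the 100-node limit is simply the figure quoted above, not a hard boundary enforced by the software.

```python
# The allowed values of the attractor_tool option, as listed above.
ATTRACTOR_TOOLS = (
    "bnet_reduction",
    "bnet_reduction_reduced",
    "biolqm_stable_states",
    "biolqm_trapspaces",
    "mpbn_trapspaces",
)


def check_attractor_tool(value):
    """Raise ValueError if value is not one of the documented options."""
    if value not in ATTRACTOR_TOOLS:
        raise ValueError(f"unknown attractor_tool: {value!r}")
    return value


def suggest_trapspace_tool(num_nodes, small_network_limit=100):
    """Prefer BioLQM for small networks, MPBN otherwise (heuristic)."""
    if num_nodes < small_network_limit:
        return "biolqm_trapspaces"
    return "mpbn_trapspaces"
```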
Export
Options for trimming the initial network file:
remove_output_nodes. Logical (true or false). Recursively removes from the model nodes that have no outgoing edges.
remove_input_nodes. Logical (true or false). Recursively removes from the model nodes that have no incoming edges.
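The word "recursively" matters here: removing a node can leave one of its neighbours without outgoing (or incoming) edges, so the trimming repeats until nothing changes. The sketch below shows one plausible reading of that behaviour on a plain edge list. The function `trim_nodes` and the edge representation are assumptions made for illustration, not Gitsbe's implementation.

```python
def trim_nodes(edges, remove_outputs=True, remove_inputs=False):
    """Repeatedly drop nodes with no outgoing and/or no incoming edges.

    edges is an iterable of (source, target) pairs; the remaining
    edges are returned as a set.
    """
    edges = set(edges)
    while True:
        sources = {s for s, _ in edges}
        targets = {t for _, t in edges}
        doomed = set()
        if remove_outputs:
            doomed |= targets - sources  # nodes with no outgoing edges
        if remove_inputs:
            doomed |= sources - targets  # nodes with no incoming edges
        if not doomed:
            return edges
        # Removing a node also removes every edge that touches it.
        edges = {(s, t) for s, t in edges if s not in doomed and t not in doomed}


# Hypothetical network: a feedback loop A <-> B with a tail B -> C -> D.
network = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "D")}
trimmed = trim_nodes(network, remove_outputs=True)
```

In this example D is removed first, which turns C into a node without outgoing edges, so C is removed in the next pass and only the loop between A and B remains.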
Options for exporting the initial network file to different formats:
export_to_gitsbe. Logical (true or false). An example of a simple network in gitsbe format:
modelname: test_model
fitness: 0.82
stablestate: 110
equation: A *= B or C
equation: B *= A
equation: C *= !A
mapping: A = x1
mapping: B = x2
mapping: C = x3
The gitsbe format includes the following information:
- Model’s name
- Model’s fitness score (gained via fitting to the training data)
- Model’s attractors, if they are calculated (stablestate or trapspace, one per line)
- The boolean equations, in BooleanNet format (Albert et al. 2008)
- A mapping between node names and variables (mainly used with the BNReduction tool)
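The stablestate entry of the example model can be checked by hand or by brute force, since a three-node model has only eight states. The sketch below enumerates them and keeps those that every equation maps onto itself. The equations are hard-coded as Python functions rather than parsed from the file, and the assumption that the bits of the state string follow the order of the equations (A, B, C) is ours.

```python
from itertools import product

# Update rules of the example model: A *= B or C, B *= A, C *= !A.
RULES = {
    "A": lambda s: s["B"] or s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: not s["A"],
}


def fixpoints(rules):
    """Return all states (as bit strings) left unchanged by every rule."""
    nodes = list(rules)
    found = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if all(bool(rules[n](state)) == state[n] for n in nodes):
            found.append("".join("1" if state[n] else "0" for n in nodes))
    return found


stable_states = fixpoints(RULES)  # ['110'] for the example model
```

Exhaustive enumeration grows as \(2^n\) with the number of nodes, which is why dedicated tools such as those selected by the attractor_tool option are used for real networks.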
export_to_sif. Logical (true or false). Cytoscape’s simple interaction format (SIF). Example of a topology with three activating interactions and one inhibiting interaction:
B -> A
C -> A
A -> B
A -| C
export_to_ginml. Logical (true or false). GINsim’s XML-based GINML format. This export is enabled via the BioLQM library (Naldi 2018).
export_to_sbml_qual. Logical (true or false). SBML-qual is an extension of the Systems Biology Markup Language (SBML) Level 3 standard, designed for the representation of multivalued qualitative models of biological networks. This export is enabled via the BioLQM library (Naldi 2018).
export_to_boolnet. Logical (true or false). The format of the R package BoolNet (Müssel, Hopfensitz, and Kestler 2010). This export is also enabled via the BioLQM library (Naldi 2018). Example:
targets, factors
A, B|C
B, A
C, !A
We also provide export options for the best models generated via the genetic algorithm of Gitsbe for each simulation.
Note that these models are automatically saved in gitsbe format (by default) inside the models directory for input to Drabme.
The export formats are the same as the last three described above:
best_models_export_to_ginml. Logical (true or false).
best_models_export_to_sbml_qual. Logical (true or false).
best_models_export_to_boolnet. Logical (true or false).
Genetic Algorithm
The following options are used to initialize, configure and calibrate the genetic algorithm of Gitsbe:
simulations. Allowed values: \(\ge 1\). Number of simulations (evolutions) to run. Each simulation is based on a different seed, so it’s guaranteed to be different from the others and also reproducible. The seed determines the random choices that are made throughout the evolution process of fitting the models to the training data.
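The link between seeds and reproducibility can be illustrated with a seeded pseudo-random generator. The sketch below is a generic demonstration in Python and says nothing about how Gitsbe derives its seeds. `simulate_choices` is a hypothetical stand-in for the random choices of one simulation.

```python
import random


def simulate_choices(seed, draws=5):
    """Return the first few random draws of a simulation with this seed."""
    rng = random.Random(seed)  # independent generator, one per simulation
    return [rng.random() for _ in range(draws)]


run_a = simulate_choices(seed=0)
run_b = simulate_choices(seed=0)  # same seed: identical sequence of choices
run_c = simulate_choices(seed=1)  # different seed: a different sequence
```

Giving each simulation its own generator, rather than sharing a global one, keeps the result of a simulation independent of how many other simulations run alongside it.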
models_saved. Allowed values: \(\ge 1\). Number of models to save per simulation. These models are saved in a models directory that Drabme can use and are the highest-fitness models of the last generation.
fitness_threshold. Allowed values: \(\in [0,1]\). Fitness threshold for saving models per simulation: if a best model does not have a fitness score larger than the fitness_threshold value, it is not saved.
generations. Allowed values: \(\ge 1\). Number of generations per simulation. Note that the actual number of generations might be less if the target_fitness value is surpassed.
target_fitness. Allowed values: \(\in (0,1]\). Target fitness threshold to stop evolution. If any of the best models in a generation achieves fitness higher than the target_fitness value, the corresponding simulation is stopped.
population. Allowed values: \(\ge 1\). Number of models per generation.
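The saving and stopping rules above can be summarized in a few lines. This is a sketch under our own assumptions: models are represented as hypothetical (name, fitness) pairs, and both comparisons are strict, following the wording "larger than" and "higher than".

```python
def models_to_save(last_generation, models_saved, fitness_threshold):
    """Pick the top models of the last generation that pass the threshold.

    last_generation is a list of (name, fitness) pairs.
    """
    ranked = sorted(last_generation, key=lambda m: m[1], reverse=True)
    return [m for m in ranked[:models_saved] if m[1] > fitness_threshold]


def should_stop(best_fitnesses, target_fitness):
    """Stop the simulation once any best model exceeds target_fitness."""
    return any(f > target_fitness for f in best_fitnesses)


last = [("m1", 0.9), ("m2", 0.4), ("m3", 0.7), ("m4", 0.05)]
saved = models_to_save(last, models_saved=3, fitness_threshold=0.5)
```

Here three models are requested but only two are saved, because the third-best model does not exceed the threshold.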
selection. Allowed values: \(\ge 1\). Number of models selected for the next generation (during selection phase). These are the models that had the best fitness scores among all the models of their generation.
crossovers. Allowed values: \(\ge 1\). At the end of each generation, the best models (those with higher fitness) are selected and exchange equations to determine the models of the new generation (crossover phase). The number of crossovers defines how many splitting points are used to divide the equations between the two parent models that give birth to a child model. For example, if crossovers = 1, we randomly select one splitting point and all the equations up to that point (an index into the list of equations) are copied from the first parent, while the rest of the equations are copied from the other parent model. Thus the child model is a mix of equations from the two parent models, and the higher the number of crossovers, the more complex that mix becomes. If the number of crossovers is larger than the number of equations, we take one equation alternately from each parent and give it to the child model.
balance_mutations, random_mutations, shuffle_mutations, topology_mutations. Allowed values: \(\ge 0\). After the models of each generation are created, the mutation phase takes place, during which a number of possible mutations are introduced to the models. The 4 different kinds of mutations that can be applied to a randomly chosen default equation of a logical model are:
- Balance: and not <=> or not. Also called link operator mutations.
- Random: (A or B) <=> (A and B). A change of boolean operator which denotes a different relationship between the entities (e.g. family nodes vs complex nodes).
- Shuffle: (A or (B and C)) <=> (B or (A and C)). Also called priority mutations.
- Topology: (A or B) <=> (B). Involves the addition and/or removal of regulation edges.
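Going back to the crossover phase described before the mutation options, the splitting-point scheme can be sketched as follows. This is an illustrative multi-point crossover written from that description, not Gitsbe's code: the function name, the list representation of the equations, and the decision to cap the number of splitting points at one less than the number of equations (which yields the alternating pattern) are all assumptions.

```python
import random


def crossover(parent_a, parent_b, crossovers, rng):
    """Build a child from two parents' equation lists (same node order)."""
    n = len(parent_a)
    if len(parent_b) != n:
        raise ValueError("parents must have the same number of equations")
    # At most n - 1 distinct splitting points exist between n equations;
    # using all of them alternates equations between the two parents.
    k = min(crossovers, max(n - 1, 0))
    split_points = set(rng.sample(range(1, n), k))
    child, use_a = [], True
    for i in range(n):
        if i in split_points:
            use_a = not use_a  # switch parent at every splitting point
        child.append(parent_a[i] if use_a else parent_b[i])
    return child


a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]
one_point = crossover(a, b, crossovers=1, rng=random.Random(0))
alternating = crossover(a, b, crossovers=10, rng=random.Random(0))
```

With crossovers = 1 the child is a prefix of the first parent followed by the rest of the second parent. With more crossovers than equations the child takes one equation from each parent in turn.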
These mutations can be applied at the initial stage and after it. A simulation starts at the initial stage and comes out of it when the worst of the best models in a generation has a non-zero fitness score (usually this means that there is a model with an attractor).
The difference before and after the initial stage is how many of these mutations are going to be applied.
For that purpose, we can configure multiplication factors that are used to boost the above mutations during the initial phase (bootstrap_*_factor options) and lessen them after it is over (*_factor options).
Note that all values are \(\ge 0\).
bootstrap_mutations_factor
bootstrap_shuffle_factor
bootstrap_topology_mutations_factor
mutations_factor
shuffle_factor
topology_mutations_factor
We usually use a large number of mutations in the initial stage (high bootstrap_*_factor, e.g. \(1000\)) to ensure that a large variation in parameterization can be explored.
Then, a value of \(1\) for the *_factor options ensures that we apply exactly the number of mutations specified in the *_mutations options.
Note that the bootstrap_mutations_factor and mutations_factor are used to multiply both the random and balance mutation options.
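The interplay between the *_mutations options and the factors can be made concrete with a small calculation. The sketch below assumes that the effective number of mutations is simply the configured count multiplied by the matching factor, as the text suggests. The function names and the dictionary layout are hypothetical.

```python
def in_initial_stage(best_fitnesses):
    """The initial stage lasts while the worst best model has zero fitness."""
    return min(best_fitnesses) == 0


def mutation_counts(config, initial_stage):
    """Effective number of mutations of each kind for the current stage."""
    prefix = "bootstrap_" if initial_stage else ""
    # The (bootstrap_)mutations_factor applies to balance and random mutations.
    general = config[prefix + "mutations_factor"]
    return {
        "balance": config["balance_mutations"] * general,
        "random": config["random_mutations"] * general,
        "shuffle": config["shuffle_mutations"] * config[prefix + "shuffle_factor"],
        "topology": config["topology_mutations"]
        * config[prefix + "topology_mutations_factor"],
    }


# Hypothetical settings, chosen only to illustrate the arithmetic.
config = {
    "balance_mutations": 3, "random_mutations": 1,
    "shuffle_mutations": 0, "topology_mutations": 2,
    "bootstrap_mutations_factor": 1000, "bootstrap_shuffle_factor": 0,
    "bootstrap_topology_mutations_factor": 1000,
    "mutations_factor": 1, "shuffle_factor": 1, "topology_mutations_factor": 1,
}
during_bootstrap = mutation_counts(config, initial_stage=True)
afterwards = mutation_counts(config, initial_stage=False)
```

With these settings, 3 balance mutations become 3000 during the initial stage and drop back to 3 once the stage is over.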