Description

The model interactions (taken from a network file, see the Inputs section) are first assembled into Boolean equations, based on a default equation that relates a node to its regulators:

Target *= (A or B) and not (C or D)

where the activating regulators A and B, and the inhibitory regulators C and D, of a target node are each combined with the logical or operator, and the two groups are connected with the and not link operator. A target node can have multiple regulators: activating, inhibitory, or both. Using Boolean algebra, the state of the target node can be calculated from the states of its regulators (\(0\) means inhibited, \(1\) means active).
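As an illustration of how the default equation is evaluated, here is a minimal Python sketch. The function name `target_state` and its signature are hypothetical and not part of Gitsbe. The handling of nodes that have only activators or only inhibitors is an assumption made for this example.

```python
def target_state(activators, inhibitors, link="and not"):
    """Return 1 (active) or 0 (inhibited) for a target node.

    activators / inhibitors: lists of regulator states (0 or 1).
    link: "and not" (the default equation) or "or not".
    """
    if link not in ("and not", "or not"):
        raise ValueError(f"unknown link operator: {link!r}")
    act = any(activators)  # A or B or ...
    inh = any(inhibitors)  # C or D or ...
    # Assumption: with only one kind of regulator, no link operator is needed.
    if not inhibitors:
        return int(act)
    if not activators:
        return int(not inh)
    if link == "and not":
        return int(act and not inh)
    return int(act or not inh)
```

For example, `target_state([1, 0], [1, 0])` evaluates `(1 or 0) and not (1 or 0)`, so the target is inhibited, whereas the same regulator states with `link="or not"` leave it active.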

Gitsbe uses a genetic algorithm to generate and parameterize Boolean models that fit the observations in the training data. First, an initial generation of models is formulated from the input model by introducing a large number of mutations to the parameterization: for example, randomly selected equations are mutated to use the or not link operator instead of and not. Then, for each model, a fitness score is computed as the weighted average of the fitness values for the individual observations specified in the training data file. The attractors of each model are calculated during this step. The models that achieve the highest fitness scores are selected for the next generation (see the respective configuration options) and are used during the crossover phase to exchange logical equations between them (a model may also be paired with itself, which enables asexual reproduction). This is how the models of the new generation are determined. After that, the mutation phase is repeated as described above, followed by the calculation of the attractors and the subsequent selection phase.
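The loop described above can be sketched in Python as follows. This is a simplified illustration, not Gitsbe's implementation: a model is reduced to a list of link operators (one per equation), the `fitness` callable stands in for the attractor calculation and comparison with the training data, and all function and parameter names are hypothetical.

```python
import random


def mutate(model, n_mutations, rng):
    """Flip the link operator of randomly chosen equations."""
    child = list(model)
    for _ in range(n_mutations):
        i = rng.randrange(len(child))
        child[i] = "or not" if child[i] == "and not" else "and not"
    return child


def crossover(parent1, parent2, rng):
    """Take each equation from one of the two parents at random."""
    return [rng.choice(pair) for pair in zip(parent1, parent2)]


def weighted_fitness(per_observation, weights):
    """Weighted average of the per-observation fitness values."""
    total = sum(f * w for f, w in zip(per_observation, weights))
    return total / sum(weights)


def evolve(initial, fitness, generations, population, selected,
           n_mutations, seed):
    rng = random.Random(seed)
    # Initial generation: mutated copies of the input model.
    models = [mutate(initial, n_mutations, rng) for _ in range(population)]
    for _ in range(generations):
        # Selection: keep the highest-fitness models.
        best = sorted(models, key=fitness, reverse=True)[:selected]
        # Crossover, then mutation. Both parents are drawn independently,
        # so a model can be paired with itself.
        models = [
            mutate(crossover(rng.choice(best), rng.choice(best), rng),
                   n_mutations, rng)
            for _ in range(population)
        ]
    return sorted(models, key=fitness, reverse=True)[:selected]
```

A toy call such as `evolve(["and not"] * 4, lambda m: m.count("or not") / len(m), generations=5, population=10, selected=3, n_mutations=1, seed=42)` returns the three best models of the last generation under that made-up fitness function.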

After a non-negative total fitness score is obtained for the worst of the best models in the current generation, the number of mutations introduced per generation is reduced by a user-specified factor (see genetic algorithm options). The evolution process halts either when a user-specified fitness threshold is reached or when the (also user-defined) maximum number of generations has been reached. The highest-fitness models of the last generation are stored in a models directory (see Outputs). Multiple simulations of the evolution process can be run from the same initial model (in parallel, using all available cores), each with a different seed, so that each simulation produces differently parameterized output models.
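The mutation reduction and the two stopping criteria might be expressed as below. This sketch makes several assumptions that the text does not settle: "reduced by a factor" is read as integer division, at least one mutation is always kept, and the fitness threshold is compared against the best model's score. The function and parameter names are hypothetical and do not correspond to Gitsbe configuration options.

```python
def reduce_mutations(n_mutations, worst_selected_fitness, factor):
    """Lower the mutation count once the worst selected model scores >= 0."""
    if worst_selected_fitness >= 0:
        # Assumption: divide by the factor and never drop below one mutation.
        return max(1, n_mutations // factor)
    return n_mutations


def should_stop(best_fitness, generation, fitness_threshold, max_generations):
    """Halt on reaching the fitness threshold or the generation limit."""
    return best_fitness >= fitness_threshold or generation >= max_generations
```

Running several simulations then amounts to repeating the evolution with a different seed each time, which is why each run can yield a different set of output models.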