Training Data

The training data file contains condition-response pairs (observations) that are used to calculate the fitness of the boolean models during the evolution process of the genetic algorithm. A separate fitness score is calculated for each observation, and every score lies between \(0\) (no fitness at all) and \(1\) (perfect fitness). The format of each observation is:

Condition
<data>
Response
<data>
Weight:<number>

The weights (which can be continuous values) are applied after the fitness score of each individual observation has been calculated, to derive a total weighted average fitness score for the model that is fitted to the training data.
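As a minimal sketch of how the weights could combine the per-observation scores, assuming the usual weighted mean \(\sum_i w_i f_i / \sum_i w_i\) (the text only says "weighted average", so the exact normalization is an assumption, and the function name is hypothetical):

```python
def weighted_average_fitness(scores_and_weights):
    """Combine (fitness, weight) pairs into one weighted average fitness.

    Assumption: a standard weighted mean, sum(w * f) / sum(w).
    """
    total_weight = sum(weight for _, weight in scores_and_weights)
    if total_weight == 0:
        raise ValueError("at least one observation needs a non-zero weight")
    weighted_sum = sum(fitness * weight for fitness, weight in scores_and_weights)
    return weighted_sum / total_weight


# Two observations: fitness 0.86 with weight 1 and fitness 0.5 with weight 0.1.
model_fitness = weighted_average_fitness([(0.86, 1.0), (0.5, 0.1)])
```

Under this assumption, an observation with weight \(0.1\) influences the total score ten times less than one with weight \(1\).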

We now present the currently supported observations:

Unperturbed Condition - Steady State Response

Example:

Condition
-
Response
A:0 B:1 C:0 D:0.453
Weight:1

This is the most commonly used training option. Note that the response values are tab-separated and that the number assigned to each entity defines an activity value in the \([0,1]\) interval (continuous values are allowed). The entities have to be nodes from the initial network, otherwise they are ignored. Thus, once its attractors have been calculated, a boolean model gets a fitness score for this kind of observation that describes how close its attractors are to the specified steady state response.

For example, if a boolean model has only 1 trapspace attractor, in which the nodes {A,B,C,D} have values {0,1,-,-}, each undetermined node (-) is scored as \(0.5\) and the fitness would be: \[fitness=\frac{\sum matches}{\#responses}=\frac{1+1+(1-abs[0-0.5])+(1-abs[0.453-0.5])}{4}=\frac{3.453}{4} \simeq 0.86\]

If a model has multiple attractors, we first find the average number of matches across all attractors and then divide by the number of responses. For example, if the previous model had one more attractor in which the nodes perfectly matched the observed responses (so \(4\) matches in total), the average number of matches across the two attractors would be \((4+3.453)/2=3.7265\) and the fitness would be \(3.7265/4\simeq0.93\) (which makes sense, since the second attractor matches the observed state better and thus boosts the fitness).
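The calculation above can be sketched as follows. This is an illustration only: the function name and the data layout (attractors as dictionaries, `"-"` for an undetermined node) are hypothetical, and it assumes that every observed node is present in the network, that the model has at least one attractor, and that each node contributes \(1-abs(observed-predicted)\) with `"-"` scored as \(0.5\), as in the worked example.

```python
def steady_state_fitness(attractors, observed):
    """Average the per-attractor match sums, then divide by the number of responses.

    attractors: list of dicts mapping node name -> 0, 1 or "-" (undetermined).
    observed:   dict mapping node name -> activity value in [0, 1].
    """
    def match(state, observed_value):
        # Assumption: an undetermined "-" node is scored as 0.5.
        predicted = 0.5 if state == "-" else float(state)
        return 1.0 - abs(observed_value - predicted)

    matches_per_attractor = [
        sum(match(attractor[node], value) for node, value in observed.items())
        for attractor in attractors
    ]
    average_matches = sum(matches_per_attractor) / len(matches_per_attractor)
    return average_matches / len(observed)


observed = {"A": 0, "B": 1, "C": 0, "D": 0.453}
single = steady_state_fitness([{"A": 0, "B": 1, "C": "-", "D": "-"}], observed)
```

Here `single` reproduces the \(3.453/4\) value of the single-attractor example.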

Unperturbed Condition - globaloutput Response

Example:

Condition
-
Response
globaloutput:1
Weight:1

This training option essentially translates to: if the system is left unperturbed, it continues proliferating, which is a direct description of a cancer cell network. So, with this type of observation we can train models to match a growing cell/proliferation profile.

Note that the Response must always be in the globaloutput:<number> format and that the absolute observed globaloutput response can take any value in the \([0,1]\) interval (from a cell death state to a cell proliferation state, so to speak). In this kind of observation it is mostly set to \(1\).

In order to find the fitness of a boolean model to this kind of observation, we first calculate its attractors, compute its normalized predicted globaloutput \(gl_{pred}\) using Equation No. (2) and then calculate: \[fitness=1-abs(gl_{obs}-gl_{pred})\]

where \(gl_{obs}\) is the value defined in the Response (usually \(1\) in this case).
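A small sketch of this fitness formula, taking an already computed \(gl_{pred}\) as input (the function name is hypothetical, and the range checks are an addition of this sketch rather than documented behaviour):

```python
def globaloutput_fitness(gl_obs, gl_pred):
    """Return 1 - |gl_obs - gl_pred| for two globaloutput values in [0, 1]."""
    for name, value in (("gl_obs", gl_obs), ("gl_pred", gl_pred)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return 1.0 - abs(gl_obs - gl_pred)


# Observed proliferation (1) against a model that predicts 0.75.
fitness = globaloutput_fitness(1.0, 0.75)
```

Because both values lie in \([0,1]\), the result is also in \([0,1]\), with \(1\) reached only when the prediction equals the observation.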

Knockout/Overexpression Condition

Example:

Condition
A:0 B:1
Response
globaloutput:0
Weight:0.1

The above example translates to: the knockout of entity A combined with the overexpression of entity B (e.g. proteins/genes) results in complete cell death (these observations are usually based on experimental data). So with this kind of observation, we train our model’s output behaviour to best fit an experimentally tested knockout or overexpression of one or many biological entities.

The Response must always be in the globaloutput:<number> format, with <number> a continuous value in the \([0,1]\) interval (ranging from cell death to a cell proliferation state).

The Condition must have tab-separated nodes with activity values (one or many). The activity values must be either \(0\) or \(1\), otherwise they are ignored. This is because we use logical modeling and substitute the equations of the boolean model as A *= false and B *= true respectively, before we calculate its attractors (both A and B must be in the defined network model). After the attractors of the modified model are calculated, we compute its normalized predicted globaloutput \(gl_{pred}\) using Equation No. (2) and then calculate: \[fitness=1-abs(gl_{obs}-gl_{pred})\]

where \(gl_{obs}\) is the value defined in the Response (usually \(0\) in this case).
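The substitution step can be illustrated with a short sketch. The representation of the model as a dictionary from node name to the right-hand side of its `*=` equation is hypothetical, as is the function name, and raising an error for a node outside the network is a choice of this sketch (the text only states that such nodes must be in the model).

```python
def apply_condition(equations, condition):
    """Return a copy of `equations` with the Condition's nodes fixed.

    equations: dict mapping node name -> right-hand side of its `*=` equation.
    condition: tab-separated `Node:value` entries, e.g. "A:0\tB:1".
    """
    perturbed = dict(equations)
    for entry in condition.split("\t"):
        node, _, value = entry.partition(":")
        if node not in perturbed:
            # Assumption of this sketch: unknown nodes are reported as errors.
            raise ValueError(f"node {node!r} is not in the network model")
        if value == "0":
            perturbed[node] = "false"  # knockout: A *= false
        elif value == "1":
            perturbed[node] = "true"   # overexpression: B *= true
        # Any other activity value is ignored, as stated in the text.
    return perturbed


model = {"A": "B and not C", "B": "A or C", "C": "A"}
perturbed = apply_condition(model, "A:0\tB:1")
```

The original `model` is left untouched, so the same base model can be reused for other observations.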

Single Drug Perturbation

Example:

Condition
Drug(A)
Response
globaloutput:0
Weight:0.1

With this observation we define how a single drug perturbation affects our model’s output state (based on experimental data).

The Response must always be in the globaloutput:<number> format, with <number> a continuous value in the \([0,1]\) interval (ranging from cell death to a cell proliferation state).

The Condition must be in the Drug(<DrugName>) format, where <DrugName> is one of the drugs defined in the drug panel. We can thus find the drug’s (defined) targets and perturb our model accordingly: if, for example, the PI drug inhibits entity A (the target), we change our model’s respective equation to A *= false. The same logic applies if the drug has activating targets: the respective equations change to Target *= true (note that this is rarely used, since most drugs inhibit their targets).
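A rough sketch of this lookup-and-perturb step is given below. The drug panel is modelled here as a dictionary from drug name to an `(effect, targets)` pair and the model as a dictionary of equation right-hand sides; both layouts, the effect labels `"inhibits"`/`"activates"` and the function name are assumptions for illustration, not the actual file formats.

```python
import re


def perturb_with_drug(equations, condition, drug_panel):
    """Fix the targets of the drug named in a `Drug(<DrugName>)` Condition."""
    match = re.fullmatch(r"Drug\(([^()+]+)\)", condition.strip())
    if match is None:
        raise ValueError(f"not a single-drug condition: {condition!r}")
    drug = match.group(1)
    if drug not in drug_panel:
        raise ValueError(f"drug {drug!r} is not defined in the drug panel")

    effect, targets = drug_panel[drug]
    if effect not in ("inhibits", "activates"):
        raise ValueError(f"unknown drug effect: {effect!r}")

    perturbed = dict(equations)
    for target in targets:
        if target in perturbed:
            # Inhibited targets become `false`, activated targets `true`.
            perturbed[target] = "false" if effect == "inhibits" else "true"
    return perturbed


panel = {"PI": ("inhibits", ["A"])}
drugged = perturb_with_drug({"A": "B", "B": "A"}, "Drug(PI)", panel)
```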

Once the model is modified and its attractors calculated, we compute its normalized predicted globaloutput \(gl_{pred}\) using Equation No. (2) and then calculate: \[fitness=1-abs(gl_{obs}-gl_{pred})\]

where \(gl_{obs}\) is the value defined in the Response (usually \(0\) in this case).

Double Drug Perturbation

Example:

Condition
Drug(A+B) < min(Drug(A),Drug(B))
Response
globaloutput:-0.2
Weight:1

In this particular case, we can train our model to best fit a synergistic observation between two drugs.

To determine that two drugs are synergistic, the experimental data are analyzed with various mathematical and computational models, which compare the actual observed response with the non-interaction response predicted by the model. If the measured response is lower than the expected non-interaction one, a synergy is defined, and the excess (the relative globaloutput \(gl_{rel}\) between the actual response and the predicted non-interaction response) is used as input in the Response (format: globaloutput:<number>).

A negative value for the relative globaloutput \(gl_{rel}\)/<number> defines a synergistic response, while a positive value defines an antagonistic one (so we can also train the model for antagonism between the two drugs). Given that the observed and non-interaction responses are in the \([0,1]\) interval (ranging from cell death to a cell proliferation state), their difference (the relative globaloutput) must belong to the \([-1,1]\) interval (ranging from a highly synergistic to a highly antagonistic relationship between the 2 drugs).

For the calculation of the model’s fitness we can use either an HSA (Highest Single Agent) Condition (as in the example above) where the format would be Drug(A+B) < min(Drug(A),Drug(B)) (A and B are drugs defined in the drug panel) or a Bliss Condition (Bliss 1939), with the Drug(A+B) < product(Drug(A),Drug(B)) format. In each case, we compute the attractors of 3 models: one perturbed with drug A alone, one perturbed with drug B alone and one perturbed with (the targets of) both drugs. Then, using the attractors of each model and Equation No. (2), we compute each respective normalized predicted globaloutput as: \(gl_A,gl_B,gl_{A+B}\).

Then:

  • In the case of an HSA Condition, we compute the minimum globaloutput \(gl_{min}=min(gl_A,gl_B)\) and then the HSA excess as: \(excess=gl_{A+B}-gl_{min} \in [-1,1]\).
  • In the case of a Bliss Condition, we compute the globaloutput product \(gl_{product}=gl_A\times gl_B\) and then the Bliss excess as: \(excess=gl_{A+B}-gl_{product} \in [-1,1]\).

Next, in order to find how close that excess is to the one given in the training observation (the relative globaloutput \(gl_{rel}\)), we calculate their absolute difference, which is at most \(2\) since both values lie in \([-1,1]\), and normalize it to get a value in the \([0,1]\) interval with which we can find the fitness: \[fitness=1-\frac{abs(excess-gl_{rel})}{2}\]
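The HSA and Bliss calculations can be sketched together as follows, taking the three normalized predicted globaloutputs as already computed inputs. The function name and the `method` parameter are hypothetical; the numeric inputs in the usage lines are made up for illustration.

```python
def synergy_fitness(gl_a, gl_b, gl_ab, gl_rel, method="hsa"):
    """Fitness of a double drug perturbation observation.

    gl_a, gl_b, gl_ab: predicted globaloutputs in [0, 1] for drug A alone,
                       drug B alone and the A+B combination.
    gl_rel:            observed relative globaloutput in [-1, 1].
    method:            "hsa" (minimum of the two) or "bliss" (their product).
    """
    if method == "hsa":
        reference = min(gl_a, gl_b)
    elif method == "bliss":
        reference = gl_a * gl_b
    else:
        raise ValueError(f"unknown method: {method!r}")

    excess = gl_ab - reference  # lies in [-1, 1]
    # The absolute difference is at most 2, so dividing by 2 maps it to [0, 1].
    return 1.0 - abs(excess - gl_rel) / 2.0


hsa = synergy_fitness(0.6, 0.8, 0.3, gl_rel=-0.2, method="hsa")
bliss = synergy_fitness(0.6, 0.8, 0.3, gl_rel=-0.2, method="bliss")
```

With these inputs the HSA excess is \(0.3-0.6=-0.3\) and the Bliss excess is \(0.3-0.48=-0.18\), so the model scores slightly higher under the Bliss Condition, whose excess is closer to the observed \(-0.2\).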