Inputs
PyDrugLogics requires various inputs to train Boolean Models and predict drug synergies. These inputs define the interaction model, training data, model outputs, perturbations, and observed synergy scores, each playing an essential role in the software pipeline.
To see the full Jupyter Notebook tutorial click here.
Below is a detailed guide to each input type and its structure.
Input Files
PyDrugLogics supports the following input files to load or construct Boolean Models:
Boolean Model from .sif File
The .sif file format represents network interactions using activation and inhibition relationships. Each row defines an interaction between nodes in the network.
Key notations:
->: Activation
-|: Inhibition
Example:
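As a minimal sketch of the notation above, the snippet below parses a few interactions written with the -> and -| operators. The node names (A, B, C, D) and the helper parse_sif_line are hypothetical and exist only for illustration; this is not the PyDrugLogics loader.

```python
# Hypothetical interactions in the .sif notation described above.
SIF_TEXT = """\
A -> B
C -| B
B -> D
"""

def parse_sif_line(line):
    """Split 'source operator target' into a (source, effect, target) tuple."""
    source, operator, target = line.split()
    effects = {"->": "activation", "-|": "inhibition"}
    if operator not in effects:
        raise ValueError(f"unknown operator: {operator!r}")
    return source, effects[operator], target

# One tuple per non-empty row of the file.
interactions = [parse_sif_line(line) for line in SIF_TEXT.splitlines() if line.strip()]
for source, effect, target in interactions:
    print(f"{source} --{effect}--> {target}")
```

Only the two operators listed above are handled; any other operator raises an error.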
Code Example for Loading Interactions for Constructing a Boolean Model
Note
A .sif file defines one Boolean Model.
Boolean Model from .bnet File
The .bnet format is used for defining a Boolean network, where nodes represent variables, and their activation expressions define relationships and dependencies among them. Each node’s state is determined by logical expressions. Logical operators are used to specify relationships:
&: Conjunction
|: Disjunction
!: Negation
Example:
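To illustrate how such expressions determine node states, the sketch below parses a small hypothetical network in "node, expression" form and applies every rule once to a given state. The network, the node names, and the helpers parse_bnet and evaluate are assumptions made for this example; they are not part of PyDrugLogics, and node names are assumed to be valid Python identifiers.

```python
# Hypothetical three-node network: one "target, expression" rule per line.
BNET_TEXT = """\
A, B & !C
B, A | C
C, !A
"""

def parse_bnet(text):
    """Return {target: expression} for every non-empty, non-comment line."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        target, expression = line.split(",", 1)
        rules[target.strip()] = expression.strip()
    return rules

def evaluate(expression, state):
    """Evaluate a rule by mapping !, &, | onto Python's not, and, or."""
    python_expr = (expression.replace("!", " not ")
                             .replace("&", " and ")
                             .replace("|", " or ")).strip()
    # Builtins are disabled; only the node names in `state` are visible.
    return bool(eval(python_expr, {"__builtins__": {}}, dict(state)))

rules = parse_bnet(BNET_TEXT)
state = {"A": True, "B": False, "C": False}
# Apply every rule once to the same starting state.
next_state = {node: evaluate(expr, state) for node, expr in rules.items()}
print(next_state)
```

Python's operator precedence (not, then and, then or) matches the usual reading of !, & and |, which is why a plain textual substitution is enough for this sketch.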
Code Example for Loading Model from .bnet
Note
A .bnet file defines one Boolean Model.
Training Data
The training data file contains condition-response pairs, each with a weight, which are used to evaluate the performance of Boolean models during the genetic algorithm’s evolutionary process. A fitness score between 0 (worst) and 1 (best) is calculated for each condition-response pair, reflecting how well the model aligns with the training data.
Format
Each observation consists of:
Condition: - (denotes the unperturbed condition).
Response: Specifies the node activity levels as a tab-separated list.
Weight: Once the fitness values have been calculated, this value is used to weight each condition-response pair when computing the overall weighted average fitness score of the model fitted to the training data.
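The weighting step can be sketched as an ordinary weighted average. The fitness values and weights below are made up, and weighted_average_fitness is a hypothetical helper, not a PyDrugLogics function.

```python
def weighted_average_fitness(pairs):
    """pairs: list of (fitness, weight) tuples, one per condition-response pair."""
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(fitness * weight for fitness, weight in pairs) / total_weight

# Two hypothetical observations: fitness 0.8 with weight 1, fitness 0.5 with weight 3.
pairs = [(0.8, 1.0), (0.5, 3.0)]
overall = weighted_average_fitness(pairs)
print(round(overall, 3))  # (0.8*1 + 0.5*3) / 4 = 0.575
```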
Types of Training Data
Unperturbed Condition - Steady State Response
This training type describes the system’s steady state, where activity values are assigned to nodes in the range [0, 1]. These values represent the observed state of the system and are compared against the model’s attractors to calculate fitness.
Example:
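As a rough illustration of comparing an observed steady state with a model attractor, the sketch below scores the match as one minus the mean absolute difference over the observed nodes. This scoring rule is an assumption chosen for clarity and is not necessarily the formula PyDrugLogics uses; the node names and values are hypothetical.

```python
# Hypothetical observed response, as it might appear in a training data entry:
#   A:0<TAB>B:1<TAB>C:0.5
observed = {"A": 0.0, "B": 1.0, "C": 0.5}

# Hypothetical attractor state produced by a Boolean model.
attractor = {"A": 0, "B": 1, "C": 1}

def steady_state_fitness(observed, attractor_state):
    """Assumed rule: 1 minus the mean absolute difference over observed nodes."""
    differences = [abs(observed[node] - attractor_state[node]) for node in observed]
    return 1.0 - sum(differences) / len(differences)

fitness = steady_state_fitness(observed, attractor)
print(round(fitness, 4))  # A and B match exactly, C differs by 0.5
```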
Unperturbed Condition - Global Output Response
This training type specifies the system’s behavior under no perturbation, typically used for studying proliferation in the networks. The response is defined as globaloutput:<value> in the range [0, 1], with fitness calculated based on how close the predicted global output is to the observed value.
Example:
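A response of this type could be written as globaloutput:1. The sketch below shows one simple way to express "how close the predicted global output is to the observed value", namely one minus the absolute difference. That rule and the helper global_output_fitness are assumptions for illustration, not a confirmed PyDrugLogics formula.

```python
def global_output_fitness(observed, predicted):
    """Assumed rule: fitness falls linearly with the distance between the values."""
    for value in (observed, predicted):
        if not 0.0 <= value <= 1.0:
            raise ValueError("global output values must lie in [0, 1]")
    return 1.0 - abs(observed - predicted)

# Observed response "globaloutput:1" against a hypothetical model prediction of 0.75.
fitness = global_output_fitness(observed=1.0, predicted=0.75)
print(fitness)
```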
Initialization Options
1. Load Training Data from File
This method loads training data directly from a file. The file can be named, for example, training_data.tab or training_data, and contains input in the following format:
# training data
Condition
-
Response
A:0 B:0 C:0
Weight:1
Where the responses are tab-separated.
Example:
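To make the file layout concrete, the sketch below reads the format shown above from an in-memory file and turns it into Python objects. It is a stand-alone illustration: parse_training_data is a hypothetical helper, not the PyDrugLogics loader.

```python
import io

# The file content shown above, with tab-separated responses.
TRAINING_TEXT = "# training data\nCondition\n-\nResponse\nA:0\tB:0\tC:0\nWeight:1\n"

def parse_training_data(lines):
    """Collect one dict per Condition/Response/Weight block."""
    observations, current, expecting = [], None, None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "Condition":
            current, expecting = {}, "condition"
        elif line == "Response":
            expecting = "response"
        elif line.startswith("Weight:"):
            current["weight"] = float(line.split(":", 1)[1])
            observations.append(current)
            current, expecting = None, None
        elif expecting == "condition":
            current["condition"] = line
        elif expecting == "response":
            current["response"] = line.split("\t")
    return observations

observations = parse_training_data(io.StringIO(TRAINING_TEXT))
print(observations)
```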
2. Direct Initialization
This method initializes the training data using Python data structures. The responses and weights are provided as a list of tuples.
Example:
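A list of tuples of this kind might look like the sketch below, where each tuple pairs a list of node:value responses with a weight. The exact structure expected by PyDrugLogics is assumed here, and check_observations is a hypothetical helper that only verifies that every value lies in [0, 1].

```python
# Hypothetical observations: (list of "node:value" responses, weight).
observations = [
    (["A:0", "B:1", "C:0.5"], 1.0),
    (["globaloutput:1"], 0.5),
]

def check_observations(observations):
    """Parse each response into {node: value} and validate the [0, 1] range."""
    parsed = []
    for responses, weight in observations:
        values = {}
        for item in responses:
            name, value = item.split(":")
            value = float(value)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} has value {value}, expected [0, 1]")
            values[name] = value
        parsed.append((values, float(weight)))
    return parsed

parsed = check_observations(observations)
print(parsed)
```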
Model Outputs
The model outputs define network nodes and their numeric weights, which determine each node’s contribution to the global signaling output (e.g., cell proliferation or death).
Format
Each model output contains:
Node name: string value.
Weight (positive for proliferation, negative for death): continuous numeric value.
Initialization Options
1. Load Model Outputs from File
This method loads model outputs directly from a file. The file can be named, for example, modeloutputs.tab or modeloutputs, and contains one node name and its weight per line, separated by a tab.
Example:
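The sketch below parses tab-separated name and weight pairs into a dictionary. The node names (P1, P2, D1), the comment handling, and the helper parse_model_outputs are assumptions for illustration; this is not the PyDrugLogics file reader.

```python
# Hypothetical file content: node name, a tab, then the weight.
MODEL_OUTPUTS_TEXT = "P1\t1\nP2\t1\nD1\t-1\n"

def parse_model_outputs(text):
    """Return {node_name: weight} from tab-separated lines."""
    outputs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank lines and comments are ignored
        name, weight = line.split("\t")
        outputs[name] = float(weight)
    return outputs

model_outputs = parse_model_outputs(MODEL_OUTPUTS_TEXT)
print(model_outputs)
```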
2. Direct Initialization
This method initializes the model outputs using Python data structures. Outputs are provided as a dictionary, with keys representing node names and values representing their corresponding weights.
Example:
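A dictionary of this kind, and one way the weights could combine node states into a global output, is sketched below. The weighted sum follows from the description of positive and negative weights; the min-max rescaling to [0, 1] is an assumption, and the node names and the helper global_output are hypothetical.

```python
# Hypothetical model outputs: positive weights for proliferation, negative for death.
model_outputs = {"P1": 1.0, "P2": 1.0, "D1": -1.0}

def global_output(weights, state):
    """Weighted sum of node states, rescaled to [0, 1] (assumed normalization)."""
    raw = sum(weight * state[node] for node, weight in weights.items())
    highest = sum(w for w in weights.values() if w > 0)  # all positive nodes active
    lowest = sum(w for w in weights.values() if w < 0)   # all negative nodes active
    if highest == lowest:
        raise ValueError("weights give no range to normalize over")
    return (raw - lowest) / (highest - lowest)

# Hypothetical attractor state: only P1 is active.
state = {"P1": 1, "P2": 0, "D1": 0}
value = global_output(model_outputs, state)
print(round(value, 4))  # raw sum 1, range [-1, 2]
```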
Perturbations
A perturbation is a drug or a combination of drugs applied to the system. The perturbations list contains all drug combinations to be tested. The drug data describes the effect of each drug on its target nodes.
Note
Only 1- and 2-drug perturbations are allowed. Perturbations with more than two drugs are not supported.
Initialization From Dictionary
You can define both drug_data and perturbation_data, or just drug_data:
1. Define `drug_data` and `perturbation_data`: Provide a list of drugs, where each drug entry specifies:
Drug data:
Drug name: Unique name of the drug.
Target(s): The node(s) in the network affected by the drug.
Effect: Specifies how the drug influences the target and can take the following values:
activates: The drug increases the target’s activity.
inhibits: The drug decreases the target’s activity (this is the default if no effect is specified).
Perturbation data:
Perturbations: Each entry is a single drug or a two-drug combination. The pipeline does not handle combinations of more than two drugs.
If both drug_data and perturbation_data are defined, the explicitly provided perturbations will be used.
Example:
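The sketch below shows one possible shape for drug_data and perturbation_data together with the rules stated above: the effect defaults to inhibits, targets may be comma-separated, and each perturbation holds one or two known drugs. The list layout, the drug and node names, and both helper functions are assumptions for illustration, not the PyDrugLogics API.

```python
# Hypothetical entries: [drug name, target(s), optional effect].
drug_data = [
    ["DrugA", "NodeX", "inhibits"],
    ["DrugB", "NodeY,NodeZ", "activates"],
    ["DrugC", "NodeW"],  # effect omitted, so "inhibits" is assumed
]
perturbation_data = [["DrugA"], ["DrugA", "DrugB"], ["DrugB", "DrugC"]]

def normalize_drug(entry):
    """Fill in the default effect and split comma-separated targets."""
    effect = entry[2] if len(entry) > 2 else "inhibits"
    if effect not in ("activates", "inhibits"):
        raise ValueError(f"invalid effect: {effect!r}")
    targets = [target.strip() for target in entry[1].split(",")]
    return {"name": entry[0], "targets": targets, "effect": effect}

def validate_perturbations(perturbations, known_drugs):
    """Reject perturbations with unknown drugs or more than two drugs."""
    for combo in perturbations:
        if not 1 <= len(combo) <= 2:
            raise ValueError(f"only 1- and 2-drug perturbations allowed: {combo}")
        unknown = [drug for drug in combo if drug not in known_drugs]
        if unknown:
            raise ValueError(f"unknown drugs: {unknown}")
    return perturbations

drugs = {drug["name"]: drug for drug in map(normalize_drug, drug_data)}
validate_perturbations(perturbation_data, drugs)
print(drugs["DrugC"])
```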
2. Define only `drug_data`:
If no perturbation_data is provided, the pipeline will automatically generate all possible two-drug combinations from the drug_data.
Example:
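Generating every two-drug combination from a list of drugs can be sketched with itertools.combinations from the Python standard library. The drug names are hypothetical, and this only illustrates the idea; it is not the code PyDrugLogics runs internally.

```python
from itertools import combinations

# Hypothetical drug names taken from a drug_data definition.
drug_names = ["DrugA", "DrugB", "DrugC"]

# Every unordered pair of distinct drugs, in input order.
two_drug_perturbations = [list(pair) for pair in combinations(drug_names, 2)]
print(two_drug_perturbations)
```

For n drugs this yields n * (n - 1) / 2 pairs, so the number of perturbations grows quadratically with the size of drug_data.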
Note
If perturbation_data is not provided, it will be automatically calculated to include all drug combinations from the drug_data.
The effect field in drug_data is optional. If omitted, the pipeline assumes the effect is inhibits.
Multiple targets can be specified for a single drug by listing them in the Target(s) field, separated by commas.
Valid options for the effect field are: activates and inhibits.
Observed Synergy Scores
Observed synergy scores are ground truth data used to evaluate model predictions. They are typically derived from experimental datasets or literature sources.
Example:
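One way to use such ground truth is to label every tested perturbation as observed-synergistic or not, so that predictions can be compared against the labels. The sketch below assumes the observed synergies are given as a list of hyphen-joined drug names, which is an assumption about the format; the drug names and the helper label_perturbations are hypothetical.

```python
# Hypothetical combinations reported as synergistic in an experiment.
observed_synergy_scores = ["DrugA-DrugB", "DrugB-DrugC"]

# Hypothetical perturbations that were simulated.
tested_perturbations = [["DrugA", "DrugB"], ["DrugA", "DrugC"], ["DrugB", "DrugC"]]

def label_perturbations(perturbations, observed):
    """Return {combination name: 1 if observed as synergistic, else 0}."""
    # frozenset makes the match independent of drug order within a pair.
    observed_sets = {frozenset(name.split("-")) for name in observed}
    return {"-".join(combo): int(frozenset(combo) in observed_sets)
            for combo in perturbations}

labels = label_perturbations(tested_perturbations, observed_synergy_scores)
print(labels)
```

Splitting on the hyphen assumes drug names do not themselves contain hyphens.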