Configuration

📜 Default config structure

{
  "global": {
    "output_dir": "output",
    "verbose": false,
    "save": true,
    "base_data_dir": "data"
  },
  "paths": {
    "drug_names_file": "drug_names.txt",
    "synergy_data_file": "synergy_data.csv",
    "node_dict_file": "node_dict.csv",
    "tissue_cline_file": "tissue_cline.csv",
    "db_file": null,
    "manual_chembl_csv": "manual_chembl.csv"
  },
  "columns": {
    "drug_name": "drug_name",
    "drug_name_A": "drug_name_A",
    "drug_name_B": "drug_name_B",
    "conc_A": "conc_A",
    "conc_B": "conc_B",
    "cell_line": "cell_line",
    "synergy": "synergy"
  },
  "options": {
    "synergy_threshold": 0.0,
    "double_drug_screen": true,
    "original_target_merge": "fill_missing"
  }
}

💡 Key points

  • db_file: null → Uses the database bundled with the package, which is maintained by the project rather than the user.

  • base_data_dir — Base path; all relative paths are resolved against it.

  • columns — Customise column names if your data uses different headers.

  • Deep-merge override: A custom config is merged recursively into the defaults; only the values you specify are overridden, and everything else keeps its default.
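
The deep-merge behaviour described above can be illustrated with a minimal sketch. This is not the package's own merge code; `deep_merge` is a hypothetical helper and `bliss_score` is a made-up column header, both used only to show how a partial custom config would combine with the defaults.

```python
import copy


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a new dict with `overrides` merged recursively into `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, null, or new keys replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged


# A trimmed-down copy of the default config shown above.
defaults = {
    "global": {"output_dir": "output", "verbose": False},
    "columns": {"cell_line": "cell_line", "synergy": "synergy"},
}
# Hypothetical custom config: only one column name is changed.
custom = {"columns": {"synergy": "bliss_score"}}

merged = deep_merge(defaults, custom)
print(merged["columns"])  # {'cell_line': 'cell_line', 'synergy': 'bliss_score'}
```

Under this reading, a custom config only needs to list the keys that differ from the defaults.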

📐 Data format specifications

Drug names file

Plain text, one drug per line:

Aspirin
Ibuprofen
Paracetamol
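
A file in this format can be checked before a run with a few lines of Python. The sketch below is only an illustration: `parse_drug_names` is a hypothetical helper, and skipping blank lines and trimming whitespace are assumptions, not documented behaviour of the package.

```python
def parse_drug_names(text: str) -> list:
    """Split file contents into drug names, one per non-empty line."""
    names = []
    for line in text.splitlines():
        name = line.strip()
        if name:  # assumption: blank lines are ignored
            names.append(name)
    return names


sample = "Aspirin\nIbuprofen\n\nParacetamol\n"
drug_names = parse_drug_names(sample)
print(drug_names)  # ['Aspirin', 'Ibuprofen', 'Paracetamol']
```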

Node dictionary (CSV)

Gene/protein symbols mapped to logical model node names:

gene,node
EGFR,EGFR_node
TP53,p53
BRAF,BRAF_node
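
To inspect a node dictionary outside the pipeline, the CSV can be read into a plain mapping. This is a sketch using the standard `csv` module; `load_node_dict` is a hypothetical helper, not part of the package.

```python
import csv
import io


def load_node_dict(handle) -> dict:
    """Map each gene/protein symbol to its logical model node name."""
    reader = csv.DictReader(handle)
    return {row["gene"]: row["node"] for row in reader}


# In-memory copy of the sample above; a real file would be opened with open().
sample = io.StringIO("gene,node\nEGFR,EGFR_node\nTP53,p53\nBRAF,BRAF_node\n")
node_dict = load_node_dict(sample)
print(node_dict["TP53"])  # p53
```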

Synergy data (CSV)

With concentration data (dual-drug screening):

drug_name_A,drug_name_B,conc_A,conc_B,tissue,cell_line,synergy
Aspirin,Ibuprofen,1.0,2.0,Breast,MCF7,0.15

Without concentration data (single-dose combinations):

drug_name_A,drug_name_B,tissue,cell_line,synergy
Aspirin,Ibuprofen,Breast,MCF7,0.15
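
The two layouts differ only in the presence of the concentration columns. As a rough sketch, a file can be classified from its header alone; `synergy_layout` is a hypothetical helper, and treating a file with only one concentration column as an error is an assumption rather than documented behaviour.

```python
import csv
import io


def synergy_layout(handle, conc_columns=("conc_A", "conc_B")) -> str:
    """Classify a synergy CSV by whether its header has concentration columns."""
    header = next(csv.reader(handle))
    present = [name for name in conc_columns if name in header]
    if len(present) == len(conc_columns):
        return "with_concentrations"
    if not present:
        return "without_concentrations"
    # Assumption: a half-specified layout is a mistake worth flagging.
    raise ValueError(f"Only some concentration columns found: {present}")


with_conc = io.StringIO(
    "drug_name_A,drug_name_B,conc_A,conc_B,tissue,cell_line,synergy\n"
    "Aspirin,Ibuprofen,1.0,2.0,Breast,MCF7,0.15\n"
)
without_conc = io.StringIO(
    "drug_name_A,drug_name_B,tissue,cell_line,synergy\n"
    "Aspirin,Ibuprofen,Breast,MCF7,0.15\n"
)
layouts = (synergy_layout(with_conc), synergy_layout(without_conc))
print(layouts)  # ('with_concentrations', 'without_concentrations')
```

If your headers differ from the defaults, pass the names from the columns section of your config as `conc_columns`.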

Tissue-cell line mapping (CSV)

tissue,cell_line
Breast,MCF7
Breast,T47D
Colorectal,HCT116
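
Each row assigns one cell line to one tissue, so a tissue may appear on several rows. A minimal sketch of grouping the rows by tissue follows; `cell_lines_by_tissue` is a hypothetical helper for inspecting your own file, not the package's loader.

```python
import csv
import io
from collections import defaultdict


def cell_lines_by_tissue(handle) -> dict:
    """Group cell lines under their tissue, preserving file order."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row["tissue"]].append(row["cell_line"])
    return dict(groups)


sample = io.StringIO("tissue,cell_line\nBreast,MCF7\nBreast,T47D\nColorectal,HCT116\n")
tissues = cell_lines_by_tissue(sample)
print(tissues)  # {'Breast': ['MCF7', 'T47D'], 'Colorectal': ['HCT116']}
```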

🔧 Advanced features

Manual ChEMBL mapping

Provide a CSV to override ChEMBL resolution (skip network queries):

drug_name,ChEMBL_ID
Aspirin,CHEMBL25
Ibuprofen,CHEMBL521

Configure in paths.manual_chembl_csv.
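
Before relying on a manual mapping, it can help to see which drugs it does not cover. The sketch below lists them; `unmapped_drugs` is a hypothetical helper, and the idea that uncovered drugs would still need a lookup is an assumption about the pipeline.

```python
import csv
import io


def unmapped_drugs(drug_names, mapping_handle) -> list:
    """Return the drug names that have no ChEMBL_ID in the manual mapping."""
    mapped = {
        row["drug_name"]
        for row in csv.DictReader(mapping_handle)
        if row.get("ChEMBL_ID")  # rows with an empty ID do not count as mapped
    }
    return [name for name in drug_names if name not in mapped]


mapping = io.StringIO("drug_name,ChEMBL_ID\nAspirin,CHEMBL25\nIbuprofen,CHEMBL521\n")
remaining = unmapped_drugs(["Aspirin", "Ibuprofen", "Paracetamol"], mapping)
print(remaining)  # ['Paracetamol']
```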

Verbose logging & timing

drexpa --config my_config.json --verbose

Output includes:

  • Step start/end timestamps

  • Per-step duration (seconds)

  • Preflight warnings & validation details

  • Pipeline summary

❓ Troubleshooting

Missing required files

Error: FileNotFoundError: Preflight validation failed. Missing required files

Fix: Check the file paths in your config and ensure base_data_dir is correct.
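
To narrow down which path is wrong, you can resolve the configured paths yourself. The sketch below is not the package's preflight check: `missing_files` is a hypothetical helper, and it assumes that relative entries in paths are joined to base_data_dir and that null entries are skipped. It checks every non-null entry, since which files are actually required is not specified here.

```python
import tempfile
from pathlib import Path


def missing_files(config: dict) -> list:
    """Return (key, resolved_path) pairs for configured files that do not exist."""
    base = Path(config["global"]["base_data_dir"])
    missing = []
    for key, value in config["paths"].items():
        if value is None:  # e.g. db_file: null
            continue
        path = Path(value)
        if not path.is_absolute():
            path = base / path  # assumption: relative paths resolve against base_data_dir
        if not path.is_file():
            missing.append((key, str(path)))
    return missing


# Demo in a temporary directory containing only one of the two configured files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "drug_names.txt").write_text("Aspirin\n")
    config = {
        "global": {"base_data_dir": tmp},
        "paths": {
            "drug_names_file": "drug_names.txt",
            "synergy_data_file": "synergy_data.csv",
            "db_file": None,
        },
    }
    result = missing_files(config)

print([key for key, _ in result])  # ['synergy_data_file']
```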

Missing required columns

Error: ValueError: Missing required columns in synergy data

Fix: Verify that the columns section in your config matches your data headers. Use --verbose to see exactly which columns are missing.
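
You can also compare the headers against the config by hand. In the sketch below, `missing_columns` is a hypothetical helper, `score` is a made-up header, and the list of required keys is an assumption chosen for illustration; the package's own validation may require a different set.

```python
import csv
import io


def missing_columns(handle, columns: dict, required_keys) -> list:
    """Return configured header names that are absent from the CSV header."""
    header = set(next(csv.reader(handle)))
    return [columns[key] for key in required_keys if columns[key] not in header]


# Column names as in the default config.
columns = {
    "drug_name_A": "drug_name_A",
    "drug_name_B": "drug_name_B",
    "cell_line": "cell_line",
    "synergy": "synergy",
}
# This file calls its synergy column "score", so the default name is not found.
data = io.StringIO(
    "drug_name_A,drug_name_B,tissue,cell_line,score\n"
    "Aspirin,Ibuprofen,Breast,MCF7,0.15\n"
)
absent = missing_columns(data, columns, ["drug_name_A", "drug_name_B", "cell_line", "synergy"])
print(absent)  # ['synergy']
```

In this example the fix would be to set "synergy": "score" in the columns section of the custom config.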

ChEMBL resolution fails

Error: Network timeout or no results for drug name

Fix:

  • Check drug name spelling (must match ChEMBL exactly or be unambiguous).

  • Provide manual ChEMBL mapping in manual_chembl.csv to skip network queries.

  • Run with --verbose to see which drugs failed.