Statistics
==========

This page provides detailed explanations for two key statistical methods in PyDrugLogics: `sampling_with_ci` and `compare_two_simulations`.


.. _sampling_with_ci:

Sampling with Confidence Intervals
----------------------------------

The `sampling_with_ci` function estimates confidence intervals for the Precision-Recall (PR) curve of Boolean models
using bootstrap resampling. It also produces the sampling results, a PR curve plot with its confidence interval, and a statistical summary of the results.


Process
~~~~~~~
1. **Sampling**: A random subset of the Boolean models is drawn according to `sub_ratio`; the draw is repeated `repeat_time` times.
2. **Bootstrap Resampling**: Precision-Recall Curves are generated for bootstrap samples.
3. **Confidence Interval Calculation**:

   - Confidence intervals are estimated for precision values at different recall points.
   - Summary statistics include mean, standard deviation, and margin of error.
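
The three steps above can be sketched in plain Python. This is an illustrative sketch only, not the PyDrugLogics implementation: the helper names (`subsample_models`, `pr_points`, `precision_at`, `bootstrap_pr_band`) and the example data are hypothetical, a percentile bootstrap over perturbations is assumed, and lower synergy scores are assumed to indicate stronger predicted synergy.

.. code-block:: python

    import random


    def subsample_models(models, sub_ratio, rng):
        """Step 1: draw a random subset of the models without replacement."""
        k = max(1, int(len(models) * sub_ratio))
        return rng.sample(models, k)


    def pr_points(labels, scores):
        """Return (recall, precision) pairs, ranking the lowest scores first.

        Assumption: lower (more negative) scores mean stronger predicted synergy.
        """
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        total_pos = sum(labels)
        tp = 0
        points = []
        for rank, i in enumerate(order, start=1):
            tp += labels[i]
            points.append((tp / total_pos, tp / rank))
        return points


    def precision_at(points, recall_level):
        """Interpolated precision: best precision at any recall >= recall_level."""
        return max((p for r, p in points if r >= recall_level), default=0.0)


    def bootstrap_pr_band(labels, scores, recall_grid, boot_n=1000,
                          confidence_level=0.9, seed=42):
        """Steps 2-3: percentile bootstrap band for precision on a recall grid."""
        if sum(labels) == 0:
            raise ValueError("at least one positive label is required")
        rng = random.Random(seed)
        n = len(labels)
        curves = []
        while len(curves) < boot_n:
            idx = [rng.randrange(n) for _ in range(n)]
            boot_labels = [labels[i] for i in idx]
            if sum(boot_labels) == 0:
                continue  # recall is undefined without positives; redraw
            pts = pr_points(boot_labels, [scores[i] for i in idx])
            curves.append([precision_at(pts, r) for r in recall_grid])
        alpha = 1.0 - confidence_level
        lo = int((alpha / 2) * (boot_n - 1))
        hi = int((1 - alpha / 2) * (boot_n - 1))
        band = []
        for j, r in enumerate(recall_grid):
            column = sorted(curve[j] for curve in curves)
            band.append((r, column[lo], column[hi]))
        return band


    # Hypothetical data: 1 marks an observed synergy, 0 a non-synergy.
    example_labels = [1, 0, 1, 0, 0]
    example_scores = [-0.158, 0.003, -0.05, -0.004, 0.076]
    for recall, lower, upper in bootstrap_pr_band(example_labels, example_scores,
                                                  [0.5, 1.0], boot_n=200):
        print(f"recall={recall:.1f}: precision in [{lower:.3f}, {upper:.3f}]")

The fixed `seed` plays the same role as the `with_seeds` and `seeds` arguments: repeated runs give the same band.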


Arguments
~~~~~~~~~
- **boolean_models**: *(List)* List of BooleanModel instances to evaluate.
- **observed_synergy_scores**: *(List[str])* List of experimentally observed synergy scores.
- **model_outputs**: *(ModelOutputs)* Model outputs used for evaluation.
- **perturbations**: *(Perturbations)* List of perturbations applied to the models.
- **synergy_method**: *(str, default='bliss')* Method for evaluating synergy (`'hsa'` or `'bliss'`).
- **repeat_time**: *(int, default=10)* Number of times to repeat the sampling process.
- **sub_ratio**: *(float, default=0.8)* Proportion of models to sample in each iteration.
- **boot_n**: *(int, default=1000)* Number of bootstrap resampling iterations for confidence interval estimation.
- **confidence_level**: *(float, default=0.9)* Confidence level for confidence interval calculations.
- **plot**: *(bool, default=True)* Whether to display the PR curve.
- **plot_discrete**: *(bool, default=False)* Whether to plot discrete points on the PR curve.
- **save_result**: *(bool, default=True)* Whether to save the sampling results.
- **with_seeds**: *(bool, default=True)* Whether to use a fixed seed for reproducibility.
- **seeds**: *(int, default=42)* Seed value for random number generation.


Initialization
~~~~~~~~~~~~~~
.. code-block:: python

    from typing import Any, List

    def sampling_with_ci(boolean_models: List, observed_synergy_scores: List[str],
                         model_outputs: Any, perturbations: Any,
                         synergy_method: str = 'bliss', repeat_time: int = 10,
                         sub_ratio: float = 0.8, boot_n: int = 1000,
                         confidence_level: float = 0.9, plot: bool = True,
                         plot_discrete: bool = False, save_result: bool = True,
                         with_seeds: bool = True, seeds: int = 42) -> None:
        ...  # signature only; the body is provided by PyDrugLogics


Example Results
~~~~~~~~~~~~~~~
**Precision-Recall Curve with Confidence Intervals**

.. image:: /images/pr_with_ci.png
   :alt: Precision-Recall Curve with Confidence Intervals
   :align: center
   :width: 75%

**Saved File Example**

1. Synergy Score Sample 1

.. code-block:: text

    # Date: 2024/11/19, Time: 10:49
    # Synergies (bliss)
    perturbation_name	synergy_score
    PI-PD	-0.1580120888736708
    PI-CT	0.002975517890772106
    PI-BI	0.01302189111677321
    PI-PK	-0.004479748223085389
    PI-AK	0.07575392975581308


2. Sampling with CI statistical summary

.. code-block:: text

    Date: 2024/11/19, Time: 10:49
    Sampling Results
    Point Estimate (Mean): 0.483440
    Standard Deviation: 0.262091
    Standard Error: 0.014934
    Confidence Interval: (0.454170, 0.512711)
    Confidence Level: 95.0%
    Critical Value: 1.959964
    Margin of Error: 0.029270
    Sample Size: 308
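
The values in this summary are consistent, up to rounding, with the usual normal-approximation formulas: standard error = standard deviation / sqrt(sample size), margin of error = critical value × standard error, and confidence interval = mean ± margin of error. The sketch below reproduces them from the reported mean, standard deviation, and sample size. That PyDrugLogics computes the summary in exactly this way is an assumption, and `summarize` is a hypothetical helper name.

.. code-block:: python

    from math import sqrt
    from statistics import NormalDist


    def summarize(mean, std_dev, sample_size, confidence_level=0.95):
        """Normal-approximation confidence interval for a mean."""
        standard_error = std_dev / sqrt(sample_size)
        # Two-sided critical value of the standard normal distribution.
        critical_value = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
        margin_of_error = critical_value * standard_error
        return {
            "standard_error": standard_error,
            "critical_value": critical_value,
            "margin_of_error": margin_of_error,
            "confidence_interval": (mean - margin_of_error, mean + margin_of_error),
        }


    # Inputs taken from the example summary above.
    summary = summarize(mean=0.483440, std_dev=0.262091, sample_size=308)
    for name, value in summary.items():
        print(name, value)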

Output
~~~~~~
- **Precision-Recall Curve with Confidence Intervals**: A plot showing the PR curve and confidence intervals.
- **AUC-PR**: The area under the PR curve.
- **Sampling Results**: Tabulated results saved to a directory, including synergy scores and confidence intervals.
- **Statistical Summary**: A summary of the results, reporting the point estimate (mean), standard deviation, standard error, confidence interval, confidence level, critical value, margin of error, and sample size.

.. _compare_two_simulations:

Compare Two Simulations
------------------------

The `compare_two_simulations` function compares the results of two train-and-predict runs. It also supports normalizing the results of the first set of Boolean models.

Process
~~~~~~~
1. **Run Simulations**: Predict synergy scores for each set of Boolean models.
2. **Normalization**: Optionally normalize the first set of results using calibrated synergy scores.
3. **Curve Generation**: Generate ROC and PR curves for each set of predictions.
4. **Save Results**: Synergy scores are saved to the specified directory.
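
The comparison in step 3 can be illustrated with a rank-based ROC AUC computed for each set of predictions. This is a minimal sketch, not the PyDrugLogics implementation: `roc_auc` and `compare_auc` are hypothetical helpers, the labels and scores are made-up data, and lower synergy scores are assumed to indicate stronger predicted synergy.

.. code-block:: python

    def roc_auc(labels, scores):
        """ROC AUC as the probability that a positive outranks a negative.

        Assumption: a lower score is a stronger synergy prediction, so a
        positive "wins" when its score is below the negative's score.
        """
        positives = [s for l, s in zip(labels, scores) if l == 1]
        negatives = [s for l, s in zip(labels, scores) if l == 0]
        if not positives or not negatives:
            raise ValueError("both classes must be present")
        wins = 0.0
        for p in positives:
            for n in negatives:
                if p < n:
                    wins += 1.0
                elif p == n:
                    wins += 0.5  # ties count as half a win
        return wins / (len(positives) * len(negatives))


    def compare_auc(labels, scores1, scores2,
                    label1="Models 1", label2="Models 2"):
        """Return the ROC AUC of each prediction set, keyed by its label."""
        return {label1: roc_auc(labels, scores1),
                label2: roc_auc(labels, scores2)}


    # Hypothetical data: 1 marks an observed synergy, 0 a non-synergy.
    observed = [1, 1, 0, 0, 0]
    first_run = [-0.30, -0.10, 0.00, 0.05, 0.10]
    second_run = [-0.30, 0.05, 0.00, -0.10, 0.10]
    print(compare_auc(observed, first_run, second_run))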


Arguments
~~~~~~~~~
- **boolean_models1**: *(List)* List of the best Boolean Models from the first simulation set.
- **boolean_models2**: *(List)* List of the best Boolean Models from the second simulation set.
- **observed_synergy_scores**: *(List[str])* List of experimentally observed synergy scores for comparison.
- **model_outputs**: *(ModelOutputs)* Model outputs used for evaluation.
- **perturbations**: *(Perturbations)* List of perturbations applied to the models.
- **synergy_method**: *(str, default='bliss')* Method for evaluating synergy (`'hsa'` or `'bliss'`).
- **label1**: *(str, default='Models 1')* Label for the first set of results.
- **label2**: *(str, default='Models 2')* Label for the second set of results.
- **normalized**: *(bool, default=True)* Whether to normalize the first set of results.
- **plot**: *(bool, default=True)* Whether to display ROC and PR curves.
- **save_result**: *(bool, default=True)* Whether to save the comparison results.


Initialization
~~~~~~~~~~~~~~
.. code-block:: python

    from typing import Any, List

    def compare_two_simulations(boolean_models1: List, boolean_models2: List,
                                observed_synergy_scores: List[str], model_outputs: Any,
                                perturbations: Any, synergy_method: str = 'bliss',
                                label1: str = 'Models 1', label2: str = 'Models 2',
                                normalized: bool = True, plot: bool = True,
                                save_result: bool = True) -> None:
        ...  # signature only; the body is provided by PyDrugLogics

Example Results
~~~~~~~~~~~~~~~
**Comparison of ROC and PR Curves**

.. image:: /images/comparison_roc_pr.png
   :alt: Comparison of ROC and PR Curves
   :align: center
   :width: 100%

**Saved File Examples**

.. code-block:: text

    # Date: 2024/11/19, Time: 10:49
    # Synergies (bliss)
    perturbation_name	synergy_score
    PI-PD	-0.1580120888736708
    PI-CT	0.002975517890772106
    PI-BI	0.01302189111677321
    PI-PK	-0.004479748223085389
    PI-AK	0.07575392975581308
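
A saved synergy file in the format shown above can be read back with the standard library. The sketch below assumes that comment lines start with `#`, that columns are tab-separated, and that the header row is `perturbation_name` and `synergy_score`, as in the example; `read_synergy_scores` is a hypothetical helper, not part of PyDrugLogics.

.. code-block:: python

    import csv
    import io


    def read_synergy_scores(handle):
        """Parse a tab-separated synergy file into {perturbation: score}."""
        # Drop blank lines and '#' comment lines before handing rows to csv.
        rows = (line for line in handle
                if line.strip() and not line.lstrip().startswith("#"))
        reader = csv.DictReader(rows, delimiter="\t")
        return {row["perturbation_name"]: float(row["synergy_score"])
                for row in reader}


    # In-memory copy of the first rows of the example file.
    sample = io.StringIO(
        "# Date: 2024/11/19, Time: 10:49\n"
        "# Synergies (bliss)\n"
        "perturbation_name\tsynergy_score\n"
        "PI-PD\t-0.1580120888736708\n"
        "PI-CT\t0.002975517890772106\n"
    )
    scores = read_synergy_scores(sample)
    print(scores)

For a file on disk, pass an open text handle instead of the `io.StringIO` object.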