Sampling Reference
API Documentation for the sampling submodule
Corners
Submodule implementing methods for performing Corner Based Sampling
- metworkpy.sampling.corners.corner_sampling(model: Model, n_samples: int = 1000, reaction_list: list[str] | None = None, processes: int | None = None, fva_scale: bool = True, seed: Generator | int | None = None, fva_kwargs: dict[str, Any] | None = None) → DataFrame
Perform Corner Based sampling of a Metabolic Model
- Parameters:
model (cobra.Model) – The model to sample from
n_samples (int, default=1000) – The number of samples to generate
reaction_list (list[str], optional) – The reactions that may be selected as part of the random objectives during corner sampling (for example, this can be used to exclude pseudo-reactions). Must be a list of reaction ids.
processes (int, optional) – The number of processes to use (this uses joblib, so it can also be managed via a joblib context)
fva_scale (bool, default=True) – Whether to scale the weights assigned to reactions in the randomized objectives by the maximum (absolute) flux value the associated reaction could achieve
seed (int or np.random.Generator, optional) – Optional seed used for selecting the reactions and weights. Note that this doesn't guarantee that the generated solutions will be identical, only that the objectives used to generate each sample will be; whether the samples themselves are identical depends on the consistency of the solver.
fva_kwargs (dict of str to Any, optional) – Keyword arguments passed to cobra.flux_analysis.flux_variability_analysis. By default, fraction_of_optimum is set to 0.0 so that the model's objective function does not affect the sampling.
- Returns:
samples – A DataFrame of the generated samples with shape (n_samples, n_reactions). The columns are named after the reaction ids in the model, and each row is one random sample.
- Return type:
pd.DataFrame
Notes
Corner Based sampling iteratively creates a random objective function, optimizes it, and stores the resulting flux distribution. To create a random objective function, it first selects a value $\tau$, the proportion of reactions that will be involved. It then chooses a subset of reactions of that proportion (drawn from reaction_list, if provided) and assigns each selected reaction a random weight between -1 and 1. These weights can optionally be scaled using flux variability analysis. The objective of the FBA problem is then set to the weighted sum of the fluxes of the selected reactions, using the randomly generated weights.
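The objective-generation step described above can be sketched in plain Python. This is a minimal illustration, not metworkpy's implementation: the helper names `random_corner_objective` and `objective_value` are hypothetical, $\tau$ is assumed to be drawn uniformly from (0, 1], at least one reaction is always selected, and the optional FVA-based weight scaling is omitted. Optimizing the resulting objective would be left to an FBA solver (for example through cobra).

```python
import random


def random_corner_objective(reaction_ids, rng):
    """Hypothetical helper: build one random objective as {reaction_id: weight}.

    Assumes tau ~ Uniform(0, 1] and omits FVA-based weight scaling.
    """
    # rng.random() is in [0, 1), so 1 - rng.random() is in (0, 1]
    tau = 1.0 - rng.random()
    # Number of reactions involved in this objective (at least one)
    k = max(1, round(tau * len(reaction_ids)))
    chosen = rng.sample(reaction_ids, k)
    # Each selected reaction receives a weight drawn uniformly from [-1, 1]
    return {rxn: rng.uniform(-1.0, 1.0) for rxn in chosen}


def objective_value(weights, fluxes):
    """Weighted sum of the selected reactions' fluxes (the quantity being optimized)."""
    return sum(w * fluxes[rxn] for rxn, w in weights.items())


# Usage: the same seed always yields the same objective
reactions = ["PGI", "PFK", "FBA", "TPI", "GAPD"]
objective = random_corner_objective(reactions, random.Random(42))
```

Seeding here only fixes which objectives are generated; as noted for the seed parameter, whether the optimized flux distributions are identical still depends on the solver.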
This method is based on the method discussed in “Adjusting for false discoveries in constraint-based differential metabolic flux analysis”, by Bruno G. Galuzzi, Luca Milazzo, and Chiara Damiani.
References
Galuzzi, B. G., Milazzo, L., & Damiani, C. (2024). Adjusting for false discoveries in constraint-based differential metabolic flux analysis. Journal of Biomedical Informatics, 150, 104597. https://doi.org/10.1016/j.jbi.2024.104597
Diagnostics
Submodule containing some diagnostics for convergence of sampling
- metworkpy.sampling.diagnostics.geweke(samples: DataFrame, first: float = 0.1, last: float = 0.5) → Series
Compute the Geweke diagnostic for a set of samples
- Parameters:
samples (pd.DataFrame) – A DataFrame of samples, the columns should be the variables being sampled, and the rows should represent the samples (in the order they were acquired).
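first (float, default=0.1) – Fraction of the sampling chain, taken from the start, that forms the first window of the comparison
last (float, default=0.5) – Fraction of the sampling chain, taken from the end, that forms the last window of the comparison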
- Returns:
pd.Series – The Geweke diagnostic for each column of the DataFrame, the index of the Series will match the columns of the input DataFrame
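A simplified Geweke statistic for a single column can be written with the standard library. This sketch compares the mean of the first `first` fraction of the chain with the mean of the last `last` fraction. As a simplifying assumption it estimates the variance of each window mean as the sample variance divided by the window length, ignoring autocorrelation; Geweke's original diagnostic uses spectral density estimates instead, and metworkpy's exact estimator is not described here. The function name `geweke_z` is hypothetical.

```python
import math
import statistics


def geweke_z(chain, first=0.1, last=0.5):
    """Hypothetical sketch of a Geweke z-score for one sampled variable.

    Ignores autocorrelation when estimating the variance of each window mean.
    """
    n = len(chain)
    head = chain[: int(first * n)]    # first `first` fraction of the chain
    tail = chain[n - int(last * n):]  # last `last` fraction of the chain
    if len(head) < 2 or len(tail) < 2:
        raise ValueError("each window needs at least two samples")
    # Standard error of the difference between the two window means
    se = math.sqrt(
        statistics.variance(head) / len(head) + statistics.variance(tail) / len(tail)
    )
    diff = statistics.mean(head) - statistics.mean(tail)
    if se == 0.0:
        # Both windows are constant: identical means give 0, otherwise an infinite score
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se
```

Applied column by column (for example `{name: geweke_z(values) for name, values in columns.items()}`), this mirrors the per-column Series returned by the documented function. A common rule of thumb treats |z| above roughly 2 as a sign that the chain has not yet converged.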
Upsampling
Submodule for upsampling from a set of points in a convex polytope
- metworkpy.sampling.upsampling.upsample(samples: DataFrame, n_samples: int = 1000, processes: int | None = None, seed: Generator | int | None = None) → DataFrame
Perform upsampling of samples drawn from a convex polytope
- Parameters:
samples (pd.DataFrame) – Samples taken from a convex polytope, for example using the metworkpy.sampling.corner_sampling function.
n_samples (int, default=1000) – The number of additional samples to generate
processes (int, optional) – Number of processes for performing the calculations
seed (int or np.random.Generator, optional) – Seed for the random number generator performing the sampling
- Returns:
samples – A DataFrame containing the original samples followed by the newly generated samples. If the original samples DataFrame had shape (n, m), the result has shape (n + n_samples, m). The column names remain unchanged.
- Return type:
pd.DataFrame
Notes
This method iteratively selects a random proportion of the samples in the original sample array, treats the selected samples as corners, and draws a new sample from the interior of the polytope they describe.
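The idea behind upsampling can be sketched with the standard library. Because the polytope is convex, any convex combination of points already inside it also lies inside it, so each new point here is a randomly weighted average of a random subset of the existing samples. This is an illustration under assumptions, not metworkpy's `upsample`: the helper name `upsample_sketch` is hypothetical, samples are plain lists of rows rather than a DataFrame, the proportion is drawn uniformly from (0, 1] with at least two corners per draw, and the weights follow a flat Dirichlet distribution (generated by normalizing exponential draws).

```python
import random


def upsample_sketch(samples, n_samples=1000, seed=None):
    """Hypothetical sketch: append n_samples convex combinations of existing rows."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to act as corners")
    m = len(samples[0])
    rng = random.Random(seed)
    new_rows = []
    for _ in range(n_samples):
        # Random proportion in (0, 1] of the existing samples to use as corners
        proportion = 1.0 - rng.random()
        k = max(2, round(proportion * n))
        corners = rng.sample(samples, k)
        # Flat Dirichlet weights: normalized Exp(1) draws are uniform on the simplex
        raw = [rng.expovariate(1.0) for _ in range(k)]
        total = sum(raw)
        if total == 0.0:  # vanishingly unlikely; fall back to equal weights
            raw, total = [1.0] * k, float(k)
        weights = [r / total for r in raw]
        # Convex combination of the chosen corners, coordinate by coordinate
        new_rows.append([sum(w * c[j] for w, c in zip(weights, corners)) for j in range(m)])
    # Original rows first, new rows appended: shape (n + n_samples, m)
    return samples + new_rows
```

Every new point stays inside the polytope, but the points are not uniformly distributed over it: with many corners the Dirichlet weights become nearly equal, so new points tend to cluster near the average of the chosen corners.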