Utils Reference

API Documentation for the utils submodule

Connected Components

Module for finding the connected components of a graph

class metworkpy.utils.connected_components.HashableComparable(*args, **kwargs)

Bases: Protocol

Protocol for annotating Comparable and Hashable types.

metworkpy.utils.connected_components.find_connected_components(node_list: list[HCT], edge_list: list[tuple[HCT, HCT]]) → list[set[HCT]]

Find the connected components of a graph

Parameters:

node_list (list of Hashable) – List of the nodes which are in the graph (nodes must be hashable)
edge_list (list of tuple of Hashable) – List of edges in the graph

Returns:

connected_components – List of the connected components of the graph

Return type:

list of sets of Hashable
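
As an illustration of what the return value looks like, here is a minimal standard-library sketch of a connected-components search. It is not metworkpy's implementation: the name connected_components_sketch and the breadth-first strategy are assumptions, and it assumes every node named in edge_list also appears in node_list.

```python
from collections import deque


def connected_components_sketch(node_list, edge_list):
    """Group nodes into connected components with a breadth-first search."""
    # Build an undirected adjacency map; isolated nodes keep an empty set.
    neighbors = {node: set() for node in node_list}
    for a, b in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in node_list:
        if start in seen:
            continue
        # Collect everything reachable from this not-yet-visited node.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


components = connected_components_sketch(
    ["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c"), ("d", "e")]
)
```

Here components holds two sets, one with a, b, and c, and one with d and e.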

metworkpy.utils.connected_components.find_degree(node_list: list[HCT], edge_list: list[tuple[HCT, HCT]]) → dict[HCT, int]

Find the degree of nodes within a graph

Parameters:

node_list (list of Hashable) – List of the nodes which are in the graph (nodes must be hashable)
edge_list (list of tuple of Hashable) – List of edges in the graph

Returns:

degree_dict – Dictionary keyed by node, with value corresponding to degree of the node

Return type:

dict of Hashable to int

metworkpy.utils.connected_components.find_neighbors(node_list: list[HCT], edge_list: list[tuple[HCT, HCT]]) → dict[HCT, set[HCT]]

Find the neighbors of nodes within a graph

Parameters:

node_list (list of Hashable) – List of the nodes which are in the graph (nodes must be hashable)
edge_list (list of tuple of Hashable) – List of edges in the graph

Returns:

neighbors – A dictionary keyed by node, with values of sets of neighbors to the key node

Return type:

dict of Hashable to set of Hashable

metworkpy.utils.connected_components.find_representative_nodes(node_list: list[HCT], edge_list: list[tuple[HCT, HCT]]) → dict[HCT, set[HCT]]

Find representative nodes in a graph by selecting the node of highest degree from each component of the graph

Parameters:

node_list (list of Hashable) – List of the nodes in the graph (nodes must be hashable)
edge_list (list of tuple of Hashable) – List of edges in the graph, in the form of a tuple of nodes which it connects

Returns:

representative_dict – A dictionary with the representative nodes as the keys, and a set of the nodes they represent as the values

Return type:

dict of Hashable to set of Hashable

Note

The sets of nodes in the returned dictionary will include the representative node
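
The shape of the returned dictionary can be sketched with plain Python. This is an illustration only, not the library's code: the name representative_nodes_sketch is hypothetical, degree is assumed to count distinct neighbors, and the tie-break (smallest node among those of equal degree) is an assumption, since the documentation does not say how ties are resolved.

```python
def representative_nodes_sketch(node_list, edge_list):
    """Pick the highest-degree node of each connected component."""
    neighbors = {node: set() for node in node_list}
    for a, b in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Assumption: degree is the number of distinct neighbors.
    degree = {node: len(adjacent) for node, adjacent in neighbors.items()}

    representatives = {}
    unvisited = set(node_list)
    for start in node_list:
        if start not in unvisited:
            continue
        # Depth-first traversal to collect one component.
        component = set()
        stack = [start]
        unvisited.discard(start)
        while stack:
            current = stack.pop()
            component.add(current)
            for nxt in neighbors[current]:
                if nxt in unvisited:
                    unvisited.discard(nxt)
                    stack.append(nxt)
        # Assumed tie-break: highest degree first, then the smallest node.
        representative = min(component, key=lambda n: (-degree[n], n))
        representatives[representative] = component
    return representatives


representative_dict = representative_nodes_sketch(
    ["a", "b", "c", "d", "e", "f"], [("a", "b"), ("b", "c"), ("d", "e")]
)
```

In this example b represents the component containing a, b, and c, and each value set includes its own key, matching the note above.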

Expression

Module containing utility functions for working with gene expression data, and converting it into qualitative weights

metworkpy.utils.expression_utils.count_to_cpm(count: DataFrame) → DataFrame

Converts count data to counts per million

Parameters:

count (pd.DataFrame) – Dataframe containing gene count data, with genes as the columns and samples as the rows

Returns:

CPM normalized counts

Return type:

pd.DataFrame
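
The arithmetic behind counts per million can be shown on plain nested lists. This is a dependency-free sketch rather than the library's pandas implementation; cpm_sketch is a hypothetical name, and the library size is assumed to be the sum of all counts in a sample.

```python
def cpm_sketch(counts):
    """Scale each sample (row) so that its counts sum to one million."""
    cpm = []
    for row in counts:
        library_size = sum(row)  # total counts observed in this sample
        cpm.append([value * 1_000_000 / library_size for value in row])
    return cpm


# Two samples (rows) by three genes (columns).
counts = [[10, 30, 60], [1, 1, 2]]
cpm = cpm_sketch(counts)
```

Each row of cpm sums to one million, so samples sequenced to different depths become comparable.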

metworkpy.utils.expression_utils.count_to_fpkm(count: DataFrame, feature_length: Series) → DataFrame

Converts count data to FPKM normalized expression

Parameters:

count (pd.DataFrame) – Dataframe containing gene count data, with genes as the columns and samples as the rows. Specifically, the count data represents the number of fragments, where a fragment corresponds to a single cDNA molecule, which can be represented by a pair of reads from each end.
feature_length (pd.Series) – Series containing the feature length for all the genes

Returns:

FPKM normalized counts

Return type:

pd.DataFrame

metworkpy.utils.expression_utils.count_to_rpkm(count: DataFrame, feature_length: Series) → DataFrame

Normalize raw count data using RPKM

Parameters:

count (pd.DataFrame) – Dataframe containing gene count data, with genes as the columns and samples as the rows
feature_length (pd.Series) – Series containing the feature length for all the genes

Returns:

RPKM normalized counts

Return type:

pd.DataFrame

metworkpy.utils.expression_utils.count_to_tpm(count: DataFrame, feature_length: Series) → DataFrame

Converts count data to TPM normalized expression

Parameters:

count (pd.DataFrame) – Dataframe containing gene count data, with genes as the columns and samples as the rows
feature_length (pd.Series) – Series containing the feature length for all the genes

Returns:

TPM normalized counts

Return type:

pd.DataFrame
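
As a worked example of the TPM calculation, the sketch below uses plain lists instead of a DataFrame. The name tpm_sketch is hypothetical, and feature lengths are assumed to be in bases and ordered like the gene columns; metworkpy's own handling may differ.

```python
def tpm_sketch(counts, feature_length):
    """Length-normalize each count, then rescale each sample to one million."""
    tpm = []
    for row in counts:
        # Counts per base of feature, which corrects for gene length.
        rates = [count / length for count, length in zip(row, feature_length)]
        scale = sum(rates)
        tpm.append([rate * 1_000_000 / scale for rate in rates])
    return tpm


# One sample (row) by three genes (columns).
counts = [[10, 20, 30]]
feature_length = [1000, 2000, 1000]
tpm = tpm_sketch(counts, feature_length)
```

The first two genes have the same count per base, so they receive the same TPM even though their raw counts differ.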

metworkpy.utils.expression_utils.expr_to_imat_gene_weights(expression: ~pandas.core.series.Series | ~pandas.core.frame.DataFrame, quantile: float | tuple[float, float] = 0.15, aggregator: ~typing.Callable[[ArrayLike], float] = <function median>, subset: ~typing.Iterable | None = None, sample_axis: int | str = 0) → Series

Convert gene expression data to qualitative gene weights

Parameters:

expression (pd.Series | pd.DataFrame) – Normalized gene expression data. If it is a DataFrame representing multiple samples, those samples will be aggregated using the aggregator function (default median).
quantile (float | tuple[float, float]) – Quantile or quantiles to use for binning expression data. Should be between 0 and 1. If a single value is provided, the bottom quantile will be converted to -1, the top quantile to 1, and all expression values in between to 0. If a tuple is provided, the first value is treated as the low quantile cutoff, and the second as the high quantile cutoff.
aggregator (Callable[[np.ArrayLike], float]) – Function used to aggregate gene expression data across samples, only used if expression is a DataFrame (default median).
subset (Optional[Iterable]) – Subset of genes to perform calculations on. expression is filtered to only include these genes before quantiles are calculated. If any genes are present in the subset, but not in expression, they will be assigned a value of 0 following the trinarization.
sample_axis (int | str) – Which axis represents samples in the expression data (only used if expression is DataFrame). “index” or 0 if rows represent different samples, “column” or 1 if columns represent different samples (default is rows).

Returns:

Series of qualitative weights, -1 for low expression, 1 for high expression, and 0 otherwise.

Return type:

pd.Series

Notes

The expression data should only represent biological replicates, as it will be aggregated. If multiple different conditions are represented in your expression data, they should be separated before this function is used.

For the quantile, if a tuple like (0.15, 0.90) is provided, the bottom 15% of genes in terms of expression will have weights of -1, while the top 10% will have weights of 1, and everything in between will have weights of 0.
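
The quantile binning described in these notes can be sketched on a plain dict of already-aggregated expression values. This is an illustration, not the library's code: quantile_linear and trinarize_sketch are hypothetical names, the linear-interpolation quantile and the inclusive comparisons at the cutoffs are assumptions, and aggregation and subsetting are left out.

```python
def quantile_linear(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def trinarize_sketch(expression, quantile=0.15):
    """Map expression values to -1 (low), 0 (middle), or 1 (high)."""
    if isinstance(quantile, tuple):
        low_q, high_q = quantile
    else:
        # A single value is mirrored: bottom q and top q of the genes.
        low_q, high_q = quantile, 1 - quantile
    low_cut = quantile_linear(expression.values(), low_q)
    high_cut = quantile_linear(expression.values(), high_q)
    weights = {}
    for gene, value in expression.items():
        if value <= low_cut:  # assumed inclusive boundary
            weights[gene] = -1
        elif value >= high_cut:  # assumed inclusive boundary
            weights[gene] = 1
        else:
            weights[gene] = 0
    return weights


# Ten genes with expression values 1.0 through 10.0.
expression = {f"g{i}": float(i + 1) for i in range(10)}
weights = trinarize_sketch(expression, quantile=(0.15, 0.90))
```

With the (0.15, 0.90) tuple, the two lowest genes receive -1, the single highest gene receives 1, and the rest receive 0.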

metworkpy.utils.expression_utils.expr_to_metchange_gene_weights(expression: pd.Series | pd.DataFrame, quantile_cutoff: float, subset: Iterable[str] | None = None, aggregator: Callable[[ArrayLike[float]], float] = <function median>, sample_axis: str | int = 0) → pd.Series

Convert gene expression values into metchange gene weights

Parameters:

expression (pd.Series | pd.DataFrame) – Gene expression values. Either a series with genes as the index, or a Dataframe with genes as one axis, and samples as the other. In the case of a dataframe, the expression values are aggregated before the weights are calculated.
quantile_cutoff (float) – Cutoff used for defining the weights. The expression value corresponding to this quantile is used as the threshold. Everything above the threshold is weighted 0, and everything below is weighted in proportion to distance from the threshold. The weight will be between 0 and 1, with values near the threshold being near 0, and values near 0 being weighted 1.
subset (Iterable[str] | None) – Subset of genes to use in weighting. Default of None will use all genes in expression. If not none, expression will be filtered down to this subset of genes before the quantile threshold is calculated, and the returned series will only include this subset of genes.
aggregator (Callable[[Arraylike[float]], float]) – Aggregation function to use for aggregating expression data across multiple samples. Should accept a single Arraylike argument, and return a float. Default is median.
sample_axis (str | int) – Which axis in expression dataframe represents samples. Can be ‘index’, ‘columns’, 0 or 1. A value of 0 or ‘index’ means rows represent different samples, while a value of 1 or ‘columns’ means that columns represent different samples.

Returns:

Series of gene weights (floats between 0 and 1, representing the probability that a gene product is absent), indexed by gene ids.

Return type:

pd.Series

Notes

This function does not convert the expression values into reaction weights; to do so, use metworkpy.parse.gpr.gene_to_rxn_weights. The function dict will need to be changed from the default to {'AND': max, 'OR': min}, because the metchange weights represent the probability of absence rather than presence.
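
One reading of the quantile_cutoff description is a weight that falls linearly from 1 at zero expression to 0 at the threshold. The sketch below follows that reading on a plain dict. It is an assumption-laden illustration, not the library's formula: quantile_linear and metchange_weights_sketch are hypothetical names, and the interpolation method is assumed.

```python
def quantile_linear(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def metchange_weights_sketch(expression, quantile_cutoff):
    """Weight genes below the threshold by their relative distance from it."""
    threshold = quantile_linear(expression.values(), quantile_cutoff)
    weights = {}
    for gene, value in expression.items():
        if value >= threshold:
            weights[gene] = 0.0
        else:
            # Assumed linear form: 1 at zero expression, 0 at the threshold.
            weights[gene] = (threshold - value) / threshold
    return weights


expression = {"g1": 0.0, "g2": 5.0, "g3": 10.0, "g4": 20.0, "g5": 40.0}
weights = metchange_weights_sketch(expression, quantile_cutoff=0.5)

# Combining absence probabilities for a reaction, as described in the note:
# an AND rule (a complex) takes the max, an OR rule (isozymes) takes the min.
and_weight = max(weights["g2"], weights["g4"])
or_weight = min(weights["g2"], weights["g4"])
```

The max for AND reflects that a complex is missing whenever its most likely absent subunit is missing, while an OR rule needs only one gene product to be present.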

metworkpy.utils.expression_utils.fpkm_to_tpm(fpkm: DataFrame)

Convert FPKM normalized counts to TPM normalized counts

Parameters:

fpkm (pd.DataFrame) – FPKM normalized count data, with genes as columns and samples as rows

Returns:

TPM normalized counts

Return type:

pd.DataFrame

metworkpy.utils.expression_utils.rpkm_to_tpm(rpkm: DataFrame)

Convert RPKM normalized counts to TPM normalized counts

Parameters:

rpkm (pd.DataFrame) – RPKM normalized count data, with genes as columns and samples as rows

Returns:

TPM normalized counts

Return type:

pd.DataFrame
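
Converting RPKM to TPM only requires rescaling each sample so its values sum to one million. The sketch below illustrates this for a single sample using plain lists. The names rpkm_sketch and rpkm_to_tpm_sketch are hypothetical, feature lengths are assumed to be in bases, and the library's pandas implementation may differ in detail.

```python
def rpkm_sketch(row, feature_length):
    """Reads per kilobase of feature per million mapped reads, for one sample."""
    library_size = sum(row)
    return [
        count * 1e9 / (library_size * length)
        for count, length in zip(row, feature_length)
    ]


def rpkm_to_tpm_sketch(rpkm_row):
    """Rescale one sample's RPKM values so that they sum to one million."""
    total = sum(rpkm_row)
    return [value * 1_000_000 / total for value in rpkm_row]


row = [10, 20, 30]
feature_length = [1000, 2000, 1000]
rpkm = rpkm_sketch(row, feature_length)
tpm = rpkm_to_tpm_sketch(rpkm)
```

Because the library-size factor is the same for every gene in a sample, it cancels in the rescaling, which is why TPM can be derived from RPKM or FPKM without the original counts.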

Models

Module for model utilities

metworkpy.utils.models.model_bounds_eq(model1: Model, model2: Model, **kwargs) → bool

Check if the bounds of two models are equal

Parameters:

model1 (cobra.Model) – First model to compare
model2 (cobra.Model) – Second model to compare
**kwargs (dict[str, Any]) – Additional keyword arguments passed to numpy isclose function to check equality

Returns:

True if the model bounds are equal, false otherwise

Return type:

bool

metworkpy.utils.models.model_eq(model1: Model, model2: Model, verbose: bool = False) → bool

Check if two cobra models are equal.

Parameters:

model1 (cobra.Model) – The first model to compare.
model2 (cobra.Model) – The second model to compare.
verbose (bool) – Whether to print where the models differ (default: False).

Returns:

True if the models are equal, False otherwise.

Return type:

bool

metworkpy.utils.models.read_model(model_path: str | Path, file_type: str | None = None)

Read a model from a file

Parameters:

model_path (str | pathlib.Path) – Path to the model file
file_type (str | None) – Type of the file

Returns:

The model

Return type:

unknown

metworkpy.utils.models.write_model(model: Model, model_path: str | Path, file_type: str | None = None)

Write a model to a file

Parameters:

model (cobra.Model) – Model to write
model_path (str | pathlib.Path) – Path to the model file
file_type (str|None) – Type of the file

Returns:

Nothing

Return type:

None

Permutation Testing

Functions for performing permutation tests

metworkpy.utils.permutation.permutation_test(dataset1: ndarray, dataset2: ndarray, statistic: Callable[[ndarray, ndarray], float], axis: int = 0, permutation_type: Literal['independent', 'pairings'] = 'independent', n_resamples=500, alternative: Literal['less', 'greater', 'two-sided'] = 'two-sided', estimation_method: Literal['kernel', 'empirical'] = 'empirical', rng: Generator | int | None = None) → Tuple[float, float]

Perform a permutation test for a sample statistic

Parameters:

dataset1 (np.ndarray) – The two datasets to perform the permutation testing on, must have broadcastable shapes except along axis
dataset2 (np.ndarray) – The two datasets to perform the permutation testing on, must have broadcastable shapes except along axis
statistic (Callable) – Function which takes two numpy arrays (which have the same shape except along axis), and returns a float
axis (int, default=0) – The sample axis for the two datasets
permutation_type ({'independent', 'pairings'}, default='independent') –
The type of permutation to perform:
- pairings: Shuffles which observations are paired, but the assignment of observation to sample isn’t changed
- independent: Shuffles which samples observations are assigned to
n_resamples (int, default=500) – The number of permutations to perform
alternative ({"less", "greater", "two-sided"}, default='two-sided') – Alternative hypothesis
estimation_method ({"kernel", "empirical"}, default="empirical") – Method to use for estimating p-value, either an empirical estimate, or a gaussian_kde. The empirical method returns an upper bound on the p-value that is somewhat conservative, and is based on [1] and the implementation in SciPy.
rng (np.random.Generator or int, Optional) – A numpy random generator to use for sampling, or an int to seed the default generator.

Returns:

Tuple of the sample statistic and the calculated p-value

Return type:

tuple of float,float
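
The empirical estimate can be illustrated with a small standard-library sketch. It is not metworkpy's implementation: the name permutation_test_sketch, the use of random.Random, and the absolute-value comparison for the two-sided alternative are assumptions made for illustration. It covers only the 'independent' permutation type on one-dimensional samples.

```python
import random


def mean_difference(a, b):
    """Example statistic: difference between the two sample means."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_test_sketch(sample1, sample2, statistic, n_resamples=500, seed=None):
    """Two-sided permutation test with an add-one empirical p-value."""
    rng = random.Random(seed)
    observed = statistic(sample1, sample2)
    pooled = list(sample1) + list(sample2)
    n1 = len(sample1)
    extreme = 0
    for _ in range(n_resamples):
        # 'independent': reshuffle which sample each observation belongs to.
        rng.shuffle(pooled)
        permuted = statistic(pooled[:n1], pooled[n1:])
        if abs(permuted) >= abs(observed):
            extreme += 1
    # Adding one to numerator and denominator keeps the estimate above zero,
    # which makes it a conservative upper bound on the p-value.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


sample1 = [5.1, 4.9, 5.6, 5.8, 6.0]
sample2 = [4.2, 4.0, 4.4, 3.9, 4.1]
observed, p_value = permutation_test_sketch(
    sample1, sample2, mean_difference, n_resamples=500, seed=0
)
```

With 500 resamples the smallest p-value this estimate can report is 1/501, so n_resamples bounds the resolution of the test.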

References

Statistics

Some helpful statistics methods

class metworkpy.utils.statistics.MannWhitneyUResult(u1: float, u2: float, auc_roc: float, pvalue: float)

Bases: NamedTuple

Class for the results of the extended Mann-Whitney U-test, which includes U1, U2, AUC ROC, and the p-value

auc_roc: float: Alias for field number 2

pvalue: float: Alias for field number 3

u1: float: Alias for field number 0

u2: float: Alias for field number 1

class metworkpy.utils.statistics.SignificanceResult(statistic: float, pvalue: float)

Bases: NamedTuple

Class for return values from significance tests

pvalue: float: Alias for field number 1

statistic: float: Alias for field number 0

Perform a Mann-Whitney U-test and calculate additional information about the result, specifically U2 and AUC ROC

Parameters:

x (np.ArrayLike) – N-d array of samples. Arrays must be broadcastable except along the dimension given by axis
y (np.ArrayLike) – N-d array of samples. Arrays must be broadcastable except along the dimension given by axis
alternative ({'two-sided', 'less', 'greater'}) – The alternative hypothesis to evaluate
axis (int, default=0) – The axis of the input along which to compute the statistic
kwargs – Keyword arguments to pass to the scipy.stats.mannwhitneyu function

Returns:

The results of the Mann-Whitney U-test with the addition of u2 and AUC ROC

Return type:

MannWhitneyUResult
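
The relationship between the fields of the result can be sketched by counting pairs directly. This is an illustration under stated assumptions, not the library's code: mann_whitney_sketch is a hypothetical name, AUC ROC is assumed to be U1 divided by the number of pairs, and the p-value (which the documented function obtains through scipy.stats.mannwhitneyu) is omitted.

```python
def mann_whitney_sketch(x, y):
    """Compute U1, U2, and AUC ROC for two 1-D samples by pair counting."""
    u1 = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u1 += 1.0  # x wins this pair
            elif xi == yj:
                u1 += 0.5  # ties count as half a win
    n_pairs = len(x) * len(y)
    u2 = n_pairs - u1  # U1 and U2 always sum to the number of pairs
    auc_roc = u1 / n_pairs  # probability that a random x exceeds a random y
    return u1, u2, auc_roc


u1, u2, auc_roc = mann_whitney_sketch([3, 4, 5], [1, 2, 6])
```

Pair counting is quadratic in the sample sizes, so it is suitable only for small illustrations; rank-based formulas give the same U values more efficiently.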

metworkpy.utils.statistics.fisher_enrichment(group1: set[Hashable], group2: set[Hashable], total_count: int, alternative: Literal['two-sided', 'less', 'greater']) → SignificanceResult

Perform enrichment analysis using the Fisher Exact Test

Parameters:

group1 (set of Hashable) – The groups to evaluate the significance of the overlap for
group2 (set of Hashable) – The groups to evaluate the significance of the overlap for
total_count (int) – The size of the set from which group1 and group2 are subsets
alternative ({'two-sided', 'less', 'greater'}) – The alternative hypothesis, see SciPy’s fisher_exact for details.

Returns:

Named tuple of the statistic and p-value from Fisher's exact test

Return type:

SignificanceResult
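
To show how two sets and a total count map onto a Fisher's exact test, the sketch below builds a 2x2 contingency table and computes the one-sided 'greater' p-value from the hypergeometric distribution. The function names and the table layout are assumptions for illustration; the documented function may arrange the table differently and delegates the test to SciPy.

```python
from math import comb


def enrichment_table_sketch(group1, group2, total_count):
    """2x2 table: rows are in/not in group1, columns are in/not in group2."""
    both = len(group1 & group2)
    only1 = len(group1 - group2)
    only2 = len(group2 - group1)
    neither = total_count - both - only1 - only2
    return [[both, only1], [only2, neither]]


def fisher_greater_pvalue(table):
    """Probability of an overlap at least as large as the one observed."""
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d
    max_overlap = min(row1, col1)
    # Sum the hypergeometric probabilities for overlaps from a upward.
    favorable = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, max_overlap + 1)
    )
    return favorable / comb(total, row1)


table = enrichment_table_sketch({"a", "b", "c"}, {"b", "c", "d"}, total_count=10)
p_greater = fisher_greater_pvalue(table)
odds_ratio = (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])
```

The total_count argument matters because it fixes the "neither" cell: the same overlap is more surprising in a large background set than in a small one.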

Translation

Module for translating between genes and reactions

metworkpy.utils.translate.gene_to_reaction_ids(model: Model, gene: str, essential: bool = False) → set[str]

Convert a gene id into associated reaction ids

Parameters:

model (cobra.Model) – Model to use for performing the translation
gene (str) – id for gene to translate into reaction ids
essential (bool, default=False) – Whether the reactions should only be those for which the gene is required

Returns:

reaction_set – Set of reactions associated with a particular gene

Return type:

set[str]
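
One way to picture the essential flag is to evaluate each gene-protein-reaction (GPR) rule with a single gene removed. The sketch below does this on plain rule strings rather than a cobra.Model. Everything in it is an assumption for illustration: the function names are hypothetical, the rules are assumed to use only and/or with gene ids that are valid Python identifiers, and "required" is interpreted as "the rule fails when only this gene is absent", which may not match the library's definition.

```python
import ast


def gpr_genes(rule):
    """Collect the gene ids named in a GPR rule string."""
    tree = ast.parse(rule, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def gpr_is_active(rule, active_genes):
    """Evaluate an and/or GPR rule given the set of genes that are present."""

    def visit(node):
        if isinstance(node, ast.BoolOp):
            results = [visit(value) for value in node.values]
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.Name):
            return node.id in active_genes
        raise ValueError(f"Unsupported GPR syntax: {ast.dump(node)}")

    return visit(ast.parse(rule, mode="eval").body)


def gene_to_reactions_sketch(rules, gene, essential=False):
    """Return reaction ids whose rule mentions (or, if essential, requires) gene."""
    reactions = set()
    for reaction_id, rule in rules.items():
        genes = gpr_genes(rule)
        if gene not in genes:
            continue
        # With essential=True, keep the reaction only if removing this one
        # gene (all others present) makes the rule evaluate to False.
        if not essential or not gpr_is_active(rule, genes - {gene}):
            reactions.add(reaction_id)
    return reactions


# Hypothetical rules keyed by reaction id.
rules = {"R1": "g1 and (g2 or g3)", "R2": "g2", "R3": "g3 or g4"}
all_for_g2 = gene_to_reactions_sketch(rules, "g2")
required_for_g2 = gene_to_reactions_sketch(rules, "g2", essential=True)
```

Here g2 is associated with R1 and R2, but it is required only for R2, because g3 can substitute for it in R1.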

metworkpy.utils.translate.gene_to_reaction_list(model: Model, gene_list: Iterable[str], essential: bool = False)

Convert a list (or other Iterable) of gene ids into a list of associated reaction ids

Parameters:

model (cobra.Model) – Model to use for performing the translation
gene_list (list of str) – list of gene ids to translate
essential (bool, default=False) – Whether the reactions should only be those for which the genes are required

Returns:

reaction_list – list of reactions associated with the genes in gene_list

Return type:

list[str]

Note

The order of the genes is not preserved in the order of the reactions

metworkpy.utils.translate.get_gene_to_reaction_translation_dict(model: Model, essential: bool = False) → dict[str, set[str]]

Get a dictionary to translate from genes to associated reactions

Parameters:

model (cobra.Model) – Model to construct the translation dict for
essential (bool, default=False) – Whether the reactions should only be those for which the gene is required

Returns:

translation_dict – Dict keyed by gene ids within the model, with values that are sets of reactions associated with the gene

Return type:

dict[str, set[str]]

metworkpy.utils.translate.get_reaction_to_gene_translation_dict(model: Model, essential: bool = False) → dict[str, set[str]]

Get a dictionary to translate from reactions to associated genes

Parameters:

model (cobra.Model) – Model to construct the translation dict for
essential (bool, default=False) – Whether the genes should only be those which are required by the reaction

Returns:

translation_dict – Dict keyed by reaction ids within the model, with values that are sets of genes associated with each reaction

Return type:

dict[str, set[str]]

metworkpy.utils.translate.reaction_to_gene_ids(model: Model, reaction: str, essential: bool = False) → set[str]

Convert a reaction id into associated gene ids

Parameters:

model (cobra.Model) – Model to use for performing the translation
reaction (str) – id of reaction to translate into gene ids
essential (bool, default=False) – Whether the genes should be only those required for the reaction to function

Returns:

gene_set – Set of genes associated with a particular reaction

Return type:

set[str]

metworkpy.utils.translate.reaction_to_gene_list(model: Model, reaction_list: Iterable[str], essential: bool = False)

Convert a list (or other Iterable) of reaction ids into a list of associated gene ids

Parameters:

model (cobra.Model) – Model to use for performing the translation
reaction_list (list of str) – list of reaction ids to translate
essential (bool, default=False) – Whether the genes should only be those which the reactions require to function

Returns:

gene_list – list of genes associated with the reactions in reaction_list

Return type:

list[str]

Note

The order of the reactions is not preserved in the order of the genes