Utils Reference
API Documentation for the utils submodule
Connected Components
Module for finding the connected components of a graph
- class metworkpy.utils.connected_components.HashableComparable(*args, **kwargs)
Bases: Protocol
Protocol for annotating Comparable and Hashable types.
- metworkpy.utils.connected_components.find_connected_components(node_list: list[HCT], edge_list: list[tuple[HCT, HCT]]) list[set[HCT]]
Find the connected components of a graph
- Parameters:
node_list (list of Hashable) – List of the nodes which are in the graph (nodes must be hashable)
edge_list (list of tuple of Hashable) – List of edges in the graph
- Returns:
connected_components – List of the connected components of the graph
- Return type:
list of sets of Hashable
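To illustrate the input and output shapes described above, here is a minimal pure-Python sketch based on breadth-first search. The helper name `connected_components_sketch` is hypothetical, and this is not necessarily how metworkpy implements the function. It assumes an undirected graph whose edges only reference nodes present in node_list.

```python
from collections import deque


def connected_components_sketch(node_list, edge_list):
    """Hypothetical stand-in: group the nodes of an undirected graph into components."""
    # Build an adjacency map; every node starts with no neighbors
    adjacency = {node: set() for node in node_list}
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in node_list:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# "d" has no edges, so it forms a component on its own
components = connected_components_sketch(
    ["a", "b", "c", "d"], [("a", "b"), ("b", "c")]
)
```

An isolated node still appears in the result as a single-element set, which is why node_list is passed separately from edge_list.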
- metworkpy.utils.connected_components.find_degree(node_list: list[HCT], edge_list: list[tuple[HCT, HCT]]) dict[HCT, int]
Find the degree of nodes within a graph
- Parameters:
node_list (list of Hashable) – List of the nodes which are in the graph (nodes must be hashable)
edge_list (list of tuple of Hashable) – List of edges in the graph
- Returns:
degree_dict – Dictionary keyed by node, with value corresponding to degree of the node
- Return type:
dict of Hashable to int
- metworkpy.utils.connected_components.find_neighbors(node_list: list[HCT], edge_list: list[tuple[HCT, HCT]]) dict[HCT, set[HCT]]
Find the neighbors of nodes within a graph
- Parameters:
node_list (list of Hashable) – List of the nodes which are in the graph (nodes must be hashable)
edge_list (list of tuple of Hashable) – List of edges in the graph
- Returns:
neighbors – A dictionary keyed by node, with values of sets of neighbors to the key node
- Return type:
dict of Hashable to set of Hashable
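The neighbor and degree dictionaries described above can be sketched in a few lines of plain Python. The helper names are hypothetical, and the sketch assumes an undirected graph in which the degree of a node is the number of its distinct neighbors; how the library treats self-loops or duplicate edges is not stated here.

```python
def neighbors_sketch(node_list, edge_list):
    """Hypothetical stand-in: map each node to the set of nodes it shares an edge with."""
    neighbors = {node: set() for node in node_list}
    for a, b in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


def degree_sketch(node_list, edge_list):
    """Hypothetical stand-in: degree taken as the number of distinct neighbors."""
    neighbors = neighbors_sketch(node_list, edge_list)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c")]
neighbor_dict = neighbors_sketch(nodes, edges)
degree_dict = degree_sketch(nodes, edges)
```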
- metworkpy.utils.connected_components.find_representative_nodes(node_list: list[HCT], edge_list: list[tuple[HCT, HCT]]) dict[HCT, set[HCT]]
Find representative nodes in a graph by selecting the node of highest degree from each component of the graph
- Parameters:
node_list (list of Hashable) – List of the nodes in the graph (nodes must be hashable)
edge_list (list of tuple of Hashable) – List of edges in the graph, in the form of a tuple of nodes which it connects
- Returns:
representative_dict – A dictionary with the representative nodes as the keys, and a set of the nodes they represent as the values
- Return type:
dict of Hashable to set of Hashable
Note
The sets of nodes in the returned dictionary will include the representative node
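As a rough sketch of the selection rule described above, the following plain-Python example picks the highest-degree node from each component. The helper name is hypothetical, and the tie-breaking rule (smallest node wins) is an assumption made for the example, not documented behavior.

```python
def representative_nodes_sketch(node_list, edge_list):
    """Hypothetical stand-in: map a highest-degree node to the component it represents."""
    neighbors = {node: set() for node in node_list}
    for a, b in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)

    representatives = {}
    unvisited = set(node_list)
    while unvisited:
        # Depth-first traversal collects one whole component
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            current = stack.pop()
            for neighbor in neighbors[current]:
                if neighbor not in component:
                    component.add(neighbor)
                    stack.append(neighbor)
        unvisited -= component
        # Highest degree wins; ties go to the smallest node (an assumption)
        representative = min(component, key=lambda n: (-len(neighbors[n]), n))
        representatives[representative] = component
    return representatives


reps = representative_nodes_sketch(
    ["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c"), ("d", "e")]
)
```

Each value set contains its own key, matching the note above.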
Expression
Module containing utility functions for working with gene expression data, and converting it into qualitative weights
- metworkpy.utils.expression_utils.count_to_cpm(count: DataFrame) DataFrame
Converts count data to counts per million
- Parameters:
count (pd.DataFrame) – Dataframe containing gene count data, with genes as the columns and samples as the rows
- Returns:
CPM normalized counts
- Return type:
pd.DataFrame
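The arithmetic behind CPM can be sketched without pandas. In this example a nested dictionary stands in for the DataFrame (outer keys are samples, inner keys are genes), and `cpm_sketch` is a hypothetical helper rather than the library function.

```python
def cpm_sketch(counts):
    """Hypothetical stand-in. counts: {sample: {gene: raw count}}."""
    result = {}
    for sample, gene_counts in counts.items():
        # Library size is the total number of counts in the sample
        library_size = sum(gene_counts.values())
        result[sample] = {
            gene: count / library_size * 1_000_000
            for gene, count in gene_counts.items()
        }
    return result


cpm = cpm_sketch({"s1": {"g1": 10, "g2": 30, "g3": 60}})
```

Each sample is scaled independently, so the CPM values within one sample sum to one million.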
- metworkpy.utils.expression_utils.count_to_fpkm(count: DataFrame, feature_length: Series) DataFrame
Converts count data to FPKM normalized expression
- Parameters:
count (pd.DataFrame) – Dataframe containing gene count data, with genes as the columns and samples as the rows. Specifically, the count data represents the number of fragments, where a fragment corresponds to a single cDNA molecule, which can be represented by a pair of reads from each end.
feature_length (pd.Series) – Series containing the feature length for all the genes
- Returns:
FPKM normalized counts
- Return type:
pd.DataFrame
- metworkpy.utils.expression_utils.count_to_rpkm(count: DataFrame, feature_length: Series) DataFrame
Normalize raw count data using RPKM
- Parameters:
count (pd.DataFrame) – Dataframe containing gene count data, with genes as the columns and samples as the rows
feature_length (pd.Series) – Series containing the feature length for all the genes
- Returns:
RPKM normalized counts
- Return type:
pd.DataFrame
- metworkpy.utils.expression_utils.count_to_tpm(count: DataFrame, feature_length: Series) DataFrame
Converts count data to TPM normalized expression
- Parameters:
count (pd.DataFrame) – Dataframe containing gene count data, with genes as the columns and samples as the rows
feature_length (pd.Series) – Series containing the feature length for all the genes
- Returns:
TPM normalized counts
- Return type:
pd.DataFrame
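The length-aware normalizations above differ mainly in the order of operations. The sketch below shows the standard RPKM and TPM formulas for a single sample, using plain dictionaries instead of pandas objects. The helper names are hypothetical, and it assumes feature lengths are given in base pairs.

```python
def rpkm_sketch(gene_counts, lengths_bp):
    """Hypothetical stand-in: reads per kilobase per million mapped reads."""
    total = sum(gene_counts.values())
    # count / (length in kb) / (total in millions) == count * 1e9 / (length * total)
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total)
        for gene, count in gene_counts.items()
    }


def tpm_sketch(gene_counts, lengths_bp):
    """Hypothetical stand-in: transcripts per million."""
    # First normalize by length, then rescale so the sample sums to one million
    rates = {gene: count / lengths_bp[gene] for gene, count in gene_counts.items()}
    scale = sum(rates.values())
    return {gene: rate / scale * 1e6 for gene, rate in rates.items()}


counts = {"g1": 100, "g2": 300}
lengths = {"g1": 1000, "g2": 3000}
rpkm = rpkm_sketch(counts, lengths)
tpm = tpm_sketch(counts, lengths)
```

In this example the second gene has three times the counts and three times the length, so both genes end up with the same normalized value. FPKM uses the same formula as RPKM, with fragments counted instead of reads.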
- metworkpy.utils.expression_utils.expr_to_imat_gene_weights(expression: pd.Series | pd.DataFrame, quantile: float | tuple[float, float] = 0.15, aggregator: Callable[[ArrayLike], float] = <function median>, subset: Iterable | None = None, sample_axis: int | str = 0) Series
Convert gene expression data to qualitative gene weights
- Parameters:
expression (pd.Series | pd.DataFrame) – Normalized gene expression data. If it is a DataFrame representing multiple samples, those samples will be aggregated using the aggregator function (default median).
quantile (float | tuple[float, float]) – Quantile or quantiles to use for binning expression data. Should be between 0 and 1. If a single value is given, the bottom quantile is converted to -1, the top quantile to 1, and all expression values in between to 0. If a tuple is provided, the first value is treated as the low quantile cutoff and the second as the high quantile cutoff.
aggregator (Callable[[np.ArrayLike], float]) – Function used to aggregate gene expression data across samples; only used if expression is a DataFrame (default median).
subset (Optional[Iterable]) – Subset of genes to perform calculations on. expression is filtered to only include these genes before quantiles are calculated. If any genes are present in the subset, but not in expression, they will be assigned a value of 0 following the trinarization.
sample_axis (int | str) – Which axis represents samples in the expression data (only used if expression is DataFrame). “index” or 0 if rows represent different samples, “column” or 1 if columns represent different samples (default is rows).
- Returns:
Series of qualitative weights, -1 for low expression, 1 for high expression, and 0 otherwise.
- Return type:
pd.Series
Notes
The expression data should only represent biological replicates, as it will be aggregated. If multiple different conditions are represented in your expression data, they should be separated before this function is used.
For the quantile, if a tuple like (0.15, 0.90) is provided, the bottom 15% of genes in terms of expression will have weights of -1, while the top 10% will have weights of 1, and everything in between will have weights of 0.
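The trinarization described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, quantiles are computed by linear interpolation, a single quantile q is taken to mean cutoffs at q and 1 - q, and values exactly on a cutoff are counted as low or high.

```python
def quantile_sketch(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def trinarize_sketch(expression, quantile=0.15):
    """Hypothetical stand-in. expression: {gene: aggregated expression value}."""
    if isinstance(quantile, tuple):
        low_q, high_q = quantile
    else:
        # Assumption: a single value q means cutoffs at q and 1 - q
        low_q, high_q = quantile, 1 - quantile
    values = list(expression.values())
    low_cut = quantile_sketch(values, low_q)
    high_cut = quantile_sketch(values, high_q)
    weights = {}
    for gene, value in expression.items():
        if value <= low_cut:
            weights[gene] = -1
        elif value >= high_cut:
            weights[gene] = 1
        else:
            weights[gene] = 0
    return weights


# Eleven genes with expression values 0 through 10
expression = {f"g{i}": float(i) for i in range(11)}
weights = trinarize_sketch(expression)
```

With the default of 0.15, the cutoffs fall at roughly 1.5 and 8.5, so the two lowest genes get -1, the two highest get 1, and the rest get 0.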
- metworkpy.utils.expression_utils.expr_to_metchange_gene_weights(expression: pd.Series | pd.DataFrame, quantile_cutoff: float, subset: Iterable[str] | None = None, aggregator: Callable[[ArrayLike[float]], float] = <function median>, sample_axis: str | int = 0) pd.Series
Convert gene expression values into metchange gene weights
- Parameters:
expression (pd.Series | pd.DataFrame) – Gene expression values. Either a series with genes as the index, or a Dataframe with genes as one axis, and samples as the other. In the case of a dataframe, the expression values are aggregated before the weights are calculated.
quantile_cutoff (float) – Cutoff used for defining the weights. The expression value corresponding to this quantile is used as the threshold. Everything above the threshold is weighted 0, and everything below is weighted in proportion to distance from the threshold. The weight will be between 0 and 1, with values near the threshold being near 0, and values near 0 being weighted 1.
subset (Iterable[str] | None) – Subset of genes to use in weighting. Default of None will use all genes in expression. If not none, expression will be filtered down to this subset of genes before the quantile threshold is calculated, and the returned series will only include this subset of genes.
aggregator (Callable[[ArrayLike[float]], float]) – Aggregation function to use for aggregating expression data across multiple samples. Should accept a single ArrayLike argument, and return a float. Default is median.
sample_axis (str | int) – Which axis in expression dataframe represents samples. Can be ‘index’, ‘columns’, 0 or 1. A value of 0 or ‘index’ means rows represent different samples, while a value of 1 or ‘columns’ means that columns represent different samples.
- Returns:
Series of gene weights (floats between 0 and 1, representing the probability that a gene product is absent), indexed by gene ids.
- Return type:
pd.Series
Notes
This function does not convert the expression values into reaction weights; use metworkpy.parse.gpr.gene_to_rxn_weights for that. The function dict will need to be changed from the default to {'AND': max, 'OR': min}, because the metchange weights are probabilities of absence rather than presence.
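One way to read the weighting rule is as a linear ramp below the threshold. The sketch below assumes that form, takes an already-computed positive threshold rather than a quantile, and uses a hypothetical helper name; the library's exact formula may differ. It also shows why AND maps to max and OR maps to min when the weights are probabilities of absence.

```python
def metchange_weights_sketch(expression, threshold):
    """Hypothetical stand-in. Weight is 0 at or above the threshold, 1 at zero expression."""
    weights = {}
    for gene, value in expression.items():
        if value >= threshold:
            weights[gene] = 0.0
        else:
            # Linear in the distance below the threshold (an assumed form)
            weights[gene] = (threshold - value) / threshold
    return weights


weights = metchange_weights_sketch(
    {"g1": 0.0, "g2": 5.0, "g3": 10.0, "g4": 20.0}, threshold=10.0
)

# "g1 and g2": the complex is missing if either subunit is missing, so take the max
complex_absent = max(weights["g1"], weights["g2"])
# "g1 or g2": the activity is missing only if both isozymes are missing, so take the min
isozyme_absent = min(weights["g1"], weights["g2"])
```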
- metworkpy.utils.expression_utils.fpkm_to_tpm(fpkm: DataFrame)
Convert FPKM normalized counts to TPM normalized counts
- Parameters:
fpkm (pd.DataFrame) – FPKM normalized count data, with genes as columns and samples as rows
- Returns:
TPM normalized counts
- Return type:
pd.DataFrame
- metworkpy.utils.expression_utils.rpkm_to_tpm(rpkm: DataFrame)
Convert RPKM normalized counts to TPM normalized counts
- Parameters:
rpkm (pd.DataFrame) – RPKM normalized count data, with genes as columns and samples as rows
- Returns:
TPM normalized counts
- Return type:
pd.DataFrame
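Converting RPKM or FPKM values to TPM only needs a rescaling within each sample, since the length normalization has already been applied. A minimal sketch for one sample, with a hypothetical helper name and a plain dictionary in place of a DataFrame row:

```python
def rpkm_to_tpm_sketch(rpkm_values):
    """Hypothetical stand-in. rpkm_values: {gene: RPKM or FPKM value} for one sample."""
    total = sum(rpkm_values.values())
    # Rescale so the values in the sample sum to one million
    return {gene: value / total * 1e6 for gene, value in rpkm_values.items()}


tpm = rpkm_to_tpm_sketch({"g1": 5.0, "g2": 15.0})
```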
Models
Module for model utilities
- metworkpy.utils.models.model_bounds_eq(model1: Model, model2: Model, **kwargs) bool
Check if the bounds of two models are equal
- Parameters:
model1 (cobra.Model) – First model to compare
model2 (cobra.Model) – Second model to compare
**kwargs (dict[str, Any]) – Additional keyword arguments passed to numpy isclose function to check equality
- Returns:
True if the model bounds are equal, false otherwise
- Return type:
bool
- metworkpy.utils.models.model_eq(model1: Model, model2: Model, verbose: bool = False) bool
Check if two cobra models are equal.
- Parameters:
model1 (cobra.Model) – The first model to compare.
model2 (cobra.Model) – The second model to compare.
verbose (bool) – Whether to print where the models differ (default: False).
- Returns:
True if the models are equal, False otherwise.
- Return type:
bool
- metworkpy.utils.models.read_model(model_path: str | Path, file_type: str | None = None)
Read a model from a file
- Parameters:
model_path (str | pathlib.Path) – Path to the model file
file_type (str | None) – Type of the file
- Returns:
The model
- Return type:
cobra.Model
- metworkpy.utils.models.write_model(model: Model, model_path: str | Path, file_type: str | None = None)
Write a model to a file
- Parameters:
model (cobra.Model) – Model to write
model_path (str | pathlib.Path) – Path to the model file
file_type (str|None) – Type of the file
- Returns:
Nothing
- Return type:
None
Permutation Testing
Functions for performing permutation tests
- metworkpy.utils.permutation.permutation_test(dataset1: ndarray, dataset2: ndarray, statistic: Callable[[ndarray, ndarray], float], axis: int = 0, permutation_type: Literal['independent', 'pairings'] = 'independent', n_resamples=500, alternative: Literal['less', 'greater', 'two-sided'] = 'two-sided', estimation_method: Literal['kernel', 'empirical'] = 'empirical', rng: Generator | int | None = None) Tuple[float, float]
Perform a permutation test for a sample statistic
- Parameters:
dataset1 (np.ndarray) – The two datasets to perform the permutation testing on, must have broadcastable shapes except along axis
dataset2 (np.ndarray) – The two datasets to perform the permutation testing on, must have broadcastable shapes except along axis
statistic (Callable) – Function which takes two numpy arrays (which have the same shape except along axis), and returns a float
axis (int, default=0) – The sample axis for the two datasets
permutation_type ({'independent', 'pairings'}, default='independent') – The type of permutation to perform:
pairings: Shuffles which observations are paired, but the assignment of observations to samples isn’t changed
independent: Shuffles which sample each observation is assigned to
n_resamples (int, default=500) – The number of permutations to perform
alternative ({"less", "greater", "two-sided"}, default='two-sided') – Alternative hypothesis
estimation_method ({"kernel", "empirical"}, default="empirical") – Method to use for estimating p-value, either an empirical estimate, or a gaussian_kde. The empirical method returns an upper bound on the p-value that is somewhat conservative, and is based on [1] and the implementation in SciPy.
rng (np.random.Generator or int, Optional) – A numpy random generator to use for sampling, or an int to seed the default generator.
- Returns:
Tuple of the sample statistic and the calculated p-value
- Return type:
tuple of float,float
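The 'independent' permutation type with the 'empirical' estimate can be sketched with the standard library alone. This is a simplified illustration, not the library's code: the helper names are hypothetical, the inputs are one-dimensional lists, and the two-sided comparison uses absolute values, which assumes a statistic centered on zero under the null hypothesis.

```python
import random


def permutation_test_sketch(sample1, sample2, statistic, n_resamples=500, seed=None):
    """Hypothetical stand-in: two-sided independent permutation test on 1-D samples."""
    rng = random.Random(seed)
    observed = statistic(sample1, sample2)
    pooled = list(sample1) + list(sample2)
    n1 = len(sample1)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign observations to the two samples at random
        rng.shuffle(pooled)
        permuted = statistic(pooled[:n1], pooled[n1:])
        if abs(permuted) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimate above zero
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


def mean_difference(a, b):
    return sum(a) / len(a) - sum(b) / len(b)


stat, p = permutation_test_sketch(
    [10.1, 9.8, 10.4, 10.0, 9.9],
    [1.2, 0.9, 1.1, 1.0, 0.8],
    mean_difference,
    seed=0,
)
```

Because of the add-one correction, the smallest p-value this sketch can return with n_resamples=500 is 1/501.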
References
Statistics
Some helpful statistics methods
- class metworkpy.utils.statistics.MannWhitneyUResult(u1: float, u2: float, auc_roc: float, pvalue: float)
Bases: NamedTuple
Class for the results of the extended Mann-Whitney U-test, which includes the U1, U2, AUC ROC, and p-value
- auc_roc: float
Alias for field number 2
- pvalue: float
Alias for field number 3
- u1: float
Alias for field number 0
- u2: float
Alias for field number 1
- class metworkpy.utils.statistics.SignificanceResult(statistic: float, pvalue: float)
Bases: NamedTuple
Class for return values from significance tests
- pvalue: float
Alias for field number 1
- statistic: float
Alias for field number 0
- metworkpy.utils.statistics.extended_mannwhitneyu_test(x: ArrayLike, y: ArrayLike, alternative: Literal['two-sided', 'less', 'greater'] = 'two-sided', axis: int = 0, **kwargs) MannWhitneyUResult
Perform a Mann-Whitney U-test and calculate additional information about the result, specifically U2 and AUC ROC
- Parameters:
x (np.ArrayLike) – N-d array of samples. Arrays must be broadcastable except along the dimension given by axis
y (np.ArrayLike) – N-d array of samples. Arrays must be broadcastable except along the dimension given by axis
alternative ({'two-sided', 'less', 'greater'}) – The alternative hypothesis to evaluate
axis (int, default=0) – The axis of the input along which to compute the statistic
kwargs – Keyword arguments to pass to the scipy.stats.mannwhitneyu function
- Returns:
The results of the Mann-Whitney U-test with the addition of u2 and AUC ROC
- Return type:
MannWhitneyUResult
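The relationship between U1, U2, and the AUC ROC can be shown with a direct pairwise count. The sketch below uses a hypothetical helper name, handles only one-dimensional inputs, and omits the p-value. It takes the AUC as U1 divided by the number of pairs, which is the usual convention but is an assumption about what the library reports.

```python
def mann_whitney_u_sketch(x, y):
    """Hypothetical stand-in: U statistics and AUC from pairwise comparisons."""
    u1 = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u1 += 1.0
            elif xi == yj:
                # Ties contribute half a win to each side
                u1 += 0.5
    n_pairs = len(x) * len(y)
    u2 = n_pairs - u1
    # Probability that a random x exceeds a random y, counting ties as one half
    auc_roc = u1 / n_pairs
    return u1, u2, auc_roc


u1, u2, auc = mann_whitney_u_sketch([3, 4, 5], [1, 2, 3])
```

U1 and U2 always sum to the number of pairs, so an AUC of 0.5 corresponds to U1 equal to U2.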
- metworkpy.utils.statistics.fisher_enrichment(group1: set[Hashable], group2: set[Hashable], total_count: int, alternative: Literal['two-sided', 'less', 'greater']) SignificanceResult
Perform enrichment analysis using the Fisher Exact Test
- Parameters:
group1 (set of Hashable) – The groups to evaluate the significance of the overlap for
group2 (set of Hashable) – The groups to evaluate the significance of the overlap for
total_count (int) – The size of the set from which group1 and group2 are subsets
alternative ({'two-sided', 'less', 'greater'}) – The alternative hypothesis, see SciPy’s fisher_exact for details.
- Returns:
Named tuple of statistic and p-value for the result of the Fisher’s exact test
- Return type:
SignificanceResult
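To make the role of total_count concrete, the sketch below builds a 2x2 contingency table from the two sets and computes the one-sided ('greater') p-value from the hypergeometric distribution. The helper name and the table orientation are assumptions for the example; the library is described as following SciPy's fisher_exact for the alternative hypothesis.

```python
from math import comb


def enrichment_sketch(group1, group2, total_count):
    """Hypothetical stand-in: contingency table and one-sided enrichment p-value."""
    overlap = len(group1 & group2)
    table = [
        [overlap, len(group1) - overlap],
        [len(group2) - overlap, total_count - len(group1 | group2)],
    ]
    n1, n2 = len(group1), len(group2)
    # Probability of an overlap at least this large when group2 is drawn at random
    favorable = sum(
        comb(n1, k) * comb(total_count - n1, n2 - k)
        for k in range(overlap, min(n1, n2) + 1)
    )
    p_greater = favorable / comb(total_count, n2)
    return table, p_greater


table, p = enrichment_sketch({"a", "b", "c"}, {"b", "c", "d"}, total_count=10)
```

The bottom-right cell counts elements that belong to neither group, which is why the size of the background set has to be supplied.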
Translation
Module for translating between genes and reactions
- metworkpy.utils.translate.gene_to_reaction_ids(model: Model, gene: str, essential: bool = False) set[str]
Convert a gene id into associated reaction ids
- Parameters:
model (cobra.Model) – Model to use for performing the translation
gene (str) – id for gene to translate into reaction ids
essential (bool, default=False) – Whether the reactions should only be those for which the gene is required
- Returns:
reaction_set – Set of reactions associated with a particular gene
- Return type:
set[str]
- metworkpy.utils.translate.gene_to_reaction_list(model: Model, gene_list: Iterable[str], essential: bool = False)
Convert a list (or other Iterable) of gene ids into a list of associated reaction ids
- Parameters:
model (cobra.Model) – Model to use for performing the translation
gene_list (list of str) – list of gene ids to translate
essential (bool, default=False) – Whether the reactions should only be those for which the genes are required
- Returns:
reaction_list – list of reactions associated with the genes in gene_list
- Return type:
list[str]
Note
The order of the genes is not preserved in the order of the reactions
- metworkpy.utils.translate.get_gene_to_reaction_translation_dict(model: Model, essential: bool = False) dict[str, set[str]]
Get a dictionary to translate from genes to associated reactions
- Parameters:
model (cobra.Model) – Model to construct the translation dict for
essential (bool, default=False) – Whether the reactions should only be those for which the gene is required
- Returns:
translation_dict – Dict keyed by gene ids within the model, with values that are sets of reactions associated with the gene
- Return type:
dict[str, set[str]]
- metworkpy.utils.translate.get_reaction_to_gene_translation_dict(model: Model, essential: bool = False) dict[str, set[str]]
Get a dictionary to translate from reactions to associated genes
- Parameters:
model (cobra.Model) – Model to construct the translation dict for
essential (bool, default=False) – Whether the genes should only be those which are required by the reaction
- Returns:
translation_dict – Dict keyed by reaction ids within the model, with values that are sets of genes associated with each reaction
- Return type:
dict[str, set[str]]
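The two translation dictionaries are inverses of each other when essential is False. The sketch below shows that relationship with a plain dictionary standing in for the model's reaction-to-gene associations. The helper name and the example ids are hypothetical, and it does not attempt the essential=True case, which depends on the gene-reaction rules.

```python
def invert_mapping_sketch(reaction_to_genes):
    """Hypothetical stand-in: turn {reaction: genes} into {gene: reactions}."""
    gene_to_reactions = {}
    for reaction, genes in reaction_to_genes.items():
        for gene in genes:
            gene_to_reactions.setdefault(gene, set()).add(reaction)
    return gene_to_reactions


# Made-up ids for illustration only
reaction_to_genes = {"R1": {"g1", "g2"}, "R2": {"g2"}}
gene_to_reactions = invert_mapping_sketch(reaction_to_genes)
```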
- metworkpy.utils.translate.reaction_to_gene_ids(model: Model, reaction: str, essential: bool = False) set[str]
Convert a reaction id into associated gene ids
- Parameters:
model (cobra.Model) – Model to use for performing the translation
reaction (str) – id of reaction to translate into gene ids
essential (bool, default=False) – Whether the genes should be only those required for the reaction to function
- Returns:
gene_set – Set of genes associated with a particular reaction
- Return type:
set[str]
- metworkpy.utils.translate.reaction_to_gene_list(model: Model, reaction_list: Iterable[str], essential: bool = False)
Convert a list (or other Iterable) of reaction ids into a list of associated gene ids
- Parameters:
model (cobra.Model) – Model to use for performing the translation
reaction_list (list of str) – list of reaction ids to translate
essential (bool, default=False) – Whether the genes should only be those which the reactions require to function
- Returns:
gene_list – list of genes associated with the reactions in reaction_list
- Return type:
list[str]
Note
The order of the reactions is not preserved in the order of the genes