Compute betweenness centrality for a subset of
nodes on a bipartite network, where the node subset
comes from one of the partitions and nodes in the
other partition are treated as edges.
$c_B(v) = \sum_{s, t \in T} \frac{\sigma(s, t|v)}{\sigma(s, t)}$
where $T$ is the set of targets, $\sigma(s, t)$ is the number of
shortest $(s, t)$-paths, and $\sigma(s, t|v)$ is the number of
those paths passing through some node $v$ other than $s, t$.
If $s = t$, $\sigma(s, t) = 1$, and if $v \in \{s, t\}$,
$\sigma(s, t|v) = 0$ [2]__.
The betweenness can also be further normalized to
the number of possible pairs of s and t.
Parameters:
G (graph) – A NetworkX graph, should be a bipartite graph (this condition is not checked).
node_partition (Iterable[Hashable]) – One of the two sets of nodes in the bipartite graph, specifically
the set which contains all the targets
targets (list of nodes, optional) – Nodes to use as sources/targets for shortest paths in betweenness,
all of these should fall into a single partition of the bipartite
graph (this condition is not checked). If None, uses all nodes
in the node_partition
normalized (bool, optional) – If True the betweenness values are normalized by $2/((n-1)(n-2))$
for graphs, and $1/((n-1)(n-2))$ for directed graphs where $n$
is the number of nodes in targets.
Returns:
nodes – Dictionary of nodes with betweenness centrality as the value. This
includes betweenness values for all the nodes in the Graph
(in both sets of the partition).
Return type:
dictionary
Notes
The basic algorithm is from [1]__.
The total number of paths between source and target is counted
differently for directed and undirected graphs. Directed paths
are easy to count. Undirected paths are tricky: should a path
from “u” to “v” count as 1 undirected path or as 2 directed paths?
For betweenness_centrality we report the number of undirected
paths when G is undirected.
For betweenness_centrality_subset the reporting is different.
If the source and target subsets are the same, then we want
to count undirected paths. But if the source and target subsets
differ – for example, if sources is {0} and targets is {1},
then we are only counting the paths in one direction. They are
undirected paths but we are counting them in a directed way.
To count them as undirected paths, each should count as half a path.
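The definition of subset betweenness can be illustrated with a brute-force sketch in plain Python. This is not the library's implementation (which follows the algorithm cited in the notes); the names `bfs_counts` and `subset_betweenness` and the adjacency-dict input are assumptions made for illustration. The graph is taken to be undirected and unweighted, each unordered pair of targets is counted once, and no normalization is applied.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return BFS distances and shortest-path counts from `source`."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def subset_betweenness(adj, targets):
    """Unnormalized betweenness of every node over pairs drawn from `targets`."""
    info = {t: bfs_counts(adj, t) for t in targets}
    betweenness = {v: 0.0 for v in adj}
    for s, t in combinations(targets, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:  # s and t are disconnected
            continue
        for v in adj:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v lies on a shortest (s, t)-path iff the distances add up
            if dist_s[v] + dist_t[v] == dist_s[t]:
                betweenness[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return betweenness


# Toy bipartite network: metabolites a, b, c joined through reactions r1, r2
adj = {"a": ["r1"], "r1": ["a", "b"], "b": ["r1", "r2"], "r2": ["b", "c"], "c": ["r2"]}
scores = subset_betweenness(adj, ["a", "b", "c"])
```

In this toy network the targets all come from the metabolite partition, but the returned dictionary has a value for every node in both partitions, matching the description of the return value.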
$c_B(v) = \sum_{s, t \in T} \frac{\sigma(s, t|v)}{\sigma(s, t)}$
where $T$ is the set of targets,
$\sigma(s, t)$ is the number of shortest $(s, t)$-paths,
and $\sigma(s, t|v)$ is the number of those paths
passing through some node $v$ other than $s, t$.
If $s = t$, $\sigma(s, t) = 1$,
and if $v \in \{s, t\}$, $\sigma(s, t|v) = 0$ [2]__.
The normalization is slightly different from NetworkX,
as it normalizes only to the possible (s,t) pairs in targets,
rather than to all possible (s,t) pairs in the network.
Parameters:
G (graph) – A NetworkX graph.
targets (list of nodes) – Nodes to use as sources/targets for shortest paths in betweenness
normalized (bool, optional) – If True the betweenness values are normalized by $2/((n-1)(n-2))$
for graphs, and $1/((n-1)(n-2))$ for directed graphs where $n$
is the number of nodes in targets.
weight (None or string, optional (default=None)) – If None, all edge weights are considered equal.
Otherwise holds the name of the edge attribute used as weight.
Weights are used to calculate weighted shortest paths, so they are
interpreted as distances.
Returns:
nodes – Dictionary of nodes with betweenness centrality as the value.
Return type:
dictionary
Notes
The basic algorithm is from [1]__.
For weighted graphs the edge weights must be greater than zero.
Zero edge weights can produce an infinite number of equal length
paths between pairs of nodes.
The normalization might seem a little strange but it is
designed to make betweenness_centrality(G) be the same as
betweenness_centrality_subset(G,sources=G.nodes(),targets=G.nodes()).
The total number of paths between source and target is counted
differently for directed and undirected graphs. Directed paths
are easy to count. Undirected paths are tricky: should a path
from “u” to “v” count as 1 undirected path or as 2 directed paths?
Compute closeness centrality for nodes, considering only paths
to a subset of other nodes.
Subset closeness centrality, based on closeness centrality [1]__, of a
node u is the reciprocal of the average shortest-path distance
to u over all n-1 reachable nodes which are in targets:
$C(u) = \frac{n - 1}{\sum_{v=1}^{n-1} d(v, u)}$
where d(v, u) is the shortest-path distance between v and u, with
v in targets, and n-1 is the number of targets reachable from u.
Notice that the closeness distance function computes the incoming
distance to u for directed graphs. To use outward distance, act
on G.reverse().
Notice that higher values of closeness indicate higher centrality.
Wasserman and Faust propose an improved formula for graphs with
more than one connected component. The result is “a ratio of the
fraction of actors in the group who are reachable, to the average
distance” from the reachable actors [2]__. You might think this
scale factor is inverted but it is not. As is, nodes from small
components receive a smaller closeness value. Letting N denote
the number of nodes in the graph,
targets (list of nodes, optional) – The nodes to use as targets for the shortest paths in closeness
u (node, optional) – Return only the value for node u
distance (edge attribute key, optional (default=None)) – Use the specified edge attribute as the edge distance in shortest
path calculations. If None (the default) all edges have a distance of 1.
Absent edge attributes are assigned a distance of 1. Note that no check
is performed to ensure that edges have the provided attribute.
wf_improved (bool, optional (default=True)) – If True, scale by the fraction of nodes reachable. This gives the
Wasserman and Faust improved formula. For single component graphs
it is the same as the original formula.
Returns:
nodes – Dictionary of nodes with closeness centrality as the value.
The closeness centrality is normalized to (n-1)/(|T|-1) where
n is the number of targets in the connected part of graph
containing the node, and |T| is the total number of targets. If the graph
is not completely connected, this algorithm computes the closeness centrality
for each connected part separately, scaled by the number of targets in that part.
If the ‘distance’ keyword is set to an edge attribute key then the
shortest-path length will be computed using Dijkstra’s algorithm with
that edge attribute as the edge weight.
The closeness centrality uses inward distance to a node, not outward.
If you want to use outward distances, apply the function to G.reverse().
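As a rough illustration of the basic (unscaled) definition, the following plain-Python sketch computes the reciprocal of the average distance from each node to the reachable targets. It is not the library's implementation: the names `bfs_distances` and `subset_closeness` are hypothetical, the graph is assumed undirected and unweighted (so inward and outward distances coincide), a node is not counted as its own target, and the Wasserman and Faust scaling is left out.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from `source` to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def subset_closeness(adj, targets):
    """Reciprocal of the average distance from each node to reachable targets."""
    targets = set(targets)
    closeness = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        # Distances to reachable targets, excluding u itself (assumption)
        reached = [dist[v] for v in targets if v in dist and v != u]
        closeness[u] = len(reached) / sum(reached) if reached else 0.0
    return closeness


# Path graph a - b - c - d, with the two end nodes as targets
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
closeness = subset_closeness(adj, ["a", "d"])
```

The two interior nodes sit closer on average to the targets than the end nodes do, so they receive the higher closeness values.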
Identify the variable components in the metabolic network,
that is, the components of the network which can vary at the
optimum solution
Parameters:
model (cobra.Model) – Model to find the variable components in
network (nx.Graph or nx.DiGraph, optional) – A metabolic network graph constructed from model,
used to find the connected components after removing reactions
which can’t vary under the optimal solution
tolerance (float, default=1e-7) – The tolerance, reactions which have minimum and maximum fluxes less
than this value will be considered constant
directed (bool, default=False) – If network is not passed, this decides if the constructed network is
directed or not
strongly_connected (bool, default=False) – Whether to find the strongly connected components of the
graph (only used if the provided network is directed)
kwargs – Keyword arguments are passed to cobra.flux_analysis.flux_variability_analysis
Returns:
List of sets of nodes in the metabolic network, where each set
represents a variable component of the model at optimum
Return type:
list of set of nodes
Notes
Uses the cobra Model to find the reactions which are constant across
optimal solutions, and then identifies the connected groups of variable
reactions and associated metabolites
Find the clusters within a network with high target density
Parameters:
network (nx.Graph | nx.DiGraph) – Network to find clusters from
targets (list | dict | pd.Series) – Targets to find density of. Can be a list of nodes or genes, in
which case all targets will have equal weight, or a dict or
Series keyed by nodes/genes in the network which can specify
a target weight. If a dict or Series, values should be ints
or floats.
radius (int) – Radius to use for finding density. Specifies how far out from a
given node targets are counted towards density. A radius of 0
only counts the single node, and so will just return the
targets values back unchanged. Default value of 3.
top_quantile_cutoff (float) – Quantile cutoff for defining high density. The nodes within the
top 100*`top_quantile_cutoff`% of target density are considered high
density, so a top_quantile_cutoff of 0.2 means that the
20% most dense nodes will be defined as high density. Must be
between 0 and 1.
target_type ({'genes', 'nodes'}, default='nodes') – The type of targets, with ‘genes’ indicating the targets are
genes (which will require that a COBRApy model is provided as a kwarg,
i.e. model=model), and so gene target density will be used. If ‘nodes’,
then the targets should be nodes in the network.
kwargs – Passed to node_target_density, or gene_target_density functions
depending on target_type
Returns:
A dataframe indexed by node id, with columns for density and
cluster. The clusters are assigned integers starting from 0 to
differentiate them. The clusters are not ordered, and so multiple
calls to this method can result in different labels for the clusters.
Return type:
pd.DataFrame
Notes
This method finds the target density of the metabolic graph, and then identifies
nodes with a high target density in their neighborhoods. Nodes without a high
target density are dropped from the graph, and the connected components of
the remaining graph are used as the high density clusters.
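The procedure in these notes can be sketched in plain Python, starting from densities that have already been computed. This is an illustration only: `high_density_clusters` is a hypothetical name, the result is a dict rather than a DataFrame, and the nearest-rank rule used to find the quantile threshold is an assumption that may differ from how the library computes quantiles.

```python
def high_density_clusters(adj, density, top_quantile_cutoff=0.2):
    """Label connected groups of nodes whose density is in the top quantile."""
    ranked = sorted(density.values())
    # Threshold at the (1 - cutoff) quantile, using a simple nearest-rank rule
    index = min(len(ranked) - 1, int((1 - top_quantile_cutoff) * len(ranked)))
    threshold = ranked[index]
    dense = {node for node, value in density.items() if value >= threshold}

    clusters = {}
    label = 0
    for start in adj:
        if start not in dense or start in clusters:
            continue
        # Depth-first search restricted to the high density nodes
        clusters[start] = label
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in dense and w not in clusters:
                    clusters[w] = label
                    stack.append(w)
        label += 1
    return clusters


# Path graph a - b - c - d - e - f with precomputed densities
adj = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}
density = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.2, "e": 0.7, "f": 0.3}
clusters = high_density_clusters(adj, density, top_quantile_cutoff=0.5)
```

Here nodes a and b form one cluster and node e forms a second, because the low density nodes c and d that separate them are dropped before the connected components are found.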
Determine the density of gene targets in the neighborhood of a node
within a metabolic network
Parameters:
metabolic_network (nx.Graph or nx.DiGraph) – Metabolic network in the form of a reaction network, can be
directed or undirected, but directed graphs will be converted
to undirected.
metabolic_model (cobra.Model) – Metabolic model from which the metabolic network was constructed
gene_targets (pd.Series or list or dict) – Targets/counts of targets for genes associated with reactions in the
metabolic network. If a list each value should be a gene id, and will
have equal weight. If a dict, should be keyed by gene id, with values
corresponding to weight. If a pd.Series, should be indexed by gene id,
with values corresponding to weight.
nodes (iterable of hashable, optional) – Subset of nodes to find the density for, if not provided defaults
to all of the nodes in the network
radius (int, default=3) – The radius to use for finding density, specifies how far out from
a given node targets are counted towards density. A radius of 0 only
counts the genes associated with the single node.
essential (bool) – Whether for a gene to be in a neighborhood it should be
essential for at least 1 reaction in that neighborhood. If
False, all genes associated with reactions within the radius
are counted as in the neighborhood. If True, only genes
which are required for at least 1 reaction within the radius
are counted as in the neighborhood.
processes (int, optional) – Number of processes to use
Returns:
target_density – Pandas series with index corresponding to reactions in the network,
and values corresponding to the density of gene targets in the
neighborhood of that reaction node
Determine the enrichment of gene targets in the neighborhood of a reaction
within a metabolic network
Parameters:
metabolic_network (nx.Graph or nx.DiGraph) – Metabolic network in the form of a reaction network, can be
directed or undirected, but directed graphs will be converted
to undirected.
metabolic_model (cobra.Model) – Metabolic model from which the metabolic network was constructed
gene_targets (list or set of str) – Targeted genes associated with reactions in the
metabolic network. Result will be the enrichment in these targeted
genes in a neighborhood of each reaction in the network
nodes (iterable of hashable, optional) – Subset of nodes to find the enrichment for, if not provided defaults
to all of the nodes in the network
metric ("odds-ratio" or "p-value", default="p-value") – The enrichment metric to return in the Series, either the odds-ratio
or the p-value (default) of the Fisher’s exact test used to
evaluate enrichment
alternative ("two-sided", "less", or "greater") – The alternative hypothesis for the Fisher’s exact test used to
evaluate the enrichment
radius (int, default=3) – The radius to use for defining a neighborhood around the reaction for
finding enrichment, specifies how far out from a given node targets are
counted towards enrichment. A radius of 0 only counts the genes
associated with the single node.
essential (bool) – Whether for a gene to be in a neighborhood it should be
essential for at least 1 reaction in that neighborhood. If
False, all genes associated with reactions within the radius
are counted as in the neighborhood. If True, only genes
which are required for at least 1 reaction within the radius
are counted as in the neighborhood.
processes (int, optional) – Number of processes to use
Returns:
target_enrichment – Pandas series with index corresponding to reactions in the network,
and values corresponding to either the odds-ratio or the enrichment
p-value (depending on the value of metric)
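For intuition about the p-value, the one-sided ("greater") Fisher's exact test reduces to an upper tail of the hypergeometric distribution. The sketch below computes that tail with the standard library only; `enrichment_p_value` and its argument names are hypothetical, and how the library builds the contingency table from a neighborhood is not shown here.

```python
from math import comb


def enrichment_p_value(n_total, n_targets, n_neighborhood, n_overlap):
    """One-sided ("greater") Fisher's exact test p-value.

    Probability of seeing at least `n_overlap` target genes when
    `n_neighborhood` genes are drawn at random from `n_total` genes,
    of which `n_targets` are targets (hypergeometric upper tail).
    """
    upper = min(n_targets, n_neighborhood)
    favourable = sum(
        comb(n_targets, k) * comb(n_total - n_targets, n_neighborhood - k)
        for k in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_total, n_neighborhood)


# 10 genes in total, 4 targeted; a neighborhood of 3 genes contains 2 targets
p_value = enrichment_p_value(10, 4, 3, 2)
```

With these counts, 40 of the 120 possible neighborhoods contain two or more targets, so the p-value is one third.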
Find the target density for different nodes in the graph. See note for
details.
Parameters:
network (nx.DiGraph | nx.Graph) – Networkx network (directed or undirected) to find the target
density of. Directed graphs are converted to undirected, and
edge weights are currently ignored.
targets (list | dict | pd.Series) – Targets to find density of. Can be a list of nodes in the network
where all targeted nodes will be treated equally, or a dict or
Series keyed by nodes in the network which can specify a target
weight (such as multiple targets for a single node). If a dict or
Series, values should be ints or floats.
nodes (iterable of hashable, optional) – Subset of nodes to find the density for, if not provided defaults
to all of the nodes in the network
radius (int) – Radius to use for finding density. Specifies how far out from a
given node targets are counted towards density. A radius of 0
only counts the single node, and so will just return the
targets values back unchanged. Default value of 3.
node_filter (Callable of node id to bool, or set of node id, optional) – Filter nodes in the network to consider when calculating density.
If a Callable, should take node ids as the only argument and return
a bool, if True the node will be considered in the density,
if False it will not be. If a set, only nodes in the set will be considered
when calculating density. Note that the density is still calculated for
all nodes, but nodes that are not in the filter won’t count towards the
size of the neighborhoods, and won’t be checked for being in the target
set.
processes (int, optional) – Number of processes to use for finding the density
Returns:
The target density for the nodes in the network
Return type:
pd.Series
Notes
For each node in a network, neighboring nodes up to a distance of radius
away are checked for targets. The total number of targets, or the sum of the
targets found (in the case of dict or Series input) divided by the number of nodes
within that radius is the density for a particular node.
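The calculation described in this note can be sketched with a breadth-first search in plain Python. The names `neighborhood` and `target_density` and the adjacency-dict input are assumptions for illustration; the graph is treated as undirected and unweighted, and the node_filter and processes options are not modelled.

```python
from collections import deque


def neighborhood(adj, source, radius):
    """Nodes within `radius` steps of `source`, including `source` itself."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the radius
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def target_density(adj, targets, radius=3):
    """Sum of target weights within `radius` divided by the neighborhood size."""
    if not isinstance(targets, dict):
        targets = {node: 1 for node in targets}  # a list gives equal weights
    density = {}
    for node in adj:
        nearby = neighborhood(adj, node, radius)
        density[node] = sum(targets.get(v, 0) for v in nearby) / len(nearby)
    return density


# Path graph a - b - c - d with weighted targets on a and c
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
density = target_density(adj, {"a": 1, "c": 2}, radius=1)
```

With a radius of 0 each neighborhood contains only the node itself, so the function returns the target weights unchanged, as the radius parameter description states.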
Protocol for fuzzy membership functions, which should take a reaction,
a network, a set of target genes, and a reaction to gene set mapping,
and return a float between 0 and 1. This method can also take in
additional parameters as kwargs, which will be passed through from
the calling functions.
Converts gene_sets into fuzzy reaction sets, and finds their intersection
using intersection_fn
Parameters:
gene_sets (iterable of iterable of str) – Sets of genes to find the fuzzy reaction set intersection for
metabolic_network (nx.Graph or nx.DiGraph) – Metabolic reaction network represented by a networkx Graph or DiGraph.
DiGraphs will be converted to Graphs before processing.
metabolic_model (cobra.Model) – Metabolic model from which the metabolic network was constructed
(used for translating reactions to genes)
intersection_fn ({"mean", "min", "max", "geom", "rra"} or Callable[[pd.DataFrame], pd.Series]) – Either a str specifying an intersection function (see notes), or
a Callable which takes a DataFrame, where each column is a fuzzy reaction
set and returns a Series which is a new fuzzy reaction set representing
the intersection of the input fuzzy reaction sets.
intersection_fn_kwargs (dict of str to Any) – kwargs passed to the intersection function
rank_method ({"average", "min", "max", "first", "dense"}) – If the intersection_fn is ‘rra’, how ties in the
membership values are handled when performing ranking
kwargs – Keyword arguments are passed to fuzzy_reaction_set
Returns:
intersection – A pandas Series representing a fuzzy reaction set constructed
by intersecting the fuzzy reaction sets derived from the
gene_sets.
Return type:
pd.Series
Notes
The possible methods for the intersection are:
mean: Take the arithmetic mean of the membership values
min: Take the minimum of the membership values
max: Take the max of the membership values
geom: Take the geometric mean of the membership values
rra: Perform robust rank aggregation on the membership values,
and then subtract the resulting rho-score from 1.0
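The four elementwise methods can be sketched in plain Python, with each fuzzy reaction set represented as a dict from reaction id to membership. This is an illustration under assumptions: `intersect_fuzzy_sets` is a hypothetical name, all sets are assumed to share the same reaction ids, and the ‘rra’ method is not covered.

```python
from math import prod
from statistics import fmean


def intersect_fuzzy_sets(fuzzy_sets, method="mean"):
    """Combine membership dicts (reaction id -> value in [0, 1]) per reaction."""
    combine = {
        "mean": fmean,  # arithmetic mean
        "min": min,
        "max": max,
        "geom": lambda values: prod(values) ** (1 / len(values)),  # geometric mean
    }[method]
    reactions = fuzzy_sets[0].keys()
    return {rxn: combine([fs[rxn] for fs in fuzzy_sets]) for rxn in reactions}


set_a = {"r1": 0.25, "r2": 1.0}
set_b = {"r1": 1.0, "r2": 0.0}
mean_result = intersect_fuzzy_sets([set_a, set_b], "mean")
min_result = intersect_fuzzy_sets([set_a, set_b], "min")
geom_result = intersect_fuzzy_sets([set_a, set_b], "geom")
```

The choice of method controls how strict the intersection is: ‘min’ and ‘geom’ give reaction r2 a membership of 0 because one input set excludes it entirely, while ‘mean’ still gives it 0.5.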
metabolic_network (nx.Graph or nx.DiGraph) – Metabolic reaction network represented by a networkx Graph or DiGraph.
DiGraphs will be converted to Graphs before processing.
metabolic_model (cobra.Model) – Metabolic model from which the metabolic network was constructed
(used for translating reactions to genes)
gene_set (Iterable of str) – Set of genes to convert into a fuzzy reaction set
membership_fn (str or FuzzyMembershipFunction) – The membership function to use, can be a string giving the
function's name, or the function itself which must match the
signature of FuzzyMembershipFunction
scale (bool or float, optional) – Whether to scale the results of the membership values. If
False or None, no scaling will be applied. If True, will
be scaled to be between 0 and 1 using a min-max scaler.
If a float, the scaling will use a min-max scaler, but
treat scale as the max.
essential (bool) – Whether, when translating from reactions to genes, only
genes required for a reaction to function should be associated
with a particular reaction.
processes (int, optional) – Number of processes to use for parallel processing
kwargs – Additional keyword arguments are passed to the membership_fn
Returns:
reaction_set – The fuzzy reaction set, described by a pandas series. The index
is the reaction id, and the values are the set membership.
Return type:
pd.Series
Notes
The options for membership functions to be selected by
name (i.e. str arg to membership_fn) are
‘simple gene density’
‘simple reaction density’
‘weighted gene density’
‘weighted reaction density’
‘knn gene density’
‘knn reaction density’
‘gene enrichment’
The difference between the gene and reaction density functions is
how multiple genes associated with a single reaction are counted.
For the gene type, multiple genes will all count towards the membership,
whereas with the reaction type reactions are counted only once regardless
of how many genes associated with them are in the gene set.
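The distinction between the two counting schemes can be made concrete with a small sketch. The functions `gene_density_sketch` and `reaction_density_sketch` are hypothetical, and the normalization they use (fraction of neighborhood genes, and fraction of neighborhood reactions) is an assumption rather than the library's exact formula.

```python
def gene_density_sketch(neighborhood_reactions, gene_set, reaction_to_gene_dict):
    """Fraction of genes in the neighborhood that belong to `gene_set`."""
    genes = set()
    for rxn in neighborhood_reactions:
        genes |= reaction_to_gene_dict.get(rxn, set())
    return len(genes & gene_set) / len(genes) if genes else 0.0


def reaction_density_sketch(neighborhood_reactions, gene_set, reaction_to_gene_dict):
    """Fraction of reactions in the neighborhood with any gene in `gene_set`."""
    if not neighborhood_reactions:
        return 0.0
    hits = sum(
        1
        for rxn in neighborhood_reactions
        if reaction_to_gene_dict.get(rxn, set()) & gene_set
    )
    return hits / len(neighborhood_reactions)


# r1 is associated with three genes from the gene set, r2 with one gene outside it
reaction_to_gene_dict = {"r1": {"g1", "g2", "g3"}, "r2": {"g4"}}
gene_set = {"g1", "g2", "g3"}
gene_value = gene_density_sketch(["r1", "r2"], gene_set, reaction_to_gene_dict)
reaction_value = reaction_density_sketch(["r1", "r2"], gene_set, reaction_to_gene_dict)
```

All three genes of r1 count towards the gene version (3 of 4 genes), whereas the reaction version counts r1 once (1 of 2 reactions).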
Membership function which computes the membership based on how
many genes within distance radius are in the target gene set,
decreasing the weight of each gene as it moves farther from the
reaction, and still normalizing for the number of genes in each
layer (i.e. the number of genes which show up at a certain
distance)
Parameters:
reaction (cobra.Reaction) – The reaction to find the membership of
network (nx.Graph) – Connectivity graph of the network
gene_set (set of str) – The set of genes to translate into a reaction set
reaction_to_gene_dict (dict of str to set of str) – A dict for translating from reactions to sets of genes
associated with each reaction
max_radius (int) – The maximum distance from the reaction at which genes are included
weight_fn (Callable taking int returning float) – The function used to compute the weight for genes
depending on their distance from the reaction. Should
take in the distance, and return a weight. For this
to act as a membership function, the sum of the
weights should be 1.0
allow_repeats (bool, default=False) – Whether to allow genes to be counted multiple times.
If False, genes that have been seen before will be removed from the
gene neighborhood prior to calculating the membership
contribution for the layer
kwargs – Keyword arguments passed through to the weight function
Returns:
membership – The membership of the reaction in the reaction set
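One possible reading of this layered weighting is sketched below in plain Python. It is an interpretation, not the library's code: `weighted_gene_density_sketch` is a hypothetical name, each gene is counted only in its nearest layer (the allow_repeats=False behaviour), and the weights in the example are made up so that they sum to 1.0.

```python
from collections import deque


def weighted_gene_density_sketch(
    adj, reaction, gene_set, reaction_to_gene_dict, max_radius, weight_fn
):
    """Weighted sum over distance layers of the fraction of genes in `gene_set`."""
    # Breadth-first search out to max_radius, recording each node's distance
    dist = {reaction: 0}
    queue = deque([reaction])
    while queue:
        u = queue.popleft()
        if dist[u] == max_radius:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)

    # Collect the genes that appear at each distance
    layers = [set() for _ in range(max_radius + 1)]
    for node, d in dist.items():
        layers[d] |= reaction_to_gene_dict.get(node, set())

    membership = 0.0
    seen = set()
    for d, genes in enumerate(layers):
        genes = genes - seen  # count each gene only in its nearest layer
        seen |= genes
        if genes:
            # Normalize by the number of genes in the layer, then weight by distance
            membership += weight_fn(d) * len(genes & gene_set) / len(genes)
    return membership


adj = {"r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2"]}
reaction_to_gene_dict = {"r1": {"g1", "g2"}, "r2": {"g3"}, "r3": {"g4"}}
weights = {0: 0.75, 1: 0.25}  # illustrative weights summing to 1.0
membership = weighted_gene_density_sketch(
    adj, "r1", {"g1", "g3"}, reaction_to_gene_dict, 1, weights.get
)
```

Because the weights sum to 1.0 and each layer contributes a fraction between 0 and 1, the result stays between 0 and 1, which is the condition the weight_fn description asks for.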
Membership function which computes the membership based on how
many genes within distance radius are in the target gene set
Parameters:
reaction (cobra.Reaction) – The reaction to find the membership of
network (nx.Graph) – Connectivity graph of the network
gene_set (set of str) – The set of genes to translate into a reaction set
reaction_to_gene_dict (dict of str to set of str) – A dict for translating from reactions to sets of genes
associated with each reaction
radius (int) – The distance used to define network neighborhoods
max_radius (int) – The maximum distance from the reaction at which genes are included
weight_fn (Callable taking int returning float) – The function used to compute the weight for genes
depending on their distance from the reaction. Should
take in the distance, and return a weight.
kwargs – Keyword arguments passed through to the weight function
Returns:
membership – The membership of the reaction in the reaction set
Membership function which computes the membership by calculating the
enrichment of target set genes which are in a neighborhood defined
by the radius around the reaction. The membership will be
1-pvalue where pvalue is calculated using a Fisher’s exact test
to quantify the enrichment.
Parameters:
reaction (cobra.Reaction) – The reaction to find the membership of
network (nx.Graph) – Connectivity graph of the network
gene_set (set of str) – The set of genes to translate into a reaction set
reaction_to_gene_dict (dict of str to set of str) – A dict for translating from reactions to sets of genes
associated with each reaction
radius (int) – The distance used to define network neighborhoods
total_genes (int, optional) – The total number of genes associated with the reactions
in the network, if not provided will calculate this based
on the reaction_to_gene_dict.
Returns:
membership – The membership of the reaction in the reaction set, calculated
as 1-(p-value), where p-value is the enrichment p-value
Return type:
float
Notes
If used with the fuzzy_reaction_set method, the total genes will
automatically be calculated and passed in if not provided, so you
don’t need to do that manually (though it can still be overridden if desired).
Membership function which computes the membership based on distance
to the kth neighbor.
Parameters:
reaction (cobra.Reaction) – The reaction to find the membership of
network (nx.Graph) – Connectivity graph of the network
gene_set (set of str) – The set of genes to translate into a reaction set
reaction_to_gene_dict (dict of str to set of str) – A dict for translating from reactions to sets of genes
associated with each reaction
max_radius (int) – The maximum radius to search for neighbors,
if fewer than k neighbors are found in this radius
the membership will be 0.0
k_neighbors (int) – The number of neighbors in the gene_set to use for estimating
density in the graph
weight_fn (Callable(int)->float, default=weight_fn_reciprocal) – Function to use for calculating the membership associated
with a distance
allow_repeats (bool, default=False) – Whether to allow genes to be counted multiple times.
If False, genes that have been seen before will not be counted
towards the number of neighbors
kwargs – Additional keyword arguments passed through to
the weight_fn
Returns:
membership – The membership of the reaction in the reaction set
Return type:
float
Notes
The knn-distance is the distance from the reaction to the
kth neighbor which is in the gene set, so that if k=1, and the reaction
is directly associated with a gene the knn-distance will be 0. If k=2,
the distance from the reaction to the second closest node associated
with a gene in gene_set will be used.
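The knn-distance described in these notes can be sketched with a breadth-first search that stops once k genes from the gene set have been seen. The names `knn_distance`, `knn_membership`, and `reciprocal_weight` are hypothetical, the weight function 1 / (1 + d) is an assumed stand-in for the library's default, and genes are counted once each (the allow_repeats=False behaviour).

```python
from collections import deque


def knn_distance(adj, reaction, gene_set, reaction_to_gene_dict, k_neighbors, max_radius):
    """Distance to the k-th gene-set gene, or None if not found within range."""
    seen_genes = set()
    dist = {reaction: 0}
    queue = deque([reaction])
    while queue:
        u = queue.popleft()
        # BFS visits nodes in order of distance, so the first time the
        # count reaches k the current distance is the knn-distance
        seen_genes |= reaction_to_gene_dict.get(u, set()) & gene_set
        if len(seen_genes) >= k_neighbors:
            return dist[u]
        if dist[u] == max_radius:
            continue  # do not search beyond max_radius
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return None


def reciprocal_weight(distance):
    """Assumed weight function: 1 at distance 0, decreasing with distance."""
    return 1 / (1 + distance)


def knn_membership(distance, weight_fn=reciprocal_weight):
    """Membership is 0.0 when fewer than k neighbors were found."""
    return 0.0 if distance is None else weight_fn(distance)


adj = {"r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2"]}
reaction_to_gene_dict = {"r1": {"g1"}, "r2": {"g9"}, "r3": {"g2"}}
gene_set = {"g1", "g2"}
d1 = knn_distance(adj, "r1", gene_set, reaction_to_gene_dict, 1, 2)
d2 = knn_distance(adj, "r1", gene_set, reaction_to_gene_dict, 2, 2)
d_short = knn_distance(adj, "r1", gene_set, reaction_to_gene_dict, 2, 1)
```

With k=1 the reaction's own gene gives a knn-distance of 0, with k=2 the second gene is found two steps away, and shrinking max_radius to 1 leaves the second gene out of reach so the membership falls to 0.0.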
Membership function which computes the membership based on distance
to the kth neighbor.
Parameters:
reaction (cobra.Reaction) – The reaction to find the membership of
network (nx.Graph) – Connectivity graph of the network
gene_set (set of str) – The set of genes to translate into a reaction set
reaction_to_gene_dict (dict of str to set of str) – A dict for translating from reactions to sets of genes
associated with each reaction
max_radius (int) – The maximum radius to search for neighbors,
if fewer than k neighbors are found in this radius
the membership will be 0.0
k_neighbors (int) – The number of neighbors in the gene_set to use for estimating
density in the graph
weight_fn (Callable(int)->float, default=weight_fn_reciprocal) – Function to use for calculating the membership associated
with a distance
dimension (int, default=2) – Dimension parameter used for scaling the density (see notes).
Larger values will result in smaller membership values.
diameter (int, optional) – The diameter of the network. If not provided it will be calculated from
network
kwargs – Additional keyword arguments passed through to
the weight_fn
Returns:
membership – The membership of the reaction in the reaction set
Return type:
float
Notes
The knn-distance is the distance from the reaction to the
kth neighbor reaction which is associated with a gene in the gene set,
so that if k=1, and the reaction is directly associated with a gene
the knn-distance will be 0. If k=2, the distance from the reaction
to the second closest node associated with a gene in gene_set will
be used.
Membership function which computes the membership based on how
many reactions within distance radius are associated with
a gene in the target gene set
Parameters:
reaction (cobra.Reaction) – The reaction to find the membership of
network (nx.Graph) – Connectivity graph of the network
gene_set (set of str) – The set of genes to translate into a reaction set
reaction_to_gene_dict (dict of str to set of str) – A dict for translating from reactions to sets of genes
associated with each reaction
radius (int) – The distance used to define network neighborhoods
Returns:
membership – The membership of the reaction in the reaction set
network (nx.Graph) – The network whose neighborhoods will be identified
radius (int) – The radius determining the sizes of the neighborhoods
Returns:
neighborhoods – Dict describing the nodes in the graph, keyed by
node with values of sets of nodes in the neighborhood
of the node (including the node itself)
Create an adjacency matrix representing the metabolic network of a provided
cobra Model
Parameters:
model (cobra.Model) – Cobra Model to create the network from
weighted (bool) – Whether the network should be weighted
directed (bool) – Whether the network should be directed
weight_by ({'fva', 'pfba', 'stoichiometry'}, default='stoichiometry') – String indicating if the network should be weighted by
‘stoichiometry’, ‘fva’, ‘pfba’ (see notes for more information).
Ignored if weighted = False
threshold (float) – Threshold below which the absolute value of a bound/flux
is considered to be 0
kwargs – Passed to cobra’s flux_variability_analysis function if the weight_by
is ‘fva’, or cobra’s pfba function if the weight_by is ‘pfba’
Returns:
The adjacency matrix
Return type:
pd.DataFrame
Notes
When creating a weighted network, the options are to weight the edges based
on flux, or stoichiometry. If stoichiometry is chosen the edge weight will
correspond to the stoichiometric coefficient of the metabolite, in a given
reaction.
For ‘fva’ weighting, first flux variability analysis is performed. The edge
weight is determined by the maximum flux through a reaction in a particular
direction (forward if the metabolite is a product of the reaction,
reverse if the metabolite is a substrate) multiplied by the metabolite
stoichiometry. If the network is unweighted, the maximum of the absolute
value of the forward and the reverse flux is used instead.
For ‘pfba’ weighting, first parsimonious flux balance analysis (pFBA) is performed. The
edge weight between a reaction and metabolite is determined by the
stoichiometric coefficient of the metabolite multiplied by flux of the
reaction in the pFBA solution.
Create a gene connectivity network from the metabolic model,
see notes for details
Parameters:
model (cobra.Model) – Cobra Model to create the network from
directed (bool) – Whether the network should be directed. If True,
the direction of the network’s edges will be decided by the
directionality of the reaction network, and
multiple genes associated with a single reaction
will have two (reciprocal) edges connecting them.
nodes_to_remove (list[str] or None) – List of any metabolites or reactions to remove
from the metabolic network prior to projecting
it onto the reactions and constructing the gene network.
Each metabolite/reaction to remove should be the string
id associated with them in the cobra Model
essential (bool) – Whether a gene should be required for a reaction to function
in order for that reaction to be used in assigning the
gene edges
Returns:
gene_network – Network connecting genes which are neighboring in the
reaction network together
Return type:
nx.Graph or nx.DiGraph
Notes
The gene network includes nodes for each gene associated with
a reaction in the network (whether or not essential is True).
Edges are added by connecting each gene associated with a reaction
to genes associated with all the neighboring reactions. If the
graph is directed, then gene nodes are connected to genes associated
with successor reactions. Genes associated with the same reaction
are given edges between them (going both directions in the
case of directed graphs).
The essential parameter is to decide which genes are associated
with which reactions in order to determine which genes are neighbors
in the gene network. If True, genes will only be associated with
a reaction, when adding edges to the network, if they are required
for that reaction to function. All genes associated with reactions
in the network will still be added as nodes even if they are not
essential for any reactions in the network.
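The edge rules in these notes can be sketched for the undirected case in plain Python. The function `gene_network_edges` is a hypothetical name, the reaction network is given as an adjacency dict, and the sketch returns only the edge set; it does not add isolated gene nodes or handle the essential option.

```python
from itertools import combinations


def gene_network_edges(reaction_adj, reaction_to_gene_dict):
    """Undirected gene-gene edges implied by an undirected reaction network."""
    edges = set()
    for rxn, neighbors in reaction_adj.items():
        genes = reaction_to_gene_dict.get(rxn, set())
        # Genes associated with the same reaction are connected to each other
        for g1, g2 in combinations(sorted(genes), 2):
            edges.add((g1, g2))
        # Genes of neighboring reactions are connected
        for other in neighbors:
            for g1 in genes:
                for g2 in reaction_to_gene_dict.get(other, set()):
                    if g1 != g2:
                        edges.add(tuple(sorted((g1, g2))))
    return edges


# Reaction network r1 - r2 - r3
reaction_adj = {"r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2"]}
reaction_to_gene_dict = {"r1": {"gA", "gB"}, "r2": {"gC"}, "r3": {"gD"}}
edges = gene_network_edges(reaction_adj, reaction_to_gene_dict)
```

Genes gA and gB are linked because they share reaction r1, and each is linked to gC through the neighboring reaction r2; gD is two reactions away from r1, so it is linked only to gC.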
Create an adjacency matrix for the distances between the groups
Parameters:
network (nx.Graph or nx.DiGraph) – Network to use when finding distances between nodes
in the groups. Edge weights are ignored.
groups (dict of Hashable to Iterable of Hashable) – Group definitions, must be a map between group names (which
will be used as index/columns in the matrix), and an iterable of
group members (which should be nodes in the network)
weight (str, optional) – Edge attribute to use for weight, if None all edges have weight 1
linkage ({'mean', 'min', 'max'}) – Method to use when combining pairwise distances between groups
directed (bool) – Whether the adjacency matrix should be directed or not, ignored
unless the input network is a nx.DiGraph
Returns:
adjacency_matrix – DataFrame representing the adjacency matrix of the distances
between the groups on the network. Index and columns
are the keys of the groups dict, with values representing the
distances between the groups.
Return type:
pd.DataFrame
Notes
Constructs the adjacency matrix using the pairwise distances between
groups. For each pair of groups, finds the distances between their
nodes and finds the distance between the two groups by aggregating
these distances, either using the mean, minimum, or maximum of
the set of pairwise distances between two groups of nodes.
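The aggregation step for a single pair of groups can be sketched in plain Python. The names `bfs_distances` and `group_distance` are hypothetical, the graph is assumed undirected and unweighted, and skipping unreachable pairs (returning infinity when no pair is connected) is an assumption rather than documented behaviour.

```python
from collections import deque
from statistics import fmean


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from `source` to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def group_distance(adj, group_a, group_b, linkage="mean"):
    """Aggregate the pairwise distances between members of two groups."""
    aggregate = {"mean": fmean, "min": min, "max": max}[linkage]
    distances = []
    for u in group_a:
        dist = bfs_distances(adj, u)
        # Unreachable pairs are skipped (assumption)
        distances.extend(dist[v] for v in group_b if v in dist)
    return aggregate(distances) if distances else float("inf")


# Path graph a - b - c - d, split into two groups
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
mean_distance = group_distance(adj, ["a", "b"], ["c", "d"], "mean")
min_distance = group_distance(adj, ["a", "b"], ["c", "d"], "min")
max_distance = group_distance(adj, ["a", "b"], ["c", "d"], "max")
```

The pairwise distances here are 2, 3, 1, and 2, so the three linkage methods give 2.0, 1, and 3 respectively. Repeating this for every pair of groups would fill in the adjacency matrix.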
Create a network for the distances between the groups
Parameters:
network (nx.Graph or nx.DiGraph) – Network to use when finding distances between nodes
in the groups. Edge weights are ignored.
groups (dict of Hashable to Iterable of Hashable) – Group definitions, must be a map between group names (which
will be used as nodes in the resulting network), and an iterable of
group members (which should be nodes in the network)
weight (str, optional) – Edge attribute to use for weight. If None, all edges have weight 1
linkage ({'mean', 'min', 'max'}) – Method to use when combining pairwise distances between groups
directed (bool) – Whether the resulting network should be directed or not. Ignored
unless the input network is an nx.DiGraph
Returns:
Network with a node for each group, and edges weighted by the distances
between the groups on the network.
Return type:
nx.Graph or nx.DiGraph
Notes
Constructs the network using the pairwise distances between
groups. For each pair of groups, finds the distances between their
nodes and finds the distance between the two groups by aggregating
these distances, either using the mean, minimum, or maximum of
the set of pairwise distances between two groups of nodes.
Create a group connectivity network, see notes for details
Parameters:
network (nx.Graph or nx.DiGraph) – Network to use when finding neighbors. Edge weights
will be ignored.
groups (dict of Hashable to Iterable of Hashable) – Group definitions, must be a map between group names (which
will be used as nodes in the network), and an iterable of
group members (which should be nodes in the network)
max_distance (int, default=1) – Max distance for nodes to be considered neighbors. A value of 0
will only connect groups with direct overlaps, while a value of 1
will connect groups which have members that are direct neighbors in the
network.
weighted ({'count', 'proportion', 'enrichment'}, optional) – Whether to weight the graph based on the number of connections
between the groups. If None (default) no weights are added. If
‘count’ then the edge weight is the count of connections between
the two groups. If ‘proportion’, the edge weight is normalized
by the maximum possible overlap. If ‘enrichment’, node attributes are
added called pvalue, odds_ratio, and significance. The pvalue and
odds ratio are the results of performing a Fisher’s exact test on
the enrichment of one group in the neighborhood of the other (in the
undirected case, it is the minimum p-value/maximum odds_ratio found
when finding the enrichment of one group in the neighborhood of the
other). The significance is the -log10 of the p-value. Note that the
odds_ratio can be infinite.
directed (bool, default=False) – Whether the resulting connectivity graph should be directed,
ignored unless the input network is directed.
Returns:
group_neighborhood_network – The group connectivity graph, which includes a node for every group
defined in groups, with edges connecting groups which are connected
in network, and optional edge weights. Will be an nx.Graph unless
the input network is a DiGraph and directed is True.
Return type:
nx.Graph or nx.DiGraph
Notes
The group connectivity graph is a graph with a node for each group
in groups, and edges connecting groups which include neighbors
on the network.
For example, take a graph with:
Nodes: {a, b, c, d, e, f, g}
Edges: {(a, b), (c,d), (e,f), (a,g)}
then the group connectivity graph for groups
{group1: {a,c}, group2:{d,e}, group3:{b,f}, group4:{g}}
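will produce the following group connectivity graph (with parameter
max_distance set to 1):
Nodes: {group1, group2, group3, group4}
Edges: {(group1, group2), (group1, group3), (group1, group4), (group2, group3)}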
The number of connections between two groups is determined
by finding the total neighborhood of one of the groups
(that is, the set of all nodes within max_distance of any node
in that group), and counting the number of nodes from
the other group which are within that neighborhood.
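The example graph from these notes can be reproduced with a short NetworkX sketch. The function name group_connectivity is hypothetical and this is not the library's own code: it handles only the undirected ‘count’ case, and it counts connections from the neighborhood of the first group of each pair only.

```python
import networkx as nx


def group_connectivity(network, groups, max_distance=1):
    """Connect groups whose members lie within max_distance of each other."""
    result = nx.Graph()
    result.add_nodes_from(groups)

    # Neighborhood of a group: every node within max_distance of any member
    neighborhoods = {}
    for name, members in groups.items():
        reached = set()
        for node in members:
            reached.update(
                nx.single_source_shortest_path_length(
                    network, node, cutoff=max_distance
                )
            )
        neighborhoods[name] = reached

    names = list(groups)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            # Members of `second` that fall inside the neighborhood of `first`
            count = len(neighborhoods[first] & set(groups[second]))
            if count > 0:
                result.add_edge(first, second, weight=count)
    return result


graph = nx.Graph([("a", "b"), ("c", "d"), ("e", "f"), ("a", "g")])
groups = {
    "group1": {"a", "c"},
    "group2": {"d", "e"},
    "group3": {"b", "f"},
    "group4": {"g"},
}
connectivity = group_connectivity(graph, groups, max_distance=1)
edges = {frozenset(edge) for edge in connectivity.edges}
```

With max_distance=0 the neighborhood of each group is just its own members, so only groups that share a node would be connected.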
model (cobra.Model) – Cobra Model to create the network from
weighted (bool) – Whether the network should be weighted
directed (bool) – Whether the network should be directed
weight_by ({'fva', 'pfba', 'stoichiometry'}, default='stoichiometry') – String indicating if the network should be weighted by
‘stoichiometry’, ‘fva’, ‘pfba’ (see notes for more information).
Ignored if weighted = False
nodes_to_remove (list[str] | None) – List of any metabolites or reactions that should be removed from
the final network. This can be used to remove metabolites that
participate in a large number of reactions, but are not desired
in downstream analysis such as water, or ATP, or pseudo
reactions like biomass. Each metabolite/reaction should be the
string ID associated with them in the cobra model.
reciprocal_weights (bool) – Whether to use the reciprocal of the weights, useful if higher
flux should equate with lower weights in the final network (for
use with graph algorithms)
threshold (float) – Threshold, below which to consider a bound to be 0
kwargs – Keyword arguments are passed to the cobra flux_variability_analysis method
when weight_by is ‘fva’
Returns:
A network representing the metabolic network from the provided
cobrapy model
Return type:
nx.Graph | nx.DiGraph
Notes
When creating a weighted network, the edges can be weighted by flux
or by stoichiometry. If stoichiometry is chosen, the edge weight
corresponds to the stoichiometric coefficient of the metabolite in a given
reaction.
For ‘fva’ weighting, first flux variability analysis is performed. The edge
weight is determined by the maximum flux through a reaction in a particular
direction (forward if the metabolite is a product of the reaction,
reverse if the metabolite is a substrate) multiplied by the metabolite
stoichiometry. If the network is undirected, the maximum of the absolute
value of the forward and the reverse flux is used instead.
For ‘pfba’ weighting, first parsimonious flux balance analysis (pFBA) is
performed. The edge weight between a reaction and metabolite is determined by the
stoichiometric coefficient of the metabolite multiplied by the flux of the
reaction in the pFBA solution.
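The arithmetic behind the three weighting options can be sketched in plain Python. This is one reading of the notes above, not the library's code: the function names are hypothetical, taking magnitudes so that weights are non-negative is an assumption, and the flux values are made up rather than computed with cobra.

```python
def stoichiometry_weight(coefficient):
    """Edge weight from the stoichiometric coefficient alone."""
    # Assumption: the magnitude is used so the weight is non-negative
    return abs(coefficient)


def pfba_weight(coefficient, flux):
    """Edge weight from the coefficient multiplied by the pFBA flux."""
    # Assumption: the magnitude of the product is used
    return abs(coefficient * flux)


def fva_weight(coefficient, fva_minimum, fva_maximum):
    """Edge weight from the maximum flux in the relevant direction.

    Products (coefficient > 0) use the forward flux, substrates
    (coefficient < 0) use the reverse flux.
    """
    if coefficient > 0:
        directional_flux = max(fva_maximum, 0.0)
    else:
        directional_flux = max(-fva_minimum, 0.0)
    return abs(coefficient) * directional_flux


# Toy reaction A -> 2 B, FVA range [-5, 10], pFBA flux 4 (made-up values)
coefficients = {"A": -1.0, "B": 2.0}
weights = {
    metabolite: {
        "stoichiometry": stoichiometry_weight(coefficient),
        "pfba": pfba_weight(coefficient, 4.0),
        "fva": fva_weight(coefficient, -5.0, 10.0),
    }
    for metabolite, coefficient in coefficients.items()
}
```

In this toy case the product B gets an ‘fva’ weight of 2 × 10 = 20 from the forward direction, while the substrate A gets 1 × 5 = 5 from the reverse direction.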
Create a metabolite connectivity network from the
metabolic model
Parameters:
model (cobra.Model) – Cobra Model to create the network from
weighted (bool) – Whether the network should be weighted
directed (bool) – Whether the network should be directed
weight_by ({'fva', 'pfba', 'stoichiometry'}, default='stoichiometry') – String indicating if the network should be weighted by
‘stoichiometry’, ‘fva’, ‘pfba’ (see notes for more information).
Ignored if weighted = False
nodes_to_remove (list[str] | None) – List of any metabolites or reactions that should be removed from
the final network. This can be used to remove metabolites that
participate in a large number of reactions, but are not desired
in downstream analysis such as water, or ATP, or pseudo
reactions like biomass. Each metabolite/reaction should be the
string ID associated with them in the cobra model.
reciprocal_weights (bool) – Whether to use the reciprocal of the weights, useful if higher
flux should equate with lower weights in the final network (for
use with graph algorithms)
threshold (float) – Threshold, below which to consider a bound to be 0
projection_weight (str | Callable[[float, float], float] | None) – How to weight the projected graph. If None, the projected graph
will not be weighted. If “ratio”, the edges will be weighted
based on the ratio between actual shared neighbors and maximum
possible shared neighbors. If “count”, the edges will be
weighted by the number of shared neighbors. A function can also
be provided, which takes two float arguments (the weights of two
edges), and returns a float.
projection_weight_combine (Callable[[list[float]], float], optional) – How to combine multiple projected edges. If two nodes in the set
being projected onto share multiple neighbors in the other node set,
they can have multiple possible edge weights. This function takes
a list of possible weights and returns a single final weight. The Python
builtins max and min can be used for this. If not provided,
max is used.
kwargs – Keyword arguments are passed to the cobra flux_variability_analysis method
when weight_by is ‘fva’
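The “count” and “ratio” projection weights read like the options offered by NetworkX's own bipartite projection, so a toy metabolite projection can be sketched with it. Whether the library calls this NetworkX function internally is not stated in the source; the metabolite and reaction IDs below are illustrative only.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy bipartite network: metabolites on one side, reactions on the other
metabolites = ["glc", "g6p", "f6p"]
reactions = ["HEX1", "PGI"]
metabolic = nx.Graph()
metabolic.add_nodes_from(metabolites, bipartite=0)
metabolic.add_nodes_from(reactions, bipartite=1)
metabolic.add_edges_from(
    [("glc", "HEX1"), ("g6p", "HEX1"), ("g6p", "PGI"), ("f6p", "PGI")]
)

# "count": edge weight is the number of shared reactions
count_projection = bipartite.weighted_projected_graph(metabolic, metabolites)

# "ratio": shared reactions divided by the size of the reaction set
ratio_projection = bipartite.weighted_projected_graph(
    metabolic, metabolites, ratio=True
)
```

Here glc and g6p share one reaction, so their edge has weight 1 under “count” and 1/2 under “ratio”, while glc and f6p share none and stay unconnected.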
Create a mutual information network from the provided metabolic model
Parameters:
model (Optional[cobra.Model]) – Metabolic model to construct the mutual information network
from. Only required if the flux_samples parameter is None
flux_samples (Optional[pd.DataFrame|np.ndarray]) – Flux samples used to calculate mutual information between
reactions. If None, the passed model will be sampled to generate
these flux samples.
reaction_names (Optional[Iterable[str]]) – Names for the reactions
cutoff_significance (float, optional) – Upper bound for the significance of the mutual information,
any mutual information values with p-values above this
cutoff will have their mutual information set to 0.
The p-value is calculated using permutation testing,
see mi_pairwise for more information.
n_samples (int) – Number of samples to take if flux_samples is None (ignored if
flux_samples is not None)
reciprocal_weights (bool) – Whether the non-zero weights in the network should be the
reciprocal of mutual information.
processes (int) – Number of processes to use during the flux sampling and
mutual information calculation
kwargs – Keyword arguments passed to the mi_pairwise function
Returns:
A networkx Graph, with nodes representing different reactions
and edge weights corresponding to estimated mutual information
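The cutoff and reciprocal steps can be illustrated on precomputed values. The sketch below does not call mi_pairwise or sample a model: the function name and the mutual information and p-value matrices are hypothetical, and leaving zero-valued pairs without an edge is an assumption.

```python
import networkx as nx


def mutual_information_graph(names, mi, p_values, cutoff=0.05, reciprocal=False):
    """Build a graph from pairwise mutual information and p-values."""
    graph = nx.Graph()
    graph.add_nodes_from(names)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            value = mi[i][j]
            # Values whose p-value exceeds the cutoff are set to 0
            if p_values[i][j] > cutoff:
                value = 0.0
            if value == 0.0:
                continue  # assumption: zero mutual information means no edge
            weight = 1.0 / value if reciprocal else value
            graph.add_edge(names[i], names[j], weight=weight)
    return graph


# Made-up values for three reactions
names = ["R1", "R2", "R3"]
mi = [[0.0, 0.8, 0.1], [0.8, 0.0, 0.5], [0.1, 0.5, 0.0]]
p_values = [[1.0, 0.01, 0.4], [0.01, 1.0, 0.03], [0.4, 0.03, 1.0]]

mi_graph = mutual_information_graph(names, mi, p_values)
reciprocal_graph = mutual_information_graph(names, mi, p_values, reciprocal=True)
```

The R1 and R3 pair has a p-value of 0.4, above the cutoff, so it contributes no edge in either graph.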
Create a reaction connectivity network from the
metabolic model
Parameters:
model (cobra.Model) – Cobra Model to create the network from
weighted (bool) – Whether the network should be weighted
directed (bool) – Whether the network should be directed
weight_by ({'fva', 'pfba', 'stoichiometry'}, default='stoichiometry') – String indicating if the network should be weighted by
‘stoichiometry’, ‘fva’, ‘pfba’ (see notes for more information).
Ignored if weighted = False
nodes_to_remove (list[str] | None) – List of any metabolites or reactions that should be removed from
the final network. This can be used to remove metabolites that
participate in a large number of reactions, but are not desired
in downstream analysis such as water, or ATP, or pseudo
reactions like biomass. Each metabolite/reaction should be the
string ID associated with them in the cobra model.
reciprocal_weights (bool) – Whether to use the reciprocal of the weights, useful if higher
flux should equate with lower weights in the final network (for
use with graph algorithms)
threshold (float) – Threshold, below which to consider a bound to be 0
projection_weight (str | Callable[[float, float], float] | None) – How to weight the projected graph. If None, the projected graph
will not be weighted. If “ratio”, the edges will be weighted
based on the ratio between actual shared neighbors and maximum
possible shared neighbors. If “count”, the edges will be
weighted by the number of shared neighbors. A function can also
be provided, which takes two float arguments (the weights of two
edges), and returns a float.
projection_weight_combine (Callable[[list[float]], float], optional) – How to combine multiple projected edges. If two nodes in the set
being projected onto share multiple neighbors in the other node set,
they can have multiple possible edge weights. This function takes
a list of possible weights and returns a single final weight. The Python
builtins max and min can be used for this. If not provided,
max is used.
kwargs – Keyword arguments are passed to the cobra flux_variability_analysis method
when weight_by is ‘fva’
Function to project a bipartite graph onto the specified set of nodes
Parameters:
network (nx.Graph | nx.DiGraph) – Network to project
node_set (Iterable) – Nodes to project the graph onto
directed (bool | None) – Whether the projected graph should be directed. If the network
argument is not directed this is ignored. A value of None will
have the directedness of the output match the directedness of
the input network.
weight (str | Callable[[float, float], float], optional) – How to weight the projected graph. If None, the projected graph
will not be weighted. If “ratio”, the edges will be weighted
based on the ratio between actual shared neighbors and maximum
possible shared neighbors. If “count”, the edges will be
weighted by the number of shared neighbors. A function can also
be provided, which takes two float arguments (the weights of two
edges), and returns a float.
weight_combine (Callable[[list[float]], float], optional) – How to combine multiple projected edges. If two nodes in the set
being projected onto share multiple neighbors in the other node set,
they can have multiple possible edge weights. This function takes
a list of possible weights and returns a single final weight. The Python
builtins max and min can be used for this. If not provided,
max is used.
weight_attribute (str) – Which edge attribute in the original network to use for
weighting. Default is ‘weight’.
reciprocal (bool, default=False) – If converting from a directed graph to an undirected one,
whether to only keep edges that appear in both directions in the
original directed network.
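How a callable weight and weight_combine might interact can be sketched for the undirected case. The helper project below is hypothetical and much simpler than the documented function: it ignores the directed and reciprocal options, and defaulting a missing edge attribute to 1.0 is an assumption.

```python
from itertools import combinations

import networkx as nx


def project(network, node_set, weight_fn, combine=max, weight_attribute="weight"):
    """Project an undirected bipartite graph onto node_set."""
    node_set = list(node_set)
    projected = nx.Graph()
    projected.add_nodes_from(node_set)
    for u, v in combinations(node_set, 2):
        shared = set(network[u]) & set(network[v])
        if not shared:
            continue
        # One candidate weight per shared neighbor, from the two incident edges
        candidates = [
            weight_fn(
                network[u][n].get(weight_attribute, 1.0),  # assumption: default 1.0
                network[v][n].get(weight_attribute, 1.0),
            )
            for n in shared
        ]
        # Collapse the candidates into a single final weight
        projected.add_edge(u, v, weight=combine(candidates))
    return projected


# A and B share two neighbors, x and y
bipartite_graph = nx.Graph()
bipartite_graph.add_weighted_edges_from(
    [("A", "x", 1.0), ("B", "x", 3.0), ("A", "y", 2.0), ("B", "y", 2.0)]
)

product_max = project(bipartite_graph, ["A", "B"], lambda a, b: a * b, combine=max)
product_min = project(bipartite_graph, ["A", "B"], lambda a, b: a * b, combine=min)
```

The shared neighbors give candidate weights 1 × 3 = 3 (via x) and 2 × 2 = 4 (via y), so combining with max keeps 4 and combining with min keeps 3.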