Duplication growth model for gene expression networks
There has been considerable recent interest in the net-
work structure of a diverse range of systems, including
the Internet, communities of actors, scholarly citations,
metabolic networks and ecological systems, among oth-
ers (Jeong et al., 2000; Baraba´si and Albert, 1999; Al-
bert and Baraba´si, 2001; Strogatz, 2001; Amaral et al.,
2000). Three main categories of networks have been used
to model these various systems. They are: random net-
works (Cohen, 1988; Kauffman, 1969), small world net-
works (Strogatz, 2001; Watts, 1999) and growing random
networks (GRNs; Jeong et al., 2000; Baraba´si and Albert,
1999; Krapivsky et al., 2000; Dorogovtsev and Mendes,
2001). Random graphs have been extensively studied and
are constructed by randomly connecting a set of nodes.
Small world graphs are generated from a regular starting
lattice. Edges in this lattice are then randomly ‘rewired’
to remote nodes. This provides strong local structure as
well as global connectivity. Graphs can also be constructed
from non-equilibrium growth models that start with a seed
graph and add nodes and connections according to some
prescribed set of preferences. Often a ‘rich get richer’ set
of preferences are used, where the newly added nodes are
preferentially connected to nodes of high connectivity.
Often the choice of model is dictated by the specific
graphical property under investigation. For instance, small
world models were originally motivated by the observa-
tion of networks that have high clustering coefficients and
short mean path lengths. The cluster coefficient character-
izes the extent to which vertices adjacent to any vertex are
adjacent to each other. In social networks it is the degree
that a persons acquaintances are acquainted to each other.
The cluster coefficient is calculated by averaging over all
vertices, the fraction of vertices adjacent to a given ver-
tex that are adjacent to each other. The cluster coefficient
varies from 0 to 1 with 1 indicating that all the neighbor-
ing nodes are connected to one another. The characteristic
path length is found by determining the number of edges
on the shortest path connecting any two vertices and aver-
aging this number over all pairs of vertices. GRNs, on the
other hand, were developed to explain the scale-free distri-
bution of node connectivities, a property that the original
small world models do not have. Scale-free distributions
have no characteristic length scale and follow power law
behavior. Random graphs show an interesting phase tran-
sition in the ‘connectedness’ of the graph, but do not show
small world or scale-free behavior. When matching a given
model to a natural network phenomenon, it is important to
examine a range of graphical parameters for full discrimi-
nation between potential models.
connectivity scaling exponent. As will be seen, no existing
model can account for the combined properties of the gene
expression networks. To explain these results, we propose
a new network growth model based on gene duplication
events. Computer simulations indicate that this model can
adequately describe the gene expression graph parameters.
METHODS AND IMPLEMENTATION
Networks from dynamic models of gene expression
There have been a number of recent attempts to analyze
time series data for whole genome expression profiles
(Dewey and Galas, 2001; Holter et al., 2001; Heyer et
al., 1999; DeRisi et al., 1997; Spellman et al., 1998)).
Interestingly, this does not require the complexity of
detailed non-linear models of gene expression, but needs
only simple, linear models (Dewey and Galas, 2001;
Holter et al., 2001)). These previous studies have focused
on the cell-cycle and diauxic shift data in the yeast Sac-
charomyces cerevisiae (DeRisi et al., 1997; Spellman et
al., 1998). In both cases, the system is prepared in a given
physiological state at the initial time point and changes
in gene expression levels are measured as it moves to
a new state. These experiments have some similarity
to traditional perturbation–relaxation experiments in
physics and chemistry. Given this analogy, it is perhaps
not surprising that the time dependence of the expression
profiles can be well represented by simple linear response
models.
Our previous analysis of expression time series is based
on a simple dynamical model (Dewey and Galas, 2001)
that includes both linear and non-linear kinetic terms. This
model is briefly summarized here. The time dependence of
the system is represented by the rate law given below:
A(t) = ꢀ1 A(t − 1) + A(t − 1)AT(t − 1)ꢀ2 (1)
where A(t) is a matrix of the gene expression profiles at
different points in time. A(t) = (aˆ(2), . . . , aˆ(t)) where
aˆ(t) is a vector representing the expression levels of all
genes in the genome at time, t = i. The ratio values from
the public domain data sets were used (DeRisi et al., 1997;
Spellman et al., 1998), rather than the log ratios, as these
values are proportional to the mRNA concentration and
are consistent with a first-order chemical kinetic model.
The matrix A(t − 1) is a time-lagged matrix given by:
A(t − 1) = (aˆ(1), . . . , aˆ(t − 1)). The first term in
Equation 1 represents a simple linear response and the
elements of the ꢀ1 matrix λi j , give the influence of the
expression level of the jth gene on the production of the ith
gene. The second term, A(t − 1)AT(t − 1), in Equation 1,
the gene covariance at a previous time, introduces non-
linearity into the model.
In this work, we show how network models of gene
expression can be obtained from a dynamic model of
whole genome expression. The resulting networks are
analyzed by determining three global graph properties—
the average path length, the clustering coefficient and the
The two matrices ꢀ1 and ꢀ2 generated by this data
analysis are components of the weighted connectivity
1487