**continuous** model where the population sizes are assumed to be real numbers. This makes a certain amount of sense if we think of compartments as fractions of an overall population. From another perspective, however, it's clear that only whole numbers of people become sick, leading to a **discrete** model that places an integer number of individuals into each compartment. How do we reconcile these two views?

The continuous model is best thought of as modelling the large-scale, **macroscopic** behaviour of the epidemic, in which we don't really care about the exact numbers of individuals concerned. Moreover, for a large population, computing the relative sizes of compartments to a few decimal places of accuracy will still yield something close to a whole number of individuals per compartment when the compartment fractions are scaled up to the size of the overall population.

But we can also ask what happens at the **microscopic** scale, for individuals. In that case we want to know how the disease might evolve in a *single person*. Another way to think of this is that a compartmented model allows each individual person to traverse the compartments according to the probabilities associated with each transition.

Clearly the macroscopic and microscopic descriptions are related: we assume that, if we let a disease run through a population, then the ways in which individuals' infections evolve will integrate to reflect the macroscopic description in terms of fractions of the entire population.

As well as continuity, however, there's another assumption implicit in the continuous description. Let's re-visit the equations describing SIR:

$$ \frac{ds}{dt} = -\beta s(t) i(t) \hspace{1in} \frac{di}{dt} = \beta s(t) i(t) - \alpha i(t) \hspace{1in} \frac{dr}{dt} = \alpha i(t) $$

Here $i(t)$ denotes the fraction of the population who are infected at time $t$. The rate of change of this population, $\frac{di}{dt}$, has two terms: a growth term $\beta s(t) i(t)$, and a reducing term $\alpha i(t)$. The growth term says that the infected population grows at a rate proportional to the total number of (susceptible, infected) pairs in the population, which is simply the product of the two population sizes: in each unit of time, all these people meet each other and a fraction $\beta$ of the susceptibles become infected.
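To get a feel for what the continuous model predicts, here's a minimal sketch that integrates these equations numerically using a simple forward-Euler scheme. The parameter values ($\beta = 0.3$, $\alpha = 0.1$, 1% of the population initially infected) and the step size are illustrative choices, not taken from the text above.

```
import numpy

def sir_euler( beta, alpha, i0, T, dt = 0.01 ):
    """Integrate the continuous SIR equations with a forward-Euler scheme,
    returning arrays of s(t), i(t) and r(t) sampled every dt."""
    s, i, r = 1.0 - i0, i0, 0.0
    ss, ii, rr = [ s ], [ i ], [ r ]
    for _ in range(int(T / dt)):
        ds = -beta * s * i
        di = beta * s * i - alpha * i
        dr = alpha * i
        s, i, r = s + ds * dt, i + di * dt, r + dr * dt
        ss.append(s)
        ii.append(i)
        rr.append(r)
    return (numpy.array(ss), numpy.array(ii), numpy.array(rr))

# illustrative parameters: beta = 0.3, alpha = 0.1, 1% initially infected
(s, i, r) = sir_euler(0.3, 0.1, 0.01, 200)
print('Final fraction removed r = {r:.2f}'.format(r = r[-1]))
```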

The assumption, clearly, is that all these pairs of people *do actually meet*, and this is a strong assumption. It's called the assumption of **well-mixing**, or alternatively of a **homogeneous** population. We discussed this earlier when we talked about attack rates and reproduction numbers. In "small" populations, well-mixing isn't a totally unreasonable assumption – although it *is* still an approximation of reality (even the people in my small village don't all meet each other every day). If we were to consider a population the size of Scotland, it's clearly implausible.

That doesn't mean we should throw the model away. The statistician George Box is quoted as saying, "*There is no need to ask the question 'Is the model true?'. If 'truth' is to be the 'whole truth' the answer must be 'No'. The only question of interest is 'Is the model illuminating and useful?'*" But the simplification of SIR to three differential equations does smear-out some structure that might be important – and, it turns out, *is* important in the sense that there are disease phenomena that occur in nature that don't occur in this system. Putting SIR onto a network is one way of addressing this.

So in moving to diseases on networks we're trying to address two issues:

- that populations exhibit structure and so are not well-mixed; and
- that diseases occur in individuals, not simply in populations.

To address the first issue, we use a network to represent individuals and their interactions, with the connection structure of the network providing the opportunity for different kinds of inhomogeneity. For the second issue, we develop a discrete description of SIR, consistent with the continuous version, that we can apply to the individual nodes of the network. We can then study how different network structures affect the properties of an epidemic.

The first step is conceptually the easier, but has some subtleties. The natural way to treat a population as a network is to have one node per individual in the population. Edges between nodes represent social interactions that are opportunities for infection. If a susceptible person is connected by an edge to an infected person, then there is an opportunity for the latter to infect the former. Conversely, if there is no such edge, then the susceptible person cannot be infected by that infected individual, since there is no social contact between them.

How might we construct this network? The simplest approach is undoubtedly to create a random network of some kind: perhaps an ER network, in which case we will obtain a "social network" for $N$ individuals who interact in a random way with a well-defined mean number of others. Simulating an epidemic will then involve running our to-be-designed discrete disease process over this network and examining the results.
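As a concrete sketch of this simplest approach (the parameter values here are purely illustrative), we might build such a random "social network" with `networkx` and check its mean degree:

```
import networkx

# an illustrative "social network": 1000 individuals, each pair of whom
# interacts independently with probability 0.005, for a mean degree of about 5
N = 1000
phi = 0.005
g = networkx.erdos_renyi_graph(N, phi)

# compute the mean number of contacts per individual
kmean = sum(dict(g.degree()).values()) / float(N)
print('Mean number of contacts per individual: {k:.2f}'.format(k = kmean))
```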

A moment's thought will show several problems with this approach. Firstly, not all contacts are created equal, as we saw when we discussed secondary attack rates: people in close contact (such as children in a nursery, or people in a care home) are more likely to infect one another than people in weaker contact (such as workers in a factory). We could address this issue, perhaps, by **weighting** the edges between people to capture the fact that "some edges are more infectious than others". Alternatively, we might argue that these factors will even out over a suitably heterogeneous population, and so if we focus on the probability of infection for an "average social contact" we can still extract meaningful information from any simulation.
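One hedged sketch of the weighting idea is to attach a weight to each edge as a `networkx` edge attribute and interpret it as a per-contact transmission probability; the names and numbers below are purely illustrative.

```
import networkx

g = networkx.Graph()

# illustrative contact strengths: a close household contact and a casual one
g.add_edge('alice', 'bob', weight = 0.8)     # close contact: high transmission probability
g.add_edge('alice', 'carol', weight = 0.1)   # casual contact: low transmission probability

for (u, v, data) in g.edges(data = True):
    print('{u} -- {v}: transmission probability {w}'.format(u = u, v = v, w = data['weight']))
```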

Secondly, how are individuals to be connected in the infection network? Are their connections random? Do they exhibit a more clustered structure? Are there dense packets of highly connected individuals, separated by sparse connections? These are questions of network degree, connectivity, and so forth – of network topology in general – and intuitively it seems clear that the choice may make a difference. We might, for example, expect a disease in a well-connected, high-mean-degree network to spread differently to the same disease on a network with lower connectivity.

Thirdly, we have described a **static** network whose connections don't change over time. Relating this back to the context we're considering, that doesn't seem appropriate. People might be expected to avoid individuals who are sick, or the sick individuals might be quarantined to preclude social contact. Either of these behaviours would be expected to remove social contacts – edges – from the part of the network around an infected individual.

(When I was growing up in England in the 1970s, parents actually demonstrated exactly the opposite behaviour. If a child got measles, for example, mothers all brought their children round for a play date with the explicit intention of getting them infected too – the logic being that exposing a child to the disease early was good for their immune systems, got the one-off infection "out of the way", and generally improved herd immunity. None of those arguments are at all wrong, but this approach to parenting seems to have gone out of fashion.)

In either case, we might think that it is more appropriate to adopt an approach that changes the structure of the network in response to infection, perhaps reducing the number of edges when a node is infected. In this case we have a **dynamic** or **adaptive** network structure, where the network responds to the progress of the process running over it. Again, we might decide that these effects will even-out and can be ignored to give an "average" result.

The upshot of this discussion is that we can take a simple representation – a static, random network with unweighted links – and then add more features if we think they might be relevant. As we do so we make the model more realistic – but also more complicated, and we add to the number of possible degrees of freedom.

Adding more factors in pursuit of realism may sound attractive, but we have to bear in mind that it also gives us a freedom we may not be able to use effectively. Consider the case where we reduce the number of edges to an infected individual. How many edges do we remove, and how do we select them? Will these choices make a critical difference, and how do they interact with the existing parameters of the model? In adding a new freedom we also add a considerable burden of analysis and simulation to check what effects our new freedom has. Might it be better to stick with the simplest case?

This argument might sound bogus to you: a cop-out just to reduce the amount of work we have to do. And if your primary interest is in the dynamics of a *particular* disease, about which you want to make accurate predictions – as would be the case for planning a clinical response to an outbreak – then of course you may strive to build *the most realistic model possible* and accept the associated extra work. On the other hand, if your primary interest is epidemic processes in general, you might be happy to stick to simpler models to see whether they *always* exhibit certain features which can then be generalised (with care) to *all* diseases. We'll see an example of this later in the case of epidemic thresholds, where certain combinations of infectiousness and recovery *necessarily* lead to epidemics pretty much regardless of everything else.

Now let's return to the second issue we identified above: moving from a continuous to a discrete description of the disease process.

Compartmented models of disease represent diseases as a collection of compartments. We notionally consider each individual in the population to be "in" a particular compartment at a given time. As their disease progresses, they move "from" one compartment "to" another, typically according to some stochastic process where their re-location happens with some probability. This probability may be affected by other factors, for example the presence of individuals in other compartments as neighbours. When looking at the overall disease behaviour (the macroscopic view) we are typically interested in how the relative sizes of the compartments change. When looking at the disease's progress in individuals (the microscopic view) we additionally need to know about the compartments of neighbouring individuals. It is precisely this microscopic behaviour that is missing from the continuous-process description of compartmented models.

How then do we describe interactions at the scale of individual nodes?

Let's look again (not for the last time) at the differential equations for SIR:

$$ \frac{ds}{dt} = -\beta s(t) i(t) \hspace{1in} \frac{di}{dt} = \beta s(t) i(t) - \alpha i(t) \hspace{1in} \frac{dr}{dt} = \alpha i(t) $$

There are three compartments, and the three equations (one per compartment) tell us how their populations change. Looking at the last equation, we see that $r(t)$ increases at a rate proportional to $i(t)$, the size of the infected compartment. Similarly, looking at the first equation, $s(t)$ decreases at a rate proportional to the number of susceptible-infected pairs. In the second equation, these two effects both appear inverted – understandably, since individuals pass through infection to recovery, and rates have to balance out if we are to keep the population constant.

So much for the compartments: what does this mean for an individual?

We know that we are representing the interactions between individuals as network edges. Suppose that at some time we have a given susceptible individual. That individual cannot become infected spontaneously, but only through interaction with an individual who is infected at the same time and with whom she has some social contact, represented by an edge. So to determine whether the susceptible individual is infected, we need to know whether she has any edges that lead to individuals who are infected. We refer to such edges as **SI edges**: they connect a susceptible node to an infected node.

Suppose we have found an SI edge linking our susceptible node to an infected neighbour. The infection "passes along" this edge with a probability $\beta$, turning our susceptible node into an infected node, decreasing the population of the susceptible compartment by one and increasing the population in the infected compartment by one.

But there is also another effect. The edge along which the infection travelled is no longer an SI edge, since it now connects an infected node to *another* infected node. Furthermore any other SI edges that connected our formerly-susceptible node to infected nodes are also no longer SI edges. And finally, the fact that our formerly-susceptible node is now an infected node means that there may be new SI edges created, wherever there are edges between our node and a neighbouring susceptible node.

This is quite a bit more complicated than the equations suggest at first glance. It is perhaps simpler to think of it slightly differently. It is the population of SI edges, *not* the population of susceptible or infected individuals alone, which determines the rate of infection: that much is clear from the infection term. The infection dynamics happens, not at individual nodes, but at SI edges. We can think of the SI edges as a **locus** for the infection dynamics: a place at which infection possibly occurs. The edges in that locus are potentially changed by every infection **event**: every time an SI edge actually results in an infection.

Removal is simpler. An infected node becomes removed spontaneously, with probability $\alpha$, without any reference to its neighbours – but *once it has happened* it has an impact on the SI edges – and therefore, indirectly, on future infection events. The locus for removal events is therefore the population of nodes in the infected compartment, any of which may spontaneously be removed.

Summing-up the above, we can now formulate a discrete description of SIR.

The model consists of three compartments: susceptible (S), infected (I), and removed (R). Each node resides in exactly one compartment at any time. There are two loci for the dynamics: SI edges, and infected nodes. There are two events: infection happens at the SI locus with probability $\beta$, while removal happens at the I locus with probability $\alpha$. The infection event moves the S node into the I compartment; the removal event moves the I node into the R compartment. Removal therefore affects the contents of the I locus, and both events may affect the contents of the SI locus. If we compare this description to the three equations above it is hopefully easy to see the derivation.

What we've done is quite significant, though. We've moved from a description consisting of three continuous rates of change (the three differential equations) to a description consisting of two discrete events, each happening at a different locus. The events can be applied to individual nodes or edges in our network model, in which we would need to track exactly which nodes are in which compartments, and which edges are in the SI locus we're interested in. It's worth noting that we really don't care about removed nodes: they don't appear in either locus, and therefore can't affect the dynamics, other than by the fact that nodes that are removed are by definition *not* susceptible or infected.
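To make this concrete, here is a minimal sketch – not the simulation framework we develop later – of one synchronous round of the discrete SIR events on a `networkx` graph. It tracks compartments in a dictionary, fires infection events at the SI edges and removal events at the infected nodes, and then applies the resulting compartment changes; all the parameter values in the usage example are illustrative.

```
import networkx
import numpy

def sir_step( g, compartment, beta, alpha ):
    """Apply one synchronous round of discrete SIR events to the graph g.
    compartment maps each node to 'S', 'I' or 'R' and is updated in place."""
    infections = []
    removals = []

    # infection events occur at SI edges, each with probability beta
    for (u, v) in g.edges():
        for (s, i) in [ (u, v), (v, u) ]:
            if compartment[s] == 'S' and compartment[i] == 'I':
                if numpy.random.random() < beta:
                    infections.append(s)

    # removal events occur at infected nodes, each with probability alpha
    for n in g.nodes():
        if compartment[n] == 'I' and numpy.random.random() < alpha:
            removals.append(n)

    # apply the events, moving nodes between compartments
    for n in infections:
        compartment[n] = 'I'
    for n in removals:
        compartment[n] = 'R'

# illustrative usage on a small random contact network
g = networkx.erdos_renyi_graph(200, 0.05)
compartment = dict([ (n, 'S') for n in g.nodes() ])
compartment[0] = 'I'                                    # seed a single infection
for _ in range(50):
    sir_step(g, compartment, beta = 0.1, alpha = 0.05)
print('Nodes removed at the end: {r}'.format(r = list(compartment.values()).count('R')))
```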

The process description is an essential step along the way to simulation, but we're not quite there yet. We need to be able to express the above model in a computational form suitable to be executed. We need to be able to keep track of the populations in the different loci of the dynamics. And we need to choose where, and at what times, the different events occur.

When we created ER networks earlier, we started with an empty network of $N$ nodes and then added edges between pairs of nodes with a given probability $\phi$. We know that this will eventually lead to a network with mean degree $N\phi$ (approximately, for large $N$). But let's look at the process from a slightly different perspective: what happens *as we add the edges*? Specifically, how do the nodes become connected as we add edges?

Intuitively we can argue as follows. We start with an empty network. Adding an edge necessarily builds a 2-node component. Adding another edge is (for a large network, anyway) overwhelmingly likely to pick two other nodes not in the first component, forming a second. We can continue like this for some time, but gradually it will become more likely that one of the nodes we choose to connect is not isolated but rather part of a larger cluster: indeed, *both* nodes may be parts of *different* clusters, which thereby become joined into a single one. As we continue to add edges, it becomes increasingly likely that the edges will be placed between increasingly large components, thereby connecting them. And as a component becomes larger, there are more ways to connect to it (since there are more nodes to choose as endpoints), so we might expect that large components grow at the expense of small components. Eventually the network may become one large component, but even before this we might expect that there will be one or more components that are large relative to the others and to the size of the network as a whole.

This is indeed what happens. As we add edges to the initially-empty network according to the ER process, we create a large number of small components that over time connect to each other. Because large components are easier to connect to they grow faster, which leads to the formation of a component that contains a large fraction of the nodes: the **giant component**.

Does the giant component necessarily form? A moment's thought will suggest not: if we only add a small number of edges, then clearly there won't be enough for a giant component to form.

Let's denote the size of the largest component in a network by $N_G$. How does $N_G$ vary as we add edges?

Starting from an empty network, we have $N_G = 1$ since every node is its own single-node component. The ratio of the size of the "giant" component to the size of the network, $\frac{N_G}{N} \rightarrow 0$ as $N \rightarrow \infty$: the giant component is an insignificant fraction of the nodes. As we add edges, we expect $N_G$ to increase. If we were to set $\phi = 1$ and add *all* possible edges, then at the end of the process we would have $\frac{N_G}{N} = 1$, the giant component containing all the nodes. We can think of $\frac{N_G}{N}$ as the probability that a node chosen at random will be in the giant component. Let's refer to this probability as $S$.

How does a node $i$ end up outside the giant component? It means that, for every other node $j$ in the network,

- either $i$ is not connected to $j$; or
- $i$ is connected to $j$ but $j$ is itself not in the giant component.

For a particular node $j$, the probability of the first case is $(1 - \phi)$ (since the probability of there being an edge added is $\phi$); the probability of the second case is $\phi (1 - S)$: there being an edge between $i$ and $j$ (which is $\phi$) *and* $j$ not being in the giant component (which is $(1 - S)$). Multiplying this probability over all $N - 1$ other nodes $j$, the probability we are looking for is given by the recurrence equation $1 - S = ((1 - \phi) + \phi (1 - S))^{N - 1}$. If we re-arrange this slightly,

$$ 1 - S = \left( 1 - \frac{\langle k \rangle}{N} S \right)^{N - 1} \approx \left( 1 - \frac{\langle k \rangle}{N} S \right)^{N} $$

where we used $\phi = \frac{\langle k \rangle}{N}$. Taking logs on both sides,

\begin{align*} \ln (1 - S) &= N \, \ln \left(1 - \frac{\langle k \rangle}{N} S\right) \\ &\approx -N \, \frac{\langle k \rangle}{N} S \\ &= - \langle k \rangle S \end{align*}

where we have used the approximation $\ln (1 - x) \approx -x$ for small $x$. Then we can take exponentials on each side, leading to:

\begin{align*} 1 - S &= e^{- \langle k \rangle S} \\ S &= 1 - e^{- \langle k \rangle S} \end{align*}

This is still an awkward recurrence equation: $S$ appears on both sides. Situations like this often have no closed-form solution, but there's a trick to make progress, which is to make use of a graphical method.

In [1]:

```
import networkx
import math
import numpy
import matplotlib
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
import seaborn
```

In [2]:

```
fig = plt.figure(figsize = (5, 5))

# create a set of points for S, evenly spaced over the interval [0.0, 1.0]
ss = numpy.linspace(0.0, 1.0)

# different kmeans and their associated line types
kmeans = [ 0.5, 1, 1.5, 2 ]
lines = [ 'r-', 'g-', 'b-', 'y-' ]

# build a function, parameterised by kmean, to run over S
def make_S( kmean ):
    return (lambda S: 1.0 - math.exp(-kmean * S))

# plot S against S
plt.plot(ss, ss, 'k--')

# plot the exponential curves for the different selected kmeans
for i in range(len(kmeans)):
    kmean = kmeans[i]
    line = lines[i]

    # map the appropriate function across S
    ys = map(make_S(kmean), ss)

    # plot the curve
    plt.plot(ss, ys, line, label = '$\\langle k \\rangle = {k}$'.format(k = kmean))

plt.xlabel('$S$')
plt.title('Solutions for $S = 1 - e^{-\\langle k \\rangle S}$ for different values of $\\langle k \\rangle$')
plt.legend(loc = 'upper left')
_ = plt.show()
```

So by inspection for $\langle k \rangle = 1.5$ there is a solution at approximately $S = 0.58$, while for $\langle k \rangle = 2$ there is a solution at approximately $S = 0.8$ – 80% of the nodes in the network are in the giant component.
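We can check these values read off the graph with a simple fixed-point iteration on $S = 1 - e^{-\langle k \rangle S}$ (a quick sketch; any numerical root-finder would do equally well):

```
import math

def giant_component_S( kmean, tol = 1e-6 ):
    """Solve S = 1 - exp(-kmean * S) by fixed-point iteration,
    starting away from the trivial solution S = 0."""
    S = 1.0
    while True:
        Snew = 1.0 - math.exp(-kmean * S)
        if abs(Snew - S) < tol:
            return Snew
        S = Snew

for kmean in [ 1.5, 2 ]:
    print('kmean = {k}: S = {S:.3f}'.format(k = kmean, S = giant_component_S(kmean)))
```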

Looking at the lines for the different values of $\langle k \rangle$, notice that as $\langle k \rangle$ increases the corresponding curve starts out steeper. Shallow curves never intersect $y = S$ (except at $S = 0$), meaning no giant component emerges; as the curves get steeper, a solution emerges, starting at low values of $S$ and gradually moving towards $S = 1$. The separator between these two regimes occurs when the initial gradient of the curve matches that of $y = S$ – when the curve and the line are tangent to each other at $S = 0$. This separator is referred to as a **critical transition** or a **critical threshold**, because it's the critical value at which behaviour abruptly changes. It happens when:

$$ \frac{d}{dS} \left[ 1 - e^{-\langle k \rangle S} \right] = 1 $$

and so:

$$ \langle k \rangle e^{-\langle k \rangle S} = 1 $$

At $S = 0$ we discover that the critical threshold is $\langle k \rangle_c = 1$.

We can of course also relate $\langle k \rangle_c$ back to $\phi$, the probability of adding an edge, and discover the critical threshold probability $\phi_c$ below which the giant component doesn't form, but above which it does (a point we explore a little more below). For $\langle k \rangle_c = 1$ we have $\phi_c = \frac{1}{N}$.

Let these two results sink in for a minute. Firstly, a mean degree of 1 – every node attached on average to one neighbour – is enough to start forming a giant component and therefore, by implication, to take the network towards being connected. Secondly, for a large ER network even a vanishingly small edge probability will result in the formation of a giant component – and that probability gets smaller as the network gets bigger! This all suggests that giant components will be common, so a lot of the networks we encounter in applications will have one.

Alternatively we can observe that, while it's hard to find $S$ in terms of $\langle k \rangle$, it is easy to find $\langle k \rangle$ in terms of $S$:

\begin{align*} S &= 1 - e^{-\langle k \rangle S} \\ 1 - S &= e^{-\langle k \rangle S} \\ \ln (1 - S) &= -\langle k \rangle S \\ \langle k \rangle &= - \frac{\ln (1 - S)}{S} \end{align*}

Since we're actually interested in $S$ we can plot the curve rotated by ninety degrees for clarity, which yields:

In [3]:

```
fig = plt.figure(figsize = (5, 5))
ss = numpy.linspace(0.0, 1.0, endpoint = False)[1:] # omit S = 0 and S = 1 to avoid divide-by-zero and log(0) errors
plt.xlim([0, 4])
plt.xlabel("$\\langle k \\rangle$")
plt.ylabel("$S$")
plt.plot(map((lambda S: - math.log(1.0 - S) / S), ss), ss, 'r-')
plt.title('Expected size of giant component')
_ = plt.show()
```

This makes the critical nature of $\langle k \rangle_c = 1$ even more clear. As $\langle k \rangle$ grows beyond $\langle k \rangle_c$, the expected size of the giant component rapidly approaches the size of the network itself.

The existence and value of the critical threshold was first proven by Erdős and Rényi [ER59] in a paper that really marks the very start of network science. It shows that, even for small mean degrees, an ER network will have a giant component, and as the mean degree gets larger, that component will span the entire network. Looking at the graph above, you can see that the curve asymptotically approaches $S = 1$ as $\langle k \rangle \rightarrow \infty$. It is never *certain* that the process will connect the network – it's stochastic, after all – but it rapidly becomes overwhelmingly likely.

So much for the mathematics: let's look at the emergence of the giant component computationally.

The `networkx` function `number_connected_components()` computes the number of components in a network. To look at the giant component forming, we therefore need to count the number of components over the region around the critical threshold. We expect to see the number of components rapidly drop towards 1, and the fraction of nodes in the largest component rapidly increase towards 1.

We could therefore create an empty network and progressively add edges to it, counting the number of components as we go. We already have the code for this in our earlier from-scratch ER network generator: however, looking at the code, while the *result* is a random network, the *process* by which edges are added is actually very regular, and we should probably avoid such unnecessary regularity in case it makes a difference. One could easily imagine that adding edges in a regular fashion might generate components faster (or slower?) than truly random addition.

What we could do instead is to build a random network and then re-construct it by emptying it and adding the same edges back in a random order. This destroys any artefacts coming from the way in which we added the edges in the first place.

We first define an iterator that will randomise a list:

In [4]:

```
from copy import copy

class permuted:
    """An iterator for the elements of an array in a random order."""

    def __init__( self, es ):
        """Return an iterator for the elements of an array in a random order.

        :param es: the original elements"""
        self.elements = copy(es)    # copy the data to be permuted

    def __iter__( self ):
        """Return the iterator.

        :returns: a random iterator over the elements"""
        return self

    def next( self ):
        """Return a random element.

        :returns: a random element of the original collection"""
        n = len(self.elements)
        if n == 0:
            raise StopIteration
        else:
            i = int(numpy.random.random() * n)
            v = self.elements[i]
            del self.elements[i]
            return v
```
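A quick sanity check of the iterator (the output will of course vary from run to run); one could equally well use `numpy.random.shuffle()` on a copy of the list, but this version makes the random draws explicit:

```
print(list(permuted([ 1, 2, 3, 4, 5 ])))   # e.g. [3, 1, 5, 2, 4]
```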

In [9]:

```
def growing_component_numbers( n, es ):
    """Build a graph with n nodes and add the edges from es in a random
    order, returning a list of the number of components in the graph
    after each edge is added.

    :param n: the number of nodes
    :param es: the edges
    :returns: the number of components as each edge is added"""
    # create an empty graph
    g = networkx.empty_graph(n)

    # add edges to g taken at random from the edge set,
    # and compute the number of components after each edge
    cs = []
    for e in permuted(es):
        g.add_edge(*e)
        nc = networkx.number_connected_components(g)
        cs.append(nc)
    return cs
```

In [10]:

```
# create an ER network and grab its edges
er = networkx.erdos_renyi_graph(2000, 0.01)
es = er.edges()

# replay these edges in a random order
component_number = growing_component_numbers(2000, es)

# plot number of components against number of edges added
fig = plt.figure(figsize = (5, 5))
plt.title("Consolidation of components as edges are added")
plt.xlabel("$|E|$")
plt.ylabel("Components")
plt.plot(range(len(component_number)), component_number, 'b-')

# edge at which the network first becomes a single component
i = component_number.index(1)

# highlight the formation of the giant component
ax = fig.gca()
ax.annotate("$|E| = {e} ({p}\\%)$".format(e = i, p = int(((i + 0.0) / len(es)) * 100)),
            xy = (i, 1),
            xytext = (len(component_number) / 2, component_number[0] / 2),
            arrowprops = dict(facecolor = 'black', width = 1, shrink = 0.05))
_ = plt.show()
```

The giant component forms well before we've added all the edges.

(Remember that this is a stochastic process. It's *possible* that a giant component would *never* form for a network, just by chance. However, for an ER network with 2000 nodes $\phi_c = \frac{1}{N} = 0.0005$, so $\phi = 0.01$ is well above the critical threshold.)

But *how* does the giant component form? Does it steadily accrete, or does it form suddenly as previously disconnected components connect? We can explore this by plotting the size of the largest component as we add edges, using the function `connected_components()` that returns a list of components, largest first:

In [11]:

```
def growing_component_sizes( n, es ):
    """Build a graph with n nodes and add the edges from es in a random
    order, returning a list of the size of the largest component
    after each edge is added.

    :param n: the number of nodes
    :param es: the edges
    :returns: the size of the largest component as each edge is added"""
    g = networkx.empty_graph(n)
    cs = []
    for e in permuted(es):
        g.add_edge(*e)

        # pick the largest component (the one with the most node members)
        gc = len(max(networkx.connected_components(g), key = len))
        cs.append(gc)
    return cs
```

Let's plot the size of the largest component and the *number* of components on the same axes:

In [12]:

```
# compute the list of largest component sizes as we add edges, re-using
# the ER edges we computed earlier
component_size = growing_component_sizes(2000, es)

fig = plt.figure(figsize = (5, 5))
plt.title("Emergence of the giant component as edges are added")

# plot the number of components
ax1 = fig.gca()
ax1.set_xlabel("Edges")
ax1.set_ylabel("Components", color = 'b')
ax1.plot(range(i), component_number[:i], 'b-', label = 'Components')
for t in ax1.get_yticklabels():
    t.set_color('b')

# plot component sizes against edges
ax2 = ax1.twinx()
ax2.set_ylabel("Component size", color = 'r')
ax2.plot(range(i), component_size[:i], 'r-', label = "Component size")
for t in ax2.get_yticklabels():
    t.set_color('r')

_ = plt.show()
```

Now isn't *that* interesting... Let's try to interpret what's happening. Quite early-on in the process of adding edges, there's a sudden jump in the size of the largest component in the network. Well before we get to the giant component, we start getting a component of hundreds, and then thousands, of nodes. The process by which we're adding edges is random and smooth, but nonetheless results in a sudden change in the connectivity of the network. The network consists of lots of small components that suddenly – over the course of adding a relatively small number of edges – join up and create an enormously larger component consisting of most of the nodes, which then itself gradually grows until it contains *all* the nodes. Below this threshold the network is composed of small, isolated collections of nodes; above it, it rapidly becomes one big component.

This is the first example we've seen of a critical transition, also known as a **phase change**: during a steady, incremental, process, the network changes from one state into another, very different state – and does so almost instantaneously.

We should examine the area around the critical point in more detail. First we need to locate it. Since the characteristic of the critical point is that the slope of the graph suddenly increases, we can look for it by looking at the slope of the data series:

In [13]:

```
def critical_point( cs, slope = 1 ):
    """Find the critical point in a sequence. We define the critical point
    as the index where the derivative of the sequence becomes greater than
    the desired slope. We ignore the direction of the slope.

    :param cs: the sequence of component sizes
    :param slope: the desired slope of the graph (defaults to 1)
    :returns: the point at which the slope of the time series exceeds the desired slope"""
    for i in xrange(1, len(cs)):
        if abs(cs[i] - cs[i - 1]) > slope:
            return i
    return None
```

In [14]:

```
# find the critical point
cp = critical_point(component_size, slope = 50)

# some space either side of the critical point, with the right-hand
# side being more interesting and so getting more room
bcp = int(cp * 0.8)
ucp = int(cp * 3)

fig = plt.figure(figsize = (5, 5))
plt.title("Details of the phase transition")

# plot the number of components
ax1 = fig.gca()
ax1.set_xlabel("Edges")
ax1.set_ylabel("Components", color = 'b')
ax1.plot(range(bcp, ucp), component_number[bcp:ucp], 'b-', label = 'Components')
for t in ax1.get_yticklabels():
    t.set_color('b')

# plot component sizes against edges
ax2 = ax1.twinx()
ax2.set_ylabel("Component size", color = 'r')
ax2.plot(range(bcp, ucp), component_size[bcp:ucp], 'r-', label = "Component size")
for t in ax2.get_yticklabels():
    t.set_color('r')

# add a line to show where we decided the critical point was
ax1.plot([cp, cp],          # x's: vertical line at the critical point
         ax1.get_ylim(),    # y's: the y axis' extent
         'k:')
_ = plt.show()
```

We can see that, while the *number of components* comes down fairly smoothly, the *size of the largest component* jumps quickly as smaller components amalgamate.

Finally, let's compare the predicted size of the giant component against what we actually observe, by generating ER networks for a range of mean degrees and measuring the fraction of nodes in their largest components:

In [16]:

```
def make_er_giant_component_size_by_kmean( n ):
    """Return a model function for a network with the given number
    of nodes, computing the fractional size of the giant component
    for different mean degrees.

    :param n: the number of nodes"""
    def model( kmean ):
        phi = kmean / n
        er = networkx.erdos_renyi_graph(n, phi)
        gc = len(max(networkx.connected_components(er), key = len))
        S = (gc + 0.0) / n
        return S
    return model

fig = plt.figure(figsize = (5, 5))

# plot the observed behaviour
kmeans = numpy.linspace(0.0, 5.0, num = 20)
sz = map(make_er_giant_component_size_by_kmean(2000), kmeans)
plt.scatter(kmeans, sz, color = 'r', marker = 'D', label = 'experimental')

# plot the theoretical behaviour, omitting the endpoints of the range
# of S to avoid divide-by-zero and log(0) errors
ss = numpy.linspace(0.0, 1.0, endpoint = False)[1:]
plt.plot(map((lambda S: - math.log(1.0 - S) / S), ss), ss, 'k,', label = 'predicted')

plt.xlim([0, 5])
plt.ylim([0.0, 1.0])
plt.title('Expected vs observed sizes of giant component')
plt.xlabel('$\\langle k \\rangle$')
plt.ylabel('$S$')
plt.legend(loc = 'lower right')
_ = plt.show()
```

Notice the scatter in the experimental points: each comes from *one specific* ER network that *might* happen to have properties that cause a giant component to form, or not form, or form with a slightly different size than predicted, just because of some fluke of the way the edges are added. The mathematical expression gives us the expected behaviour, which is overwhelmingly probable in the case of large ($N \rightarrow \infty$) networks – but it can be misleading in any single case, and in smaller networks.

There are many more properties of components we could explore, but we'll stop here: Newman [New10] presents many more calculations, for example about how the distribution of component sizes changes as edges are added.

There's an important point to make about all we've said above. You'll have noticed that a lot of the arguments relied on averaging, for example in identifying the *average* (mean) degree as greater than 1, or finding the *expected* size of the giant component. You might have wondered whether these sorts of calculations would be possible if for whatever reason we weren't able to do averaging.

Averaging works well for large networks: indeed, for really large networks we *have* to rely on statistical techniques, since all the details will generally be unavailable. And it's certainly the case that a lot of phenomena of interest for complex networks (and complex processes) depend strongly on these statistical properties, with only very weak dependence on the details. This means we can often ignore the fine structure, the **micro-scale structure**, of a network, and treat it as an instance of a class defined by its **macro-scale structure**, the high-level summary statistics. Indeed, this is the basis for the techniques for managing variance by repetition that we'll see later when we scale out our simulations.

*But*. (There was obviously a *but* coming.) There are also examples in which fine structure *does* matter – and even more cases where variations or irregularities in the structure make a huge difference. We'll see examples of these later, but an easily-understood example is the way an epidemic spreads on a network with communities of more-than-averagely-connected nodes: easily within communities, but with more difficulty between them because of the lesser connectivity. This is true even for networks with the same mean degree: the modular structure changes the process' behaviour.

The ER networks are special not because they're random – lots of networks have randomness – but because they're *so perfectly* random. They have, on average (that word again...), no fine structure to worry about, and so arguments based on averaging work, both for properties like the degrees of nodes and also for repeating experiments over different networks with the same parameters.

What about for more complex situations? It turns out that the other main class of networks, the powerlaw networks, have similar (but different) regularities that can similarly be exploited. There are other cases that don't have such nice features, and – while we can sometimes fall back on more powerful mathematical techniques, such as those associated with generating functions – we'll often be placed in situations where only extensive and careful simulation will get us anywhere. And simulation often requires an understanding of how the network is put together at a macro level as well as some understanding at least of the micro level, so the mathematical and computational views remain entwined.

Networks consist of nodes connected by edges. We've already looked at the notion of a path in terms of providing a "route to follow" to get from one node to another. We can look at paths between pairs of nodes to see whether they exist – is it possible to navigate from one node to the other? – and find paths of different lengths, including a possibly unique shortest path. We also considered one way of raising this local property to the global network level in order to find the network's diameter: in the network as a whole, what is the longest shortest path between *any* pair of nodes?

There's another such global question related to paths: is it always possible to find a path between any pair of nodes in the network? Clearly there's a major difference between networks for which the answer is yes, and other networks: in the former case, while it may be *hard* to find a path between two nodes, it will always be *possible*; in the latter case, some attempts at navigation are doomed to failure.

A network for which there is always a path between any pair of nodes is called **connected**. Connectivity is the property that says that navigation is always possible.

How do we determine if a network is connected? At some level we need to check that paths exist between all pairs of nodes, but that's going to be extremely expensive for large networks. Fortunately there's a simpler way, and even more fortunately `networkx` provides it built-in.

In [1]:

```
import networkx
import numpy
import itertools
import cncp
import matplotlib as mpl
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
import matplotlib.cm as cmap
import seaborn
```

Let's build a lattice and use `networkx`'s `is_connected()` function to test the network's connectivity:

In [2]:

```
l = cncp.lattice_graph(10, 10)
print 'Lattice connected? {c}'.format(c = networkx.is_connected(l))
```

In [3]:

```
# add an isolated node, which disconnects the network
l.add_node(9999)
print 'Lattice with extra node connected? {c}'.format(c = networkx.is_connected(l))

# connect the new node back into the lattice
l.add_edge(9999, 1)
print 'Did the new edge re-connect things? {c}'.format(c = networkx.is_connected(l))
```

We can also **disconnect** a network by removing edges, for example by "snipping off the corner" of the lattice:

In [4]:

```
l.remove_edges_from([ (0, 1), (0, 10) ])
print 'Still connected? {c}'.format(c = networkx.is_connected(l))
```

This works because we happen to know the way the nodes are labelled by `lattice_graph()`, so we know which edges we need to remove. We could also have removed a band of edges across the centre of the lattice, or on a diagonal: as long as we interrupt all the paths between *any one pair* of nodes, the network will no longer be connected.

These ideas work with larger groups of nodes as well. For example, suppose we place two networks "side by side", having edges internally but none between them:

In [5]:

```
# create two lattices
l1 = cncp.lattice_graph(5, 5)
l2 = cncp.lattice_graph(5, 5)
# re-label the second lattice so that the node labels will be unique
l2p = networkx.relabel_nodes(l2, lambda n: n + 1000)
# combine the two lattices together to form a single network
l = networkx.compose(l1, l2p)
print 'Two-lattice network connected? {c}'.format(c = networkx.is_connected(l))
```

Notice what we did to make this work:

- we built the two networks independently;
- then we re-labelled one of them to make the node labels unique; and
- then composed them together.

Our lattice-creation function always labels nodes in the same way in the networks it creates, so after the first step we have two networks each with a common set of node labels. If we'd simply composed these networks together as-is, `networkx` would have assumed that two nodes with the same label were *the same node* and would have combined them – and then combined all the edges too. We'd have ended up with a single lattice! By re-labelling the second network's nodes we ensure they're recognised as distinct, and therefore when we combine the two networks we get a network with two lattices "side by side" and no edges between them.

Adding a single edge between nodes in the two lattices is of course enough to connect the network:

In [6]:

```
l.add_edge(0, 1000)
print 'Two-lattice network connected with extra edge? {c}'.format(c = networkx.is_connected(l))
```

In [7]:

```
l.remove_node(1000)
print 'Is the network still connected after removing a critical node? {c}'.format(c = networkx.is_connected(l))
```

And of course one of the "lattices" is now missing a node.

Let's return to the lattices we used to create the network above. Each of the lattices was itself a network, which we then joined together to form the overall lattices-side-by-side network. But we can also observe that – in this case, although not necessarily – the two lattices were themselves connected. It was possible to go from any node in one lattice to any other node *in the same lattice*; when we put them side by side, it was not possible to go from a node in one lattice to a node in the other; and then when we added an edge it became possible to go from any node in one lattice to any node *in either lattice*.

So after we placed the lattices side-by-side we had a network with two **sub-networks**, each of which was connected, but the network taken together was disconnected. This property of being a connected sub-network of a larger structure is called being a **component** (or sometimes a **connected component**, although that's a bit tautologous). When we connected the two components together we created a single connected network, a single component.

Each component is a "island" of connectivity. Navigation is possible "on the island", but impossible "off the island". The number of components in a network is a measure of how many "islands" there are. We can use `networkxz`

to count both their number and their size:

In [8]:

```
print "Newly-split network as {c} components".format(c = networkx.number_connected_components(l))
# compute the sizes of the components
cs = list(networkx.connected_components(l))
for i in range(len(cs)):
print 'Component {i} contains {n} nodes'.format(i = i, n = len(cs[i]))
```

If we want the largest component we can use `max()` or `sorted()` to explicitly put the components into the right order:

In [9]:

```
print 'Largest component has {n} nodes'.format(n = len(max(networkx.connected_components(l), key = len)))
```

The significance of components really becomes clear when we consider different ways of generating networks, especially using random processes. Many such processes don't actually guarantee to generate a connected network: they add edges between nodes randomly, so it's entirely possible that some nodes may be isolated or that two or more components may form. If this is important for an application, we need to be careful to make sure the network is connected *before* we start work on it. There are two basic ways to do this:

- we can check that the network is connected using `is_connected()` and, if it isn't, throw it away and start again; or
- we can take the largest component from the network as-is.

Neither method is necessarily better. For the first, it might be that we *never* get a connected network because of some combination of parameters to the generator (for example the network has three nodes and we only ever add one edge: extreme, but you get the idea). For the second, we'll necessarily end up with a network that has fewer nodes than we thought: possibly less than half, depending on exactly how many components the generator gives rise to. So which method we adopt depends on the application, and we'll have to think carefully about the constraints of each scenario we explore.
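For the second option, here's a sketch of extracting the largest component as a network in its own right, using `subgraph()`; the generator parameters are just for illustration:

```
# generate a network that may well be disconnected
g = networkx.erdos_renyi_graph(1000, 0.002)

if not networkx.is_connected(g):
    # keep only the largest connected component
    largest = max(networkx.connected_components(g), key = len)
    g = g.subgraph(largest).copy()

print('Working network has {n} nodes'.format(n = g.order()))
```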

So much for components, which are an all-or-nothing property. But connectivity has more texture than that: some connected networks are clearly "better connected" than others, and some groups of nodes are densely linked among themselves without being *quite* components. Let's start with the first of these ideas by comparing two small connected networks:

In [10]:

```
# build left network
l = networkx.Graph()
l.add_edges_from([ (1, 2), (2, 3), (3, 4), (4, 5) ])
# build right network
r = networkx.Graph()
r.add_edges_from([ (1, 2), (1, 3), (1, 4), (1, 5) ])
# create the figure
fig = plt.figure(figsize = (10, 5))
# draw left network
ax1 = fig.add_subplot(1, 2, 1) # one row of two columns, first box
ax1.grid(False) # no grid
ax1.get_xaxis().set_ticks([]) # no ticks on the axes
ax1.get_yaxis().set_ticks([])
networkx.draw_networkx(l, ax = ax1, node_size = 100)
# draw right network
ax2 = fig.add_subplot(1, 2, 2) # one row of two columns, second box
ax2.grid(False) # no grid
ax2.get_xaxis().set_ticks([]) # no ticks on the axes
ax2.get_yaxis().set_ticks([])
networkx.draw_networkx(r, ax = ax2, node_size = 100)
```

In [11]:

```
print 'Left network diameter {ld}.'.format(ld = networkx.diameter(l))
print 'Right network diameter {ld}.'.format(ld = networkx.diameter(r))
```

Clearly it's "quicker to get around" the right-hand network. So what would be the "quickest" network we could imagine? The minimum case is when the diameter of the network is 1. Remembering the definition of diameter as the longest shortest path, this would mean that the shortest path between any pair of nodes was 1 – or, to put it another way, every node was adjacent to every other. Such a network is called a **clique** (which rhymes with "speak", *not* with "click"). In the graph theory literature, the clique of $n$ nodes is referred to as $K_n$.

We can create cliques algorithmically:

In [12]:

```
# create a clique of five nodes
k5 = networkx.Graph()
for (n, m) in itertools.combinations(range(5), 2):
    k5.add_edge(n, m)

# draw the clique
fig = plt.figure(figsize = (5, 5))
ax = fig.gca()
ax.grid(False)                  # no grid
ax.get_xaxis().set_ticks([])    # no ticks on the axes
ax.get_yaxis().set_ticks([])
networkx.draw_networkx(k5, node_size = 100)
plt.title('$K_5$')
_ = plt.show()
```

If you're not familiar with Python's `itertools` package, it provides a whole suite of useful ways to combine sets of data. `itertools.combinations()` takes a collection `l` and a number `i` and produces all combinations of `i` objects taken from `l` – in this case all pairs of nodes, with each pair appearing exactly once.

`networkx` will, unsurprisingly, create cliques directly:

In [13]:

```
fig = plt.figure(figsize = (5, 5))
ax = fig.gca()
ax.grid(False) # no grid
ax.get_xaxis().set_ticks([]) # no ticks on the axes
ax.get_yaxis().set_ticks([])
networkx.draw_networkx(networkx.complete_graph(10), node_size = 100)
plt.title('$K_{10}$')
_ = plt.show()
```

The ratio between the number of edges a network actually has and the maximum number it *could* have is sometimes referred to as its **density**. It isn't a measure of connectivity *per se*, but can provide a useful metric for deciding whether a network is well-connected or sparse – concepts we'll come back to later.
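`networkx` computes this ratio directly with its `density()` function. As a quick sketch, comparing the clique above with a sparser network:

```
print('Density of K_5: {d}'.format(d = networkx.density(k5)))
print('Density of K_10: {d}'.format(d = networkx.density(networkx.complete_graph(10))))
print('Density of a 10x10 lattice: {d:.3f}'.format(d = networkx.density(cncp.lattice_graph(10, 10))))
```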

In the lattices-side-by-side example above we had two components that we connected with a single edge. Suppose we scale things up a bit, to a large network with several large components. Suppose we then add a small number of edges between the components, thereby connecting the network. We now have a connected network and a single component: is there anything else to say about the matter?

Well clearly there is. The sub-networks are no longer components, it's true, but they're still recognisably more connected *within* themselves than *between* themselves. We refer to these almost-components as **communities** or **modules**.

While the idea of being a component is very clear-cut, being a community is a lot more delicate. When is a collection of nodes "connected enough" internally and "not connected enough" externally to be termed a community? Can we always identify the communities of a network? As the number of edges increases, and the number of paths between pairs of nodes in two communities increases, at what point do they cease to be two communities and become one?

These are all interesting questions, which we'll return to later: community-finding is a very active research topic. For the time being, it's sufficient to observe that the component (or community) structure of a network might have an influence on its properties, and in particular on how processes operate over it.

So far we've looked at ER networks from a practical perspective, through simulation. This **numerical** approach is typical for computer scientists, and is very powerful. It has the enormous advantage of working for *any* network using the *same* set of techniques (and code). It has the enormous disadvantage, however, of often providing very little insight as to *why* the answer is as it is: why, for example, does an ER network have the bell-shaped degree distribution that it has, and what does this imply?

Often the numerical approach is the best we can hope for, especially in the face of irregular or otherwise "awkward" networks. But the ER network has a very regular construction process: surely we might expect to be able to do better?

An alternative to simulation in such cases is to take an **analytical** approach: to try to find closed-form mathematical expressions that answer the key questions we want to pose. This approach only works in some cases – although these cases are vitally important and interesting, and it turns out that there are other analytic techniques that work for a still broader class of networks – but it has the advantage of not requiring simulation that may be time-consuming and subject to various statistical constraints: analysis provides precise, uniform answers.

In this chapter we'll look at some properties of ER networks from this perspective and derive mathematical expressions for them. We'll focus only on those properties that are most important from a practical perspective: the degree distribution and the mean degree. (The Wikipedia page for ER networks describes – but doesn't derive – lots of other properties of largely theoretical interest.) We'll do this from first principles and at some length, to demonstrate the sorts of mathematical arguments that'll be common in what's to come.

We'll start by returning to the degree distribution, the numbers of nodes with given numbers of immediate neighbours in the network. We observed earlier that we can interpret the degree distribution in terms of probability: what is the probability of a node $v$ chosen at random having a given degree $k$? In normal probability notation this would be written $P(deg(v) = k)$, the probability that $deg(v)$, the degree of $v$, is equal to $k$. For brevity we will usually write this as $p_k$. Taken over the whole network, this will yield a degree distribution, where the probability of all possible degrees in the network sum to one: $\sum_k p_k = 1$.
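As a quick sketch of this normalisation, we can tabulate the empirical $p_k$ for a generated network and check that the probabilities sum to one (the network parameters are illustrative):

```
import networkx
from collections import Counter

g = networkx.erdos_renyi_graph(1000, 0.01)

# count the number of nodes of each degree, and normalise by the network size
degree_counts = Counter(dict(g.degree()).values())
pk = dict([ (k, c / float(g.order())) for (k, c) in degree_counts.items() ])

print('Sum of p_k over all degrees: {s}'.format(s = sum(pk.values())))   # should be 1.0
```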

So what is the degree distribution for an ER network? At first acquaintance, many non-mathematicians would argue something like this: the generating process adds an edge between any pair of nodes with a fixed probability $\phi$, with every edge (and every node) treated equally. Therefore, we'd expect every node to have roughly the same degree as every other – a degree distribution that's *uniform* – consistent with the uniformity of the generating process.

Does that sound reasonable? – it did to me when I first made this argument. But we know from the simulation we did earlier that this *isn't* what happens: we actually get a *normal* distribution of degrees, not a uniform one. (If you need more convincing about this, read the rest of this section and then skip to the epilogue at the end of the chapter.) Clearly there must be another way of thinking about the process.

Let's re-phrase the question: in an ER network, how does a node end up having degree $k$? We can answer this by looking back at the construction process, where we iterated through all the pairs of nodes and added an edge between them with a given, fixed, probability $\phi$ (which we denoted `pEdge` in the code). So each node *could in principle* have been connected to $N - 1$ other nodes: that's the maximum degree it could have, since we've excluded the possibility of self-loops or parallel edges. For each of these potential edges, we essentially tossed a coin to decide whether the edge was included or not – except that the "coin" came down "heads" with a probability $\phi$, and therefore came down "tails" with a probability $(1 - \phi)$ (since there are only two alternatives, and their probabilities have to sum to 1). Let's refer to each such decision – add an edge or don't – as an *action*. For each node we perform $N - 1$ actions, one per potential edge, and for a node to have a degree $k$ we have to perform $k$ "add" actions and $(N - 1 - k)$ "don't-add" actions. We can perform these actions in any order.

How many ways are there to perform this sequence of actions? Suppose we have a bag of $a$ actions: how many ways are there to select $b$ of them from the bag? The answer is given by the **binomial coefficient**, often denoted $\binom{a}{b}$:

$$ \binom{a}{b} = \frac{a!}{b! \, (a - b)!} $$

So, returning to our original question, we have $\binom{N - 1}{k}$ ways to perform $k$ "add" actions from a possible $N - 1$ actions, with the remainder being "don't-add" actions. This is the number of possible sequences that, for a given node, can result in that node having degree $k$. From elementary probability theory, to work out the probability of a sequence of actions happening we multiply together the probabilities of the individual actions: "this *and* this *and* this" and so forth. So for each sequence of $k$ add actions and $(N - 1 - k)$ don't-add actions we multiply the probabilities of each action together to get the probability of them *all* happening, and then multiply this compound probability by the number of ways these actions can be arranged so as to still give us the $k$ edges we want.

Putting all this together, what is the probability that a node $v$ taken at random from an ER model consisting of $N$ nodes and edge probability $\phi$ will have degree $k$? For a node to have degree $k$ we need to perform a sequence of actions consisting of $k$ add actions (each occurring with probability $\phi$); *and* we need $(N - 1 - k)$ don't-add actions (each occurring with probability $1 - \phi$); *and* there are $\binom{N - 1}{k}$ ways in which these actions can be arranged. Expressing this as maths, we get:

$$ p_k = \binom{N - 1}{k} \, \phi^k \, (1 - \phi)^{N - 1 - k} $$

This is a distribution well known in statistics as the **binomial distribution**. It's important to note that $\phi$ is a constant, and that each add action is independent of each other add action: it doesn't get any easier to add edges over time. (If this seems like an obvious thing to say, we only say it because this turns out to be different to the approach we'll take to BA networks later.)
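As a sanity check (not part of the derivation above), we can compare the binomial prediction against the observed degree distribution of an actual ER network; the network size and edge probability below are illustrative, and the binomial coefficient is computed directly from factorials:

```
import networkx
from math import factorial
from collections import Counter

def binomial_pk( N, phi, k ):
    """The binomial prediction for the probability that a node has degree k."""
    choose = factorial(N - 1) // (factorial(k) * factorial(N - 1 - k))
    return choose * (phi ** k) * ((1.0 - phi) ** (N - 1 - k))

# an illustrative ER network with mean degree around 10
N = 1000
phi = 0.01
g = networkx.erdos_renyi_graph(N, phi)
observed = Counter(dict(g.degree()).values())

for k in range(5, 16):
    print('k = {k}: observed {o:.3f}, predicted {p:.3f}'.format(k = k,
                                                                o = observed[k] / float(N),
                                                                p = binomial_pk(N, phi, k)))
```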

Given that we are dealing with large graphs, we will simplify the $N - 1$ term to $N$, since it makes very little difference as $N \rightarrow \infty$, yielding:

$$ p_k = \binom{N}{k} \, \phi^k \, (1 - \phi)^{N - k} $$

What happens as $N$ gets larger and larger? Clearly $\binom{N}{k}$ also gets larger and larger (there are more and more ways to choose the $k$ edges), and $(1 - \phi)^{N - k}$ gets smaller and smaller (since $1 - \phi$ is by definition less than 1), while $\phi^k$ stays the same size. What happens therefore depends on whether the rising term or the falling term dominates in the limit, which isn't blindingly obvious – but fortunately the answer *is* known: the binomial distribution converges to another distribution, the **Poisson distribution**, as $N \rightarrow \infty$. The Poisson distribution is basically the normal distribution for systems built from discrete events, and is given by:

$$ p_k = \frac{(N \phi)^k \, e^{-N \phi}}{k!} = \frac{\langle k \rangle^k \, e^{-\langle k \rangle}}{k!} $$

While this form is easier to work with, it's a lot less suggestive. The binomial form is probably to be preferred as a way of thinking about the distribution simply because each of the factors within it relates to a real, concrete phenomenon: add actions, don't-add actions, their probabilities (summing to 1), and the number of ways of combining them.
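We can also check the convergence claim numerically. The sketch below is illustrative only: it holds the mean degree fixed at 10 as $N$ grows (so $\phi = \langle k \rangle / N$ shrinks), and shows the largest difference between the two forms falling as $N$ increases:

```
from math import factorial, exp

def binomial_pk( N, phi, k ):
    # binomial form of the degree distribution
    c = factorial(N - 1) // (factorial(k) * factorial(N - 1 - k))
    return c * (phi ** k) * ((1 - phi) ** (N - 1 - k))

def poisson_pk( kmean, k ):
    # Poisson form with the same mean degree
    return (kmean ** k) * exp(-kmean) / factorial(k)

kmean = 10.0
for N in [ 100, 1000, 10000 ]:
    phi = kmean / N
    gap = max(abs(binomial_pk(N, phi, k) - poisson_pk(kmean, k)) for k in range(50))
    print('N = {N}: largest difference over k < 50 is {gap:.5f}'.format(N = N, gap = gap))
```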

It's also worth noting that, in using an analytical approach, we were able to appeal to lots of known results in mathematics about the number of possible combinations of actions and the ways functions behave in the limit – with no need to write any code or burn any computer time.

In [1]:

```
import math
import numpy
import matplotlib
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
import seaborn
```

In [3]:

```
def poisson( n, pEdge ):
    '''Return a model function for the Poisson distribution with n nodes and
    edge probability pEdge.
    :param n: number of nodes
    :param pEdge: probability of an edge being added between a pair of nodes'''
    def model( k ):
        return (pow(n * pEdge, k) * math.exp(-n * pEdge)) / math.factorial(k)
    return model

fig = plt.figure()
plt.xlabel("$k$")
plt.ylabel("$p_k$")
plt.title('Poisson degree distribution, $N = {n}, \phi = {phi}$'.format(n = 1000, phi = 0.05))
plt.plot(xrange(100), map(poisson(1000, 0.05), xrange(100)))
_ = plt.show()
```

The graph is symmetric around the point $k = 50$, suggesting that this is the mean. Looking at the parameters of the distribution, however, we plotted 1000 nodes with an edge probability of 0.05, which multiply out to give 50 as well. That's suggestive, but we need to *prove* that it's the case *always*.

First let's re-visit the idea of a mean. The mean of any random variable can be written as the sum of each value the variable can take, multiplied by the probability of it taking that value. For the mean degree, we therefore have:

\begin{align} \langle k \rangle &= 1 \times p_1 + 2 \times p_2 + \cdots \\ &= \sum_{k = 1}^N k \, p_k \end{align}

(The maximum node degree is actually $N - 1$ since we're looking at simple networks, so we only really need to sum $k$ up to $N - 1$ rather than $N$ – but that just means that $p_N = 0$, so the sum works out anyway.) For the Poisson distribution underlying an ER network, we can code-up this definition using the formula above to work out the probability for each $k$. If $N = 1000$ and $\phi = 0.05$ as above, then:

In [4]:

```
sum = 0
p = poisson(1000, 0.05)
for k in xrange(1, 100):
    sum = sum + k * p(k)
print 'Computed mean degree = {kmean}'.format(kmean = sum)
```

Close enough. But we can do better: we can obtain an analytic result by deriving a formula for the mean degree in terms of $N$ and $\phi$. Equating the two definitions above, we get:

$$ \langle k \rangle = \sum_{k = 1}^N k \, \binom{N}{k} \, \phi^k \, (1 - \phi)^{N - k} $$

So we need to find the value of the sum on the right-hand side. To do this we need another standard result, the **binomial theorem**, which states that:

$$ (p + q)^n = \sum_{d = 0}^{n} \binom{n}{d} \, p^d \, q^{n - d} $$

Now, if we differentiate both sides with respect to $p$, we get:

\begin{align*} n(p + q)^{n - 1} &= \sum_{d = 1}^{n} \binom{n}{d} \, d \, p^{d - 1} \, q^{n - d} \\ &= \frac{1}{p} \sum_{d = 1}^{n} d \binom{n}{d} \, p^d \, q^{n - d} \\ np(p + q)^{n - 1} &= \sum_{d = 1}^{n} d \binom{n}{d} \, p^d \, q^{n - d} \end{align*}

The right-hand side now looks very like the form we're looking for. If we express it in terms of $N$, $\phi$, and $k$ to get the notation straight, setting $p = \phi$ and $q = 1 - \phi$, then:

\begin{align*} N\phi(\phi + (1 - \phi))^{N - 1} &= \sum_{k = 1}^{N} k \binom{N}{k} \, \phi^k \, (1 - \phi)^{N - k} \\ N\phi &= \sum_{k = 1}^{N} k \binom{N}{k} \, \phi^k \, (1 - \phi)^{N - k} \\ &= \langle k \rangle \end{align*}

So the mean of the binomial degree distribution is given by $N \phi$. Looking at the equations, we can see that $N$ and $\phi$ are the only parameters: we need to know them, *and only them*, to compute the distribution for any value of $k$. We can therefore say that $N$ and $\phi$ *completely characterise* the distribution.
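As a sanity check on the derivation – computed by brute force rather than analytically, with illustrative parameter values – we can sum $k \, p_k$ using the binomial form directly and compare the result with $N\phi$:

```
from math import factorial

N = 1000
phi = 0.05
kmean = 0.0
for k in range(1, N + 1):
    # binomial probability of degree k, using the simplified N rather than N - 1
    p_k = (factorial(N) // (factorial(k) * factorial(N - k))) * (phi ** k) * ((1 - phi) ** (N - k))
    kmean = kmean + k * p_k
print('Sum of k * p_k = {m:.4f}; N * phi = {t}'.format(m = kmean, t = N * phi))
```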

There is another implication of this. Since $\langle k \rangle = N\phi$, for large $N$ we can make use of the fact that the binomial distribution converges to the Poisson distribution and re-write the probability distribution for an ER network in terms of the network's mean degree:

$$ p_k = \frac{\langle k \rangle^k e^{-\langle k \rangle}}{k!} $$

This means that, given any two of $N$, $\phi$, and $\langle k \rangle$, we can compute the third, and we have all we need to completely characterise the degree distribution of an ER network. Put still another way, if we want an ER network with a specific number of nodes and a specific mean degree, we can compute the link probability $\phi = \frac{\langle k \rangle}{N}$ we need to construct it.
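A minimal sketch of this recipe (parameter values illustrative): pick $N$ and a target mean degree, compute $\phi$, build the network, and check the empirical mean degree:

```
import networkx
import numpy

N = 5000
kmean = 20
phi = kmean / float(N)    # the edge probability needed for the target mean degree
g = networkx.erdos_renyi_graph(N, phi)
empirical = numpy.mean(list(dict(g.degree()).values()))
print('Target mean degree {t}, empirical mean degree {e:.2f}'.format(t = kmean, e = empirical))
```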

Earlier we asserted that many people, on first seeing the generating process for the ER model, assume that it will result in a uniform degree distribution. I certainly did. Since it's such a common reaction, it's perhaps worth exploring a little why it's also wrong.

The argument for a uniform degree distribution goes roughly as follows: since the edge probability is independent for every edge, we'd expect that, at each node, we select roughly the same number of edges to add, and therefore there's no reason for one node to be preferred over another, so they should all have roughly the same degree.

The problem here is that it takes a statement about *edges* and subtly converts it into a statement about *nodes*. Just because we select edges with a constant probability doesn't imply that we do so uniformly at the node level – so uniformly, in fact, that every node ends up with *exactly* the same number of edges. Put that way, a uniform degree distribution actually sounds rather unlikely! The process only says that, *over the graph as a whole*, edges are added with constant probability: it does not say anything about the *local* behaviour of edge addition around an individual node. It is this that allows for the possibility of a non-uniform distribution.
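A quick empirical illustration (a sketch, with illustrative parameters): generate an ER network and look at how widely the node degrees actually spread. If the degree distribution really were uniform in the sense above, every node would report the same degree.

```
import networkx
from collections import Counter

g = networkx.erdos_renyi_graph(1000, 0.05)
degrees = list(dict(g.degree()).values())
counts = Counter(degrees)
(commonest, occurrences) = counts.most_common(1)[0]
print('Degrees range from {mn} to {mx}'.format(mn = min(degrees), mx = max(degrees)))
print('The commonest degree ({k}) occurs at only {c} of the 1000 nodes'.format(k = commonest, c = occurrences))
```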

This observation – that global behaviour, and in particular global regularity, doesn't imply local regularity – is perhaps the single most important thing to bear in mind about complex networks. It's tempting to think that large-scale regularity emerges from lots of small-scale regularity, but that isn't necessarily the case: the small scale can be irregular, with the irregularities evening out in aggregate. Conversely, it's tempting to think that something that looks regular and well-behaved on the outside has component pieces that are regular and well-behaved – and again that isn't necessarily the case. The lesson here is that things can be more complex than they seem. On the other hand, it also means that we can often ignore local noise and make use of global properties, as long as we're careful.

The description we used for the ER generator is an example of a process that in mathematics is called a **Bernoulli process**, where we look at the sequence of actions needed to generate a given outcome and compute how many ways there are for those actions to occur at random. Bernoulli processes occur whenever we encounter actions being performed one after the other according to some random driver, and the argument above is completely typical of how one deals with them.
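To see the Bernoulli process at work directly, we can simulate one node's worth of decisions many times over and look at the resulting degrees. This is a sketch with illustrative parameters, using `numpy` for the coin tosses:

```
import numpy

# one trial is a single node's N - 1 independent "add" decisions, each succeeding
# with probability phi; the node's degree is the number of successes
N = 1000
phi = 0.05
trials = 10000
degrees = [ (numpy.random.random(N - 1) <= phi).sum() for _ in range(trials) ]
print('Mean degree over {t} trials = {m:.2f}; (N - 1) * phi = {e:.2f}'.format(
      t = trials, m = numpy.mean(degrees), e = (N - 1) * phi))
```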


Let's now look at the best-understood complex network. If there's a poster child for network science, it's the "random graph", or more properly, the *Erdős-Rényi* or *ER network*. We mentioned Erdős and Rényi in the introduction as the mathematicians who first gave shape to the idea that large networks with essentially random structure might still show some useful statistical properties that made them more comprehensible. In this chapter we'll see what these regularities are. The ER networks are complex enough to allow us to demonstrate techniques that will apply in other circumstances, but are simple and well-behaved enough to make this analysis fairly straightforward.

We'll explore the ER network in some detail, both through simulation and through mathematical analysis. We'll do it this way for a good reason: in the real world, networks often cannot be guaranteed to have exactly the properties that the mathematical techniques require, while computer simulation really needs to be driven by an understanding of what's going on in a network at a fundamental level and of how its mathematical features contribute to that behaviour. For these reasons, it's not safe to only understand how to simulate networks: you need to be able at least to follow the mathematical analysis as well. Conversely, understanding real networks and applications requires the techniques of simulation as well as analysis.

We'll start by building ER networks using `networkx` and explore some of the properties that we developed earlier. We'll then look at the same properties (and more) from a more mathematical perspective, and relate the code to the maths to show how the two views interrelate.

To build an Erdős-Rényi (or ER) network with $N$ vertices, we proceed as follows:

- Build a graph $G = (V, E)$ with $N$ vertices and no edges, so $|V| = N$ and $E = \emptyset$
- For each pair of vertices $v_1, v_2 \in V$ with $v_1 \neq v_2$, add an edge $(v_1, v_2)$ to $E$ with probability $\phi$

That's it! – a very simple process for constructing what turns out to be a very interesting class of networks. There are four things to notice here, all of which turn out to be very important for what follows.

Firstly, the ER model has two parameters: the number of nodes in the network $N$, and the probability $\phi$ of an edge occurring between any given pair of nodes. The combination of these two parameters defines a **class** of networks: each run of the process yields a different member of the class, depending on exactly which pairs of nodes happen to be connected at the connection stage.

Secondly, the appearance of an edge between any pair of nodes is an independent event: it doesn't matter whether a node is already heavily connected or not, the chance of its being linked to any other node is just $\phi$ – and this probability doesn't change over time.

Thirdly, we disallow both self-loops and parallel edges, thereby creating a simple network.

Fourthly, we build the network "all at once", with all its nodes and all its edges in place before we do any further analysis.

To build such a network, we need to turn the description into code. We can do this in two ways using `networkx`:

- by implementing the construction process ourselves; or
- by using the built-in generator function

The latter is clearly entirely adequate in practice, but for demonstration purposes, we'll do both.

In [1]:

```
import networkx
import math
import numpy
import matplotlib as mpl
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cm
import seaborn
from JSAnimation import IPython_display
from matplotlib import animation
```

In [2]:

```
def erdos_renyi_graph_from_scratch( n, pEdge ):
    """Build the graph with n nodes and a probability pEdge of there
    being an edge between any pair of nodes.
    :param n: number of nodes in the network
    :param pEdge: probability that there is an edge between any pair of nodes
    :returns: a network"""
    g = networkx.empty_graph(n)
    # run through all the possible edges
    ne = 0
    for i in xrange(n):
        for j in xrange(i + 1, n):
            if numpy.random.random() <= pEdge:
                ne = ne + 1
                g.add_edge(i, j, { 'added': ne })
    return g
```

(We use `n` for $N$ and `pEdge` for $\phi$.) Notice the way we run through the pairs of nodes so that we only try to generate an edge once between each pair. This works because the graph we're building is undirected and we want at most one edge between each pair of nodes *in either order*. (There are also directed ER networks: to build one of those we'd want to try each pair *in each order* to allow for directionality, as in the sketch below.)
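As a sketch of that directed variant (the function name `erdos_renyi_digraph_from_scratch` is ours, not part of `networkx`), we try every ordered pair of distinct nodes, so the edges $i \rightarrow j$ and $j \rightarrow i$ are decided independently:

```
import networkx
import numpy

def erdos_renyi_digraph_from_scratch( n, pEdge ):
    """Build a directed ER network with n nodes, where each ordered pair of
    distinct nodes gets an edge independently with probability pEdge.
    :param n: number of nodes in the network
    :param pEdge: probability of an edge from one node to another
    :returns: a directed network"""
    g = networkx.DiGraph()
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(n):
            if i != j and numpy.random.random() <= pEdge:
                g.add_edge(i, j)
    return g
```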

The key `networkx` method in our undirected generator is `add_edge`, which adds an edge between a pair of nodes. Its optional third parameter is a dictionary of attribute/value pairs that are associated with the edge, and we use this to record the order in which each edge was added so we can visualise the growth of the network below.

We can then use this function to build an ER network, for example with 5000 nodes and a 5% probability of there being an edge between any pair of nodes:

In [3]:

```
g_from_scratch = erdos_renyi_graph_from_scratch(5000, 0.05)
```

`networkx` also has a built-in "generator" function for ER networks that we can use to build a graph with the same properties as above:

In [4]:

```
g_from_generator = networkx.erdos_renyi_graph(5000, 0.05)
```

`g_from_scratch` and `g_from_generator` are both instances of the same class of ER networks. They aren't *the same network*, though, even though they have the same parameters, because they've been created by stochastic processes and so will have different connections between their nodes. However, they will both share certain statistical characteristics that we'll come back to after we look at the growth processes in more detail.
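One shared characteristic that's easy to check right away is the mean degree, which should come out essentially the same for both networks. This is a minimal sketch; wrapping `degree()` in `dict()` is just to cope with different `networkx` versions returning either a dict or a view.

```
import numpy

mean_scratch = numpy.mean(list(dict(g_from_scratch.degree()).values()))
mean_generator = numpy.mean(list(dict(g_from_generator.degree()).values()))
print('Mean degree: {a:.2f} (from scratch) vs {b:.2f} (from the generator)'.format(
      a = mean_scratch, b = mean_generator))
```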

It can sometimes be useful to see how these graphs grow, by means of animation. We can use `matplotlib` to draw a graph progressively, one edge at a time, and show how the node degrees evolve too. We can then use the `JSAnimation` plug-in to generate an in-line animation, or save the animation to a file and link to it.

`matplotlib`'s animation functions are quite involved. The core is a function that creates a figure for each frame of the animation, which `matplotlib` then links together like the pages of a flick-book. There's quite a lot of set-up involved too, though: the following code is heavily commented to (hopefully) show what's going on.

In [26]:

```
def animate_growing_graph( g, edges, fig, ax = None, pos = None, cmap = None, **kwords ):
    """Animate the growth of a network, showing how edges are added and
    how node degrees evolve. Slow if done for a large graph. Returns a
    matplotlib animation object that can be saved to a file for later
    or shown in-line in a notebook.
    :param g: the network
    :param edges: the edges, in the order they were added
    :param fig: the figure to draw into
    :param ax: (optional) the axes to draw into (defaults to main figure axes)
    :param pos: (optional) layout for the network (default is to use the spring layout)
    :param cmap: (optional) colour map used to colour nodes by degree (defaults to cm.hot)
    :returns: an animation object"""
    # fill in the defaults
    if ax is None:
        # figure main axes
        ax = fig.gca()
    if pos is None:
        # layout the network using the spring layout
        pos = networkx.spring_layout(g, iterations = 100, k = 2 / math.sqrt(g.order()))
    if cmap is None:
        cmap = cm.hot
    if ('frames' not in kwords.keys()) or (kwords['frames'] is None):
        # animate at one second per edge
        kwords['frames'] = int(len(edges) * (1.0 / kwords['interval']))
    # manipulate the axes, since this isn't a data plot
    ax.set_xlim([-0.2, 1.2])        # axes bounded around 1
    ax.set_ylim([-0.2, 1.2])
    ax.grid(False)                  # no grid
    ax.get_xaxis().set_ticks([])    # no ticks on the axes
    ax.get_yaxis().set_ticks([])
    # work out the colour map for the degrees of the network, picking
    # colours linearly from the length of the colour map
    ds = g.degree().values()
    max_degree = max(ds)
    min_degree = min(ds)
    norm = colors.Normalize(vmin = min_degree, vmax = max_degree)
    mappable = cm.ScalarMappable(norm, cmap)
    # We now create all the graphical elements we need for the animation as matplotlib
    # lines and patches. Essentially this defines what's in the final frame of the animation.
    # We'll then make everything invisible and, as the animation progresses, make the elements
    # appear in the right order. It's a lot faster to do it this way rather than re-building
    # each frame from nothing as we go -- although that works too.
    # generate node markers based on positions
    nodeMarkers = dict()
    nodeDegrees = dict()
    for v in g.nodes_iter():
        circ = plt.Circle(pos[v], radius = 0.02, zorder = 2)   # place node markers at the top of the z-order
        ax.add_patch(circ)
        nodeMarkers[v] = circ
        nodeDegrees[v] = 0
    # build the list of edges as they were added
    edgeMarkers = []
    edgeEndpoints = []
    for (i, j) in edges:
        xs = [ pos[i][0], pos[j][0] ]
        ys = [ pos[i][1], pos[j][1] ]
        line = plt.Line2D(xs, ys, zorder = 1)   # place edge markers down the z-order
        ax.add_line(line)
        edgeMarkers.append(line)
        edgeEndpoints.append((i, j))
    # work out the "time shape" of the animation
    nFrames = kwords['frames']                           # frames in the animation
    framesPerEdge = max(int(nFrames / len(edges)), 1)    # frames per edge
    # add colourbar for node degree
    kmax = max(g.degree().values())
    cax = fig.add_axes([ 0.9, 0.125, 0.05, 0.775 ])
    norm = mpl.colors.Normalize(0, kmax)
    cb = mpl.colorbar.ColorbarBase(cax, cmap = cmap,
                                   norm = norm,
                                   orientation = 'vertical',
                                   ticks = range(kmax + 1))
    # initialisation function hides all the edges, colours all nodes
    # as having degree zero
    def init():
        for em in edgeMarkers:
            em.set(alpha = 0)
        for vm in nodeMarkers.values():
            vm.set(color = mappable.to_rgba(0))
    # per-frame drawing for animation
    def frame( f ):
        # frame number boundaries for various transitions in the animation "shape"
        atEdge = int((f + 0.0) / framesPerEdge)   # the edge we've reached with this frame
        if framesPerEdge == 1:
            a = 1
        else:
            a = ((f + 0.0) % framesPerEdge) / framesPerEdge
        if atEdge < len(edgeMarkers):
            # fade the current edge in, and update its endpoints' degrees once it's fully drawn
            edgeMarkers[atEdge].set(alpha = a)
            if a == 1:
                (i, j) = edgeEndpoints[atEdge]
                nodeDegrees[i] = nodeDegrees[i] + 1
                nodeMarkers[i].set(color = mappable.to_rgba(nodeDegrees[i]))
                nodeDegrees[j] = nodeDegrees[j] + 1
                nodeMarkers[j].set(color = mappable.to_rgba(nodeDegrees[j]))
    # return the animation with the functions etc set up
    return animation.FuncAnimation(fig, frame, init_func = init, **kwords)
```

In [6]:

```
# build the network, which annotates the edges with their order of addition
er = erdos_renyi_graph_from_scratch(100, 0.03)
# pull the edges as a dict from edge to order
er_edges_dict = networkx.get_edge_attributes(er, 'added')
# return a list of edges in order of addition
er_edges = sorted(er_edges_dict.keys(),
                  key = (lambda e: er_edges_dict[e]))
```

We can then generate and show the animation:

In [29]:

```
fig = plt.figure(figsize = (8, 6))
anim = animate_growing_graph(er, er_edges, fig, frames = 100)
IPython_display.display_animation(anim, default_mode = 'once')
```

Out[29]: