Let’s now return to the compartmented models of disease we developed earlier. These were continuous models, described using calculus. Networks, by contrast, are discrete structures consisting of nodes and edges. We therefore need to develop a way to think about the continuous model in a discrete way, so that it can be applied as a process over a network.

(This is a chapter from Complex networks, complex processes.)

# From continuous to networked compartmented models

When we considered SIR earlier we looked at it from the perspective of differential equations: a **continuous** model where the population sizes are assumed to be real numbers. This makes a certain amount of sense if we think of compartments as fractions of an overall population. However, it's equally clear that only whole numbers of people become sick, which suggests a **discrete** model that places an integer number of individuals into each compartment. How do we reconcile these two views?

## Revisiting SIR

The continuous model is best thought of as modelling the large-scale, **macroscopic** behaviour of the epidemic, in which we don't really care about the exact numbers of individuals concerned. Also, for a large population, considering the relative sizes of compartments to a few decimal places of accuracy will still yield something close to a whole number of individuals per compartment when the compartment fractions are scaled up to the size of the overall population.

But we can also ask what happens at the **microscopic** scale, for individuals. In that case we want to know how the disease might evolve in a *single person*. Another way to think of this is that a compartmented model allows each individual person to traverse the compartments according to the probabilities associated with each transition.

Clearly the macroscopic and microscopic descriptions are related: we assume that, if we let a disease run through a population, then the ways in which individuals' infections evolve will integrate to reflect the macroscopic description in terms of fractions of the entire population.

### Well mixing

As well as being continuous, however, there's another assumption implicit in the continuous description. Let's re-visit the equations describing SIR:

$$ \frac{ds}{dt} = -\beta s(t) i(t) \hspace{1in} \frac{di}{dt} = \beta s(t) i(t) - \alpha i(t) \hspace{1in} \frac{dr}{dt} = \alpha i(t) $$

Here $i(t)$ denotes the fraction of the population who are infected at time $t$. The rate of change in this population, $\frac{di}{dt}$, has two terms: a growth term $\beta s(t) i(t)$, and a reducing term $\alpha i(t)$. The growth term says that the infected population grows at a rate that is proportional to the total number of (susceptible, infected) pairs in the population, which is simply the product of the two population sizes: in each unit of time, all these people meet each other and a fraction $\beta$ of the susceptibles become infected.
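To get a feel for how the two terms compete, the equations can be integrated numerically. What follows is only a minimal sketch using forward Euler steps. The function name `integrate_sir`, the step size `dt`, and the parameter values are assumptions chosen for illustration and do not come from the text.

```python
def integrate_sir(beta, alpha, i0, t_max, dt=0.01):
    """Integrate the SIR equations with forward Euler steps.

    Returns a list of (t, s, i, r) tuples, where s, i and r are
    fractions of the overall population.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    t = 0.0
    trajectory = [(t, s, i, r)]
    steps = int(round(t_max / dt))
    for _ in range(steps):
        new_infections = beta * s * i * dt   # growth term of di/dt
        new_removals = alpha * i * dt        # reducing term of di/dt
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        t += dt
        trajectory.append((t, s, i, r))
    return trajectory


# Illustrative parameter values only (assumed, not taken from the text)
trajectory = integrate_sir(beta=0.5, alpha=0.1, i0=0.01, t_max=100.0)
t_end, s_end, i_end, r_end = trajectory[-1]
peak_i = max(i for (_, _, i, _) in trajectory)
print(f"final s={s_end:.3f}, i={i_end:.3f}, r={r_end:.3f}; peak i={peak_i:.3f}")
```

Forward Euler is the crudest possible integrator and is used here only because it mirrors the equations term by term. A proper ODE solver would be a better choice for serious work.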

The assumption, clearly, is that all these pairs of people *do actually meet*, and this is a strong assumption. It's called the assumption of **well-mixing**, or alternatively of a **homogeneous** population. We discussed this earlier when we talked about attack rates and reproduction numbers. In "small" populations, well-mixing isn't a totally unreasonable assumption – although it *is* still an approximation of reality (even the people in my small village don't all meet each other every day). If we were to consider a population the size of Scotland, it's clearly implausible.

That doesn't mean we should throw the model away. The statistician George Box is quoted as saying, "*There is no need to ask the question 'Is the model true?'. If 'truth' is to be the 'whole truth' the answer must be 'No'. The only question of interest is 'Is the model illuminating and useful?'*" But the simplification of SIR to three differential equations does smear-out some structure that might be important – and, it turns out, *is* important in the sense that there are disease phenomena that occur in nature that don't occur in this system. Putting SIR onto a network is one way of addressing this.

## Diseases on networks

So in moving to diseases on networks we're trying to address two issues:

- that populations exhibit structure and so are not well-mixed; and
- that diseases occur in individuals, not simply in populations.

To address the first issue, we use a network to represent individuals and their interactions, with the connection structure of the network providing the opportunity for different kinds of inhomogeneity. For the second issue, we develop a discrete description of SIR, consistent with the continuous version, that we can apply to the individual nodes of the network. We can then study how different network structures affect the properties of an epidemic.

### Representing a population as a network

The first step is conceptually the easier, but has some subtleties. The natural way to treat a population as a network is to have one node per individual in the population. Edges between nodes represent social interactions that are opportunities for infection. If a susceptible person is connected by an edge to an infected person, then there is an opportunity for the latter person to infect the former. Conversely, if there is no such edge, then the susceptible person cannot be infected by the infected individual, since there exists no social contact between them.

How might we construct this network? The simplest approach is undoubtedly to create a random network of some kind: perhaps an ER network, in which case we will obtain a "social network" for $N$ individuals who interact in a random way with a well-defined mean number of others. Simulating an epidemic will then involve running our to-be-designed discrete disease process over this network, and examining the results.
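As a rough illustration of what such a "social network" looks like as a data structure, here is a minimal sketch that builds an ER network using only the Python standard library. The function name `er_network`, the adjacency-dictionary representation, and the values of `N` and `mean_degree` are all assumptions made for this example. In practice a graph library would probably do this job.

```python
import random
from itertools import combinations


def er_network(n, p, rng):
    """Build an ER random network as a dict mapping node -> set of neighbours.

    Each of the n(n-1)/2 possible edges is included independently
    with probability p.
    """
    neighbours = {node: set() for node in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            neighbours[u].add(v)
            neighbours[v].add(u)
    return neighbours


N = 1000                          # number of individuals (illustrative)
mean_degree = 5                   # desired mean number of contacts (illustrative)
p_edge = mean_degree / (N - 1)    # edge probability that gives this mean degree
rng = random.Random(42)
network = er_network(N, p_edge, rng)

observed = sum(len(ns) for ns in network.values()) / N
print(f"target mean degree {mean_degree}, observed {observed:.2f}")
```

Choosing the edge probability as `mean_degree / (N - 1)` is what gives the "well-defined mean number of others": each node has `N - 1` potential contacts, each present with the same probability.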

### Moving towards realism

A moment's thought will show several problems with this approach. Firstly, not all contacts are created equal, as we saw when we discussed secondary attack rates: people in close contact (such as children in a nursery, or people in a care home) are more likely to infect one another than people in weaker contact (such as workers in a factory). We could address this issue, perhaps, by **weighting** the edges between people to capture the fact that "some edges are more infectious than others". Alternatively, we might argue that these factors will even out over a suitably heterogeneous population, and so if we focus on the probability of infection for an "average social contact" we can still extract meaningful information from any simulation.

Secondly, how are individuals to be connected in the infection network? Are their connections random? Do they exhibit a more clustered structure? Are there dense packets of highly connected individuals, separated by sparse connections? These are questions of network degree, connectivity, and so forth – of network topology in general – and intuitively it seems clear that the choice may make a difference. We might, for example, expect a disease in a well-connected, high-mean-degree network to spread differently to the same disease on a network with lower connectivity.

Thirdly, we have described a **static** network whose connections don't change over time. Relating this back to the context we're considering, that doesn't seem appropriate. People might be expected to avoid individuals who are sick, or the sick individuals might be quarantined to preclude social contact. Either of these behaviours would be expected to remove social contacts – edges – from the part of the network around an infected individual.

(When I was growing up in England in the 1970s, parents actually demonstrated exactly the opposite behaviour. If a child got measles, for example, mothers all brought their children round for a play date with the explicit intention of getting them infected too – the logic being that exposing a child to the disease early was good for their immune systems, got the one-off infection "out of the way", and generally improved herd immunity. None of those arguments are at all wrong, but this approach to parenting seems to have gone out of fashion.)

In either case, we might think that it is more appropriate to adopt an approach that changes the structure of the network in response to infection, perhaps reducing the number of edges when a node is infected. In this case we have a **dynamic** or **adaptive** network structure, where the network responds to the progress of the process running over it. Again, we might decide that these effects will even-out and can be ignored to give an "average" result.

The upshot of this discussion is that we can take a simple representation – a static, random network with unweighted links – and then add more features if we think they might be relevant. As we do so we make the model more realistic – but also more complicated, and we add to the number of possible degrees of freedom.

Adding more factors in pursuit of realism may sound attractive, but we have to bear in mind that it also gives us a freedom we may not be able to use effectively. Consider the case where we reduce the number of edges to an infected individual. How many edges do we remove, and how do we select them? Will these choices make a critical difference, and how do they interact with the existing parameters of the model? In adding a new freedom we also add a considerable burden of analysis and simulation to check what effects our new freedom has. Might it be better to stick with the simplest case?

This argument might sound bogus to you: a cop-out just to reduce the amount of work we have to do. And if your primary interest is in the dynamics of a *particular* disease, about which you want to make accurate predictions – as would be the case for planning a clinical response to an outbreak – then of course you may strive to build *the most realistic model possible* and accept the associated extra work. On the other hand, if your primary interest is epidemic processes in general, you might be happy to stick to simpler models to see whether they *always* exhibit certain features which can then be generalised (with care) to *all* diseases. We'll see an example of this later in the case of epidemic thresholds, where certain combinations of infectiousness and recovery *necessarily* lead to epidemics pretty much regardless of everything else.

## Discretising the compartmented model

Now let's return to the second issue we identified above: moving from a continuous to a discrete description of the disease process.

Compartmented models of disease represent diseases as a collection of compartments. We notionally consider each individual in the population to be "in" a particular compartment at a given time. As their disease progresses, they move "from" one compartment "to" another, typically according to some stochastic process where their re-location happens with some probability. In addition, this probability may be affected by other factors, for example the presence of individuals in other compartments as neighbours. When looking at the overall disease behaviour (the macroscopic view) we are typically interested in how the relative sizes of the compartments change. When looking at the disease's progress (the microscopic view) we additionally need to know about the compartments of neighbouring individuals. It is precisely this microscopic behaviour that is missing from the continuous-process description of compartmented models.

How then do we describe interactions at the scale of individual nodes?

Let's look again (not for the last time) at the differential equations for SIR:

$$ \frac{ds}{dt} = -\beta s(t) i(t) \hspace{1in} \frac{di}{dt} = \beta s(t) i(t) - \alpha i(t) \hspace{1in} \frac{dr}{dt} = \alpha i(t) $$

There are three compartments, and the three equations (one per compartment) tell us how their populations change. Looking at the last equation, we see that $r(t)$ increases at a rate proportional to $i(t)$, the size of the infected compartment. Similarly, looking at the first equation, $s(t)$ decreases at a rate proportional to the number of susceptible-infected pairs. In the second equation, these two effects both appear with their signs inverted – understandably, since individuals pass through infection to recovery, and rates have to balance out if we are to keep the population constant.
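The balance of rates can be checked directly by summing the three equations, using only the terms already given:

$$ \frac{ds}{dt} + \frac{di}{dt} + \frac{dr}{dt} = -\beta s(t) i(t) + \big( \beta s(t) i(t) - \alpha i(t) \big) + \alpha i(t) = 0 $$

Every term leaving one compartment enters another, so $s(t) + i(t) + r(t)$ does not change over time. Since these are fractions of the whole population, the sum stays at 1.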

So much for the compartments: what does this mean for an individual?

### Discrete infection dynamics

We know that we are representing the interactions between individuals as network edges. Suppose that at some time we have a given susceptible individual. That individual cannot become infected spontaneously, but only through interaction with an individual who is infected at the same time and with whom she has some social contact, represented by an edge. So to determine whether the susceptible individual is infected, we need to know whether she has any edges that lead to individuals who are infected. We refer to such edges as **SI edges**: they connect a susceptible node to an infected node.

Suppose we have found an SI edge linking our susceptible node to an infected neighbour. The infection "passes along" this edge with a probability $\beta$, turning our susceptible node into an infected node, decreasing the population of the susceptible compartment by one and increasing the population in the infected compartment by one.

But there is also another effect. The edge down which the infection travelled is no longer an SI edge, since it now connects an infected node to *another* infected node. Furthermore any other SI edges that connected our formerly-susceptible node to infected nodes are also no longer SI edges. And finally, the fact that our formerly-susceptible node is now an infected node means that there may be new SI edges created, where there are edges between our node and a neighbouring susceptible node.

This is quite a bit more complicated than the equations suggest at first glance. It is perhaps simpler to think of it slightly differently. It is the population of SI edges, *not* the population of susceptible or infected individuals alone, which determines the rate of infection: that much is clear from the infection term. The infection dynamics happens, not at individual nodes, but at SI edges. We can think of the SI edges as a **locus** for the infection dynamics: a place at which infection possibly occurs. The edges in that locus are potentially changed by every infection **event**: every time an SI edge actually results in an infection.
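The bookkeeping described above can be made concrete with a small sketch. Everything here is an assumption made for illustration: the network is a dictionary mapping each node to its set of neighbours, compartments are the strings `"S"`, `"I"` and `"R"`, an SI edge is stored as a `(susceptible, infected)` pair, and the names `si_edges`, `infect` and `attempt_infection` are hypothetical.

```python
import random


def si_edges(neighbours, compartment):
    """Return the SI locus: all (susceptible, infected) pairs joined by an edge."""
    return {(u, v)
            for u, ns in neighbours.items() if compartment[u] == "S"
            for v in ns if compartment[v] == "I"}


def infect(edge, neighbours, compartment, locus):
    """Apply an infection event at an SI edge, updating the locus in place."""
    s_node, _ = edge
    compartment[s_node] = "I"
    for v in neighbours[s_node]:
        if compartment[v] == "I":
            locus.discard((s_node, v))   # this edge was SI and is now II
        elif compartment[v] == "S":
            locus.add((v, s_node))       # this edge was SS and is now SI


def attempt_infection(edge, beta, rng, neighbours, compartment, locus):
    """Pass infection along an SI edge with probability beta."""
    if rng.random() < beta:
        infect(edge, neighbours, compartment, locus)
        return True
    return False


# A small hand-built network: the path 0 - 1 - 2 - 3 plus the edge 1 - 3
neighbours = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2}}
compartment = {0: "I", 1: "S", 2: "S", 3: "I"}
locus = si_edges(neighbours, compartment)
print("SI edges before:", sorted(locus))

rng = random.Random(1)
happened = attempt_infection((1, 0), 0.3, rng, neighbours, compartment, locus)
print("infection occurred:", happened)
print("SI edges after:", sorted(locus))
```

Updating the locus incrementally inside `infect` touches only the neighbours of the newly infected node. Recomputing it from scratch with `si_edges` after every event would give the same result at a higher cost.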

### Discrete removal dynamics

What about removal? Removal happens macroscopically at a rate proportional to the fraction of the population infected. Microscopically this translates to an infected individual being removed with a probability $\alpha$. Suppose we follow the disease progression of our newly-infected node. The node is removed with probability $\alpha$, moving the node from the infected to the removed population. This happens independently of other nodes or edges. It may also affect the population of SI edges, since an edge involving a removed node is no longer SI. So while the removal event is purely local and happens without reference to any other nodes or edges in the network, *once it has happened* it has an impact on the SI edges – and therefore, indirectly, on future infection events. The locus for removal events is therefore the population of nodes in the infected compartment, any of which may spontaneously be removed.
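A sketch of the removal event, in the same spirit, might look like this. The representation is again an assumption: a dictionary of neighbour sets, compartments as the strings `"S"`, `"I"` and `"R"`, the removal locus as a set of infected nodes, and SI edges as `(susceptible, infected)` pairs. The names `remove_node` and `attempt_removals` are hypothetical.

```python
import random


def remove_node(node, neighbours, compartment, infected, si_locus):
    """Apply a removal event to an infected node, updating both loci in place."""
    compartment[node] = "R"
    infected.discard(node)               # no longer in the removal locus
    for v in neighbours[node]:
        si_locus.discard((v, node))      # an edge to a removed node is not SI


def attempt_removals(alpha, rng, neighbours, compartment, infected, si_locus):
    """Remove each infected node independently with probability alpha."""
    # Decide first, then apply, so the set is not changed while iterating
    chosen = [n for n in sorted(infected) if rng.random() < alpha]
    for n in chosen:
        remove_node(n, neighbours, compartment, infected, si_locus)
    return chosen


# The path 0 - 1 - 2 - 3 plus the edge 1 - 3, with nodes 0 and 3 infected
neighbours = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2}}
compartment = {0: "I", 1: "S", 2: "S", 3: "I"}
infected = {0, 3}
si_locus = {(1, 0), (1, 3), (2, 3)}

rng = random.Random(7)
removed = attempt_removals(0.5, rng, neighbours, compartment, infected, si_locus)
print("removed:", removed)
print("still infected:", sorted(infected))
print("remaining SI edges:", sorted(si_locus))
```

The decision to remove a node looks only at that node, matching the purely local nature of the event. The loop over its neighbours afterwards is the knock-on effect on the SI locus.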

### A discrete description of SIR dynamics

Summing-up the above, we can now formulate a discrete description of SIR.

The model consists of three compartments: susceptible (S), infected (I), and removed (R). Each node resides in exactly one compartment at any time. There are two loci for the dynamics: SI edges, and infected nodes. There are two events: infection happens at the SI locus with probability $\beta$, while removal happens at the I locus with probability $\alpha$. The infection event moves the S node into the I compartment; the removal event moves the I node into the R compartment. Removal therefore affects the contents of the I locus, and both events may affect the contents of the SI locus. If we compare this description to the three equations above it is hopefully easy to see the derivation.

What we've done is quite significant, though. We've moved from a description consisting of three continuous rates of change (the three differential equations) to a description consisting of two discrete events, each happening at a different locus. The events can be applied to individual nodes or edges in our network model, in which we would need to track exactly which nodes are in which compartments, and which edges are in the SI locus we're interested in. It's worth noting that we really don't care about removed nodes: they don't appear in either locus, and therefore can't affect the dynamics, other than by the fact that nodes that are removed are by definition *not* susceptible or infected.
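To make the two-event description tangible, here is one rough sketch of how it might be run over a network. It rests on an assumption the description above does not make: that time advances in fixed steps, with every SI edge and every infected node getting one chance to fire per step. The function name `run_sir`, the `max_steps` guard, the ring-shaped example network, and the parameter values are likewise illustrative only.

```python
import random


def run_sir(neighbours, seeds, beta, alpha, rng, max_steps=1000):
    """Run discrete SIR in fixed time steps until no infected nodes remain.

    Returns the final compartment of each node and a history of
    (S, I, R) counts, one entry per step.
    """
    compartment = {n: "S" for n in neighbours}
    for n in seeds:
        compartment[n] = "I"
    history = []
    for _ in range(max_steps):
        values = list(compartment.values())
        history.append((values.count("S"), values.count("I"), values.count("R")))
        # Locus for removal: the infected nodes at the start of this step
        infected = [n for n in neighbours if compartment[n] == "I"]
        if not infected:
            break
        # Locus for infection: the SI edges at the start of this step
        si_locus = [(s, i) for i in infected
                    for s in neighbours[i] if compartment[s] == "S"]
        # Each SI edge passes infection with probability beta
        newly_infected = {s for (s, i) in si_locus if rng.random() < beta}
        # Each infected node is removed with probability alpha
        newly_removed = [i for i in infected if rng.random() < alpha]
        for s in newly_infected:
            compartment[s] = "I"
        for i in newly_removed:
            compartment[i] = "R"
    return compartment, history


# Illustrative network: a ring of 20 nodes, each linked to its 2 nearest
# neighbours on either side
N = 20
ring = {n: {(n + d) % N for d in (-2, -1, 1, 2)} for n in range(N)}
final, history = run_sir(ring, seeds=[0], beta=0.3, alpha=0.2, rng=random.Random(3))
print("steps:", len(history), "final (S, I, R):", history[-1])
```

Removed nodes are never consulted when building either locus, which is the sense in which they cannot affect the dynamics. Other timing schemes are possible and may behave differently. This one was chosen only because it is short.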

The process description is an essential step along the way to simulation, but we're not quite there yet. We need to be able to express the above model in a computational form suitable to be executed. We need to be able to keep track of the populations in the different loci of the dynamics. And we need to choose where, and at what times, the different events occur.