There are lots of different spreading processes, but perhaps the most engaging — as well as the most socially important — is the way epidemics spread. Before we think about how epidemics spread over networks, we should first explore how epidemics are typically encountered and understood in medicine.

(This is a chapter from Complex networks, complex processes.)

# The biology of epidemics¶

Epidemics are a classic case of a process running over a network: a social network in this case, where the edges represent social conjnections between individuals. Network-based epidemic spreading processes come in a large number of variants, but before we start looking at any of them we need to understand something of about the science of epidemics in the real world: *epidemiology*.

Epidemiology is usually studied as part of medicine, but in reality it has (at least) two branches. There is **clinical epidemiology** that studies disease outbreaks in the wild, essentially as a tool of treatment and of public health; and there is **theoretical, mathematical, or computational epidemiology** that explores the abstract processes that underlie disease propagation (and indeed many other propagation prcesses, which turn out to share a common mathematical form). The literature on mathematical epidemiology is enormous, and Hethcote [Het00] provides a good starting-point to the classical (non-networked) approaches. While the mathematical models are always going to be highly abstracted from the biology of diseases, it helps to understand the clinical side at least a little in order to get a handle on the processes involved and the ways in which they relate to real-world problems.

## Diseases¶

Everyone suffers from a disease at some point. The lucky amongst us avoid anything more serious that influenza, measles, or (in my case, long ago) whooping cough. But all diseases share some comon characteristics: characteristics so common, in fact, that their mathematical properties are shared by other processes that aren't actually diseases at all, including the spread of computer viruses [KW91] and the spread of rumours or other information.

The diseases in which we are interested are caused by *pathogens*, typically viruses or bacteria: simple living organisms that make their home in humans (or other living organisms) and cause some adverse reaction as a result of their lifecycle. These pathogens can pass between individuals in a number of ways, causing the disease to spread. A disease might be airborne, able to live in the air and be breathed by passing individuals. It might be deposited on objects and picked up by future physical contact with the contaminated surfaces. Or it might be communicable only by direct contact, skin to skin, through sex, or might require an even closer contact such as a blood transfusion. It might be foodborne, transmitted through contaminated food that infects several people from a common source. It might be vectored through an animal, as is the case for malaria. And finally there is a class of non-communicable diseases such as cancer or heart disease, typically not caused directly by pathogens but perhaps influenced by their presence.

Each different kind of disease will have its own characteristic pathology when it infects a person, and also its own epidemiology that controls how it spreads. Clinically, both these characteristics are extremely important. From a network science perspective we focus on the epidemiology, but the pathology remains important, because factors involved in how a disease progresses *in* individuals may have a profound effect on how it spreads *between* individuals.

## Disease progression¶

A person's infection goes through several periods. Once infected, the disease resides **latent** in their system, not showing symptoms and not being infectious to others. After this latent period the disease becomes infectious, capable of being spread to others: this is the **infectious** period. Typically a person's infectiousness peaks and dies away before the end of the disease progression.

These two periods – latent and infectious – control the *transmission* of infection. After initial infection, however, there is an **incubation** period before the person shows symptoms of the disease. After the onset of symptoms, the disease progresses and ends in some **resolution**: the patient gets better, or dies. If they recover, they may then have some immunity to further infection.

For different diseases, the lengths of these periods and the ways they overlap vary. For **Type A** diseases, the incubation period is longer than the latent period. This means that a patient can start to transmit the disease before the disease becomes manifest in themselves. This happens in cases of measles. In **Type B** diseases such as SARS or ebola, by contrast, the incubation period is shorter than the latent period, meaning that asymptomatic patients cannot infect others. So despite ebola being a more feared disease than measles, it may be easier to treat epidemiologically since quarantining patients showing symptoms will prevent transmission in the general population (although not to medical staff); in measles, transmission starts before symptoms show themselves, so quarantine based on symptoms is less effective. Moreover for some disease the infectious period may continue after the patient has died: the corpses of victims of ebola, which are transmitted *via* bodily fluids, can be extremely infectious for some time after death, meaning that funerals become very dangerous loci of potential infection for mourners.

How long do the different periods last? For each disease there will be typical durations, often with substantial variance. In the case of ebola, for example, a typical timeline would be a 0–3 day incubation period, a 7–12 day progression to recovery or death, and a latent period of 2–5 days. The ranges give the variance of periods, different for different individuals that depend on factors including the severity of infection and the individual's overall health. However, the incubation period for ebola can be up to 21 days, meaning that a suspected case needs to be quarantined for this period: long enough, in other words, for the disease symptoms to manifest if the person is actually infected. While one can test for most diseases (including ebola) in a laboratory, during an epidemic such tests may overwhelm the public health infrastructure, making quarantine the most practical option. (During historical disease outbreaks, of course, quarantine was the *only* option.)

## Measuring epidemics¶

Epidemiology consists of creating models of diseases and their spread than can be analysed, to make predictions or to simulate the effects of different responses. To do this, we need to identify the core elements of a disease from the perspective of transmission: we do not need to understand the disease's biology, only the timings and other factors that affect its spread.

We discussed above the periods of diseases, their relationships, and their different, often characteristics, periods and variances. We need some other numbers as well, however, and it turns out that these can be measured directly in the field.

The first important number is known as the **secondary attack rate**, denoted 2°AR. This measures what proportion of people exposed to each for each primary case will develop infection. For example in the first ebola outbreak 498 family members were exposed to infected relatives, of whom 38 developed ebola themselves. This gives a 2°AR of $38 / 498 = 7.6\%$.

There are a couple of caveats with this number. Firstly, 2°AR only applies to small populations, typically villages: it doesn't scale-up to whole populations. Secondly, 2°AR is very context-sensitive and depends on the degree of contact that individuals have with the infected individual: for close family members, the 2°AR in the first ebola outbreak was around 27% rather than around 8%. Thirdly, individuals may become increasingly infectious as their own infection progresses. Fourthly, disease organisms are subject to natural selection and, since they reproduce quickly, can vary in their infectiousness over time. The selection pressures often lead to diseases become progressively more transmissible but less severe, eventually become sufficiently weak for people's immune systems to be able to cope with them directly – and the epidemic dies out.

The second important metric is the **basic case reproduction number**, denoted $R_0$. $R_0$ represents the total number of secondary infections expected for each primary infection in a totally susceptible population. The $0$ in $R_0$ stands for $t = 0$: the basic case reproduction number applies at the start of an epidemic. Over time, the number of people susceptible to infection will vary as they become infected, recover, die, and so forth. We can account for this by developing a **net case reproduction number** $R_t$ by multiplying $R_0$ by the susceptible fraction of the population at time $t$.

$R_0$ is affected by three factors:

- The duration of infectiousness. All other things being equal, a disease with a longer period of infectiousness has more time in which to infect other patients.
- The probability of transmission at each contact. Some diseases are extremely contagious, with each contact having a high probability of passing on the infection; others are much harder to pass one to secondary cases.
- The rate of contact. Someone coming into contact with a lot of susceptible individuals will have more opportunities to generate a secondary case than someone meeting fewer people.

The first two factors are characteristic of the disease, derived from its biology. The third is affected by the social conditions in which the epidemic takes place: it is this factor that quarantine affects, by reducing (to zero) the contacts an infected person has with uninfected individuals.

These two metrics, 2°AR and $R_0$, are related. If we compute 2°AR for the different circumstances of contact – family, neighbours, community, and so forth – and multiply each by the number of contacts a person has in each circumstance, we can sum them to get $R_0$:

\begin{align} R_0 &= 2^oAR_{family} \times contacts_{family} \\ &+ 2^oAR_{neighbours} \times contacts_{neighbours} \\ &+ 2^oAR_{community} \times contacts_{community} \\ &+ \cdots \end{align}As with disease periods, $R_0$ is an average that is affected by various factors and is best treated as a mean value with some variance: some infected individuals will affect more people, some less, but *on average* they will generate $R_0$ secondary infections.

## Controlling epidemics¶

The importance of $R_0$ is that it indicates whether, and how fast, a disease can spread through a susceptible population. If $R_0 < 1$ then we expect fewer than one secondary case per primary. This means that each "generation" of the disese will be smaller than the one that infected it, and the disease will die out. If $R_0 = 1$ then the disease will perpetuate itself in whatever size of population was originally infected. Nature is never so precise as to present us with a disease like this, of course. However, $R_0 = 1$ is the *threshold value* about which diseases become epidemics. If $R_0 > 1$, the disease will break-out and infect more and more people exponentially quickly. Exactly *how* quickly depends on how large $R_0$ is. For measles, $R_0 \ge 15$ – fifteen new infections for each case – which explains how measles spread so quickly in unvaccinated populations. Different strains of influenza have $1.5 \le R_0 \le 3$. If this sounds benign, remember that the 1918 "Spanish flu" epidemic killed more people than the First World War. Perhaps interesting in light of its media coverage, for ebola we have $1.2 \le R_0 \le 1.8$, roughly the same as a not-too-severe winter flu outbreak.

In order to bring an epidemic under control, then, we need to reduce the *effective* value of $R_0$. This will generally happen anyway through mutation and selection as the disease agent (typically a bacterium or a virus) evolves. We can quarantine infected individuals so that they do not cause further secondary infections. This is clearly more effective with Type B diseases than with Type A, since the latter will have already become infectious before they become symptomatic.

A more effective approach is to reduce the susceptible fraction of the population. The approach is to make the value of $R_t$ less than one (the epidemic threshold) so that the disease dies out naturally. This happens naturally in diseases to which individuals have natural or acquired immunity, for example by surviving a bout of the disease.

An even more effective approach is to artificially give people immunity: vaccination. A vaccine, where it exists, confers full or partial immunity on its recipients. How many people do we need to vaccinate? If we have a disease with $R_0 = 3$, then to get the $R_t$ below the threshold we we would need to vaccinate $(3 - 1) / 3 = 67\%$ of the population. This value 67% is the **herd immunity threshold** for the disease: if we vaccinate above this fraction of the population, the effective value of $R_0$ should there be an outbreak will be less than one, and the outbreak will die out.

Vaccines of course are not 100% effective. We can define the **vaccine efficacy rate** $VE$ as:

where $rate_{unvaccinated}$ and $rate_{vaccinated}$ are respectively the number of people who catch the disease when (un)vaccinated. We factor this into the herd immunity threshold calculation from above: if $VE = 90\%$ then we need to vaccinate $0.67 / 0.9 = 74\%$ of the population to get herd immunity while accounting for the vaccine's limited efficacy.

## Inhomogeneity¶

The above discussion is based on one vital assumption, that of **homogeneous**, **uniform**, or **well mixing**. When we talk about rates of infection of the herd imunity threshold, we are implicitly talking about a very simple and even situation in which everyone potentially infects everyone else who is susceptible. This clearly isn't what happens in practice. Measles outbreaks, for example, typically rip through schools but not whole towns, because children are in closer proximity to other children that to the general population. (Using the terminology from above, the 2°AR for classmates is higher than for other classes of people.) The population is thus **inhomogeneous**: it has structure, and can't be treated as a single bloc.

There are other sources of inhomogeneity. Some people may meet far more people than others, which means that (if susceptible) they have more chances of becoming infected, and (if infected) have more opportunities to pass on the disease. Vaccinating or otherwise protecting such people preferentially might have more of an effect than trying to get herd immunity through largely random vaccination. In reducing sexually transmitted diseases, for example, the most efficient strategy is often to target sex workers, since they have disproportionately more sexual partners than average and are thus more powerful in driving the spread of any diseases they pick up.

There will also be people who for whatever reason are not, or cannot be, vaccinated: perhaps they have an allergy to the vaccine, or are too young for it, or too old, or have some objection to taking it. Herd immunity protects these groups by reducing the chances that an infection starting somewhere in the population will spread as far as them and get the opportunity to infect them. Sometimes one finds clusters of unvaccinated people (retirement homes, for example) in which disease outbreaks are unusually severe regardless of whether the population as a whole has herd immunity. This is a direct consequence of inhomogeneous mixing, with an infected person being far more likely that average to encounter an unvaccinated individual in such a setting.

As a general rule, "small" populations can safely be treated as homogeneous, while "large" populations are inhomogeneous. What exactly constitutes "small" and "large" is a tricky question.

## Chains of infection and contact tracing¶

What does a spreading epidemic look like? If we begin with the first infected person – **case zero** of the epidemic – then that person infects some number $R_0$ of individuals (with some variance). Each of those people then causes $R_0$ tertiary infections, and so on: an exponential increase in infected individuals.

Like all exponential processes in nature, this one does not go on for ever. What "fuels" the epidemic is the availability of susceptible individuals, and eventually the supply runs short, there are no susceptible people to infect, $R_t$ drops below one, and the epidemic dies out. In a homogeneous population, shortage of susceptibles is a *global* phenomenon: the proportion of susceptible people drops as a whole. In an inhomogeneous population, however, it may be a *local* phenomenon: there may be lots of susceptible people in the population as a whole, but if there are none locally to be infected, the disease can still die out.

Each infected person is infected by *someone*, and only one person. If we were to look at the way the infection spreads, the infected cases would form a tree whose root was case zero and whose levels were formed by secondary cases, tertiary cases, and so on. There is a **chain of infection** from each infected person back to case zero.

In clinical epidemiology, constructing these chains of infection is the process of **contact tracing**. When a new case is discovered (or suspected), public health workers try to track down everyone that person has met since they became infectious and test them to see of they are infected: if so, they can be treated, and their own contacts traced. The chains of infection form a tree, with the case-zero individual at the root.

## References¶

[KW91] Jeffrey Kephart and Steve White. Directed-graph epidemiological models of computer viruses. In Proceedings of Research in Security and Privacy, pages 343–359. IEEE Press. Oakland, CA. May 1991.

[Het00] Herbert Hethcote. The mathematics of infectious diseases. SIAM Review **42**(4), pages 599–653. December 2000.