# Models and modelling

We should first clarify what it means to model something, or to develop a model: what models are, what we expect from them, and what their advantages and limitations are.

## Models

By **model** we mean a formal description of some aspects of
a system of interest that we can explore in order to gain insight into
the behaviour of the real system.

A **mathematical model** consists of
one or more equations expressing the relationships between different
quantities. There are often some **parameters** involved, quantities
whose values are known or assumed.

A **computational model**,
by contrast, is a program written to simulate the
behaviour of the system. Such simulations are almost always based on
underlying mathematical models and include parameters. What do
computers provide? Sets of equations can often be understood (or
“solved analytically”) by purely symbolic means, but many systems of
equations *can’t* be solved this way and instead need to be solved
numerically, by starting with specific values (numbers) and showing
how they evolve under the equations. Even for equations that *can* be
solved analytically, computers are often useful tools for helping to
explore large systems, or for visualising the results.
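
As a minimal sketch of what "solving numerically" can look like, the code below steps a simple compartmental epidemic model forward in small time increments. Everything specific here is an assumption made for illustration rather than something defined in this chapter: the three compartments (susceptible, infected, removed), the parameter names `beta` (infection rate) and `alpha` (recovery rate), their values, and the choice of Euler's method, which is the simplest stepping scheme rather than the most accurate one.

```python
def simulate_sir(beta, alpha, s0, i0, r0, dt, steps):
    """Step an illustrative SIR-style model forward with Euler's method.

    s, i and r are the fractions of the population that are susceptible,
    infected and removed. beta and alpha are hypothetical parameters
    controlling how fast infection and recovery happen.
    """
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        # Change over one small time step, computed from the current values
        new_infections = beta * s * i * dt
        new_recoveries = alpha * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Start with 1% of the population infected and follow 100 time units
trajectory = simulate_sir(beta=0.5, alpha=0.1, s0=0.99, i0=0.01, r0=0.0,
                          dt=0.1, steps=1000)
peak_infected = max(i for (_, i, _) in trajectory)
print(f"Peak infected fraction: {peak_infected:.3f}")
```

Starting from specific numbers and repeatedly applying the equations yields a trajectory rather than a formula. Changing a parameter and re-running is how such a model gets explored.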

## Uses of models

There are lots of questions we might want to ask about systems, and these can often give rise to different models drawing on different styles and approaches to modelling. For definiteness, let’s discuss some of the questions we might ask about epidemics.

We might be interested in **epidemics in general**. How do changes in
infectiousness affect the spread of the disease? What are the
relationships between infectiousness and recovery? How do different
patterns of contacts in a population affect how it spreads? What are
the effects of different countermeasures, like physical distancing,
vaccination, or quarantine? Are there any patterns in the epidemic,
like multiple waves? These are quite abstract questions that could be
asked of *any* disease, and answering them might tell us something
about how *all* diseases behave – including those we haven’t encountered yet.

On the other hand, we might be interested in **a specific disease**, or
even in **a specific outbreak**. How will *this* disease spread in a
population? How about in a country other than the one it’s currently in?
How will a *particular* countermeasure affect the spread? When will it
be safe for the majority of people to return to work? These are all
very concrete questions that depend on the exact details of the
situation about which they’re asked, and answering them may be
massively useful in managing this situation. (Taylor provides an
accessible discussion of the uses and interpretations of the well-known
Imperial College model of covid-19’s impact on UK NHS bed availability
during the 2020 outbreak [24].)

The interplay between these two kinds of questions is quite complicated. In concrete cases we presumably measure the specifics of the outbreak and work with them. We only have partial control, for example over enforcing social distancing. It’s then often hard to make more general predictions about diseases, or to draw conclusions that can be used in other cases.

So should we be more abstract? Abstraction
typically brings control over the model: we can explore a whole range
of modes of transmission, for example, rather than just the one we
happen to have for this disease. We can explore different
countermeasures in the model without committing to one, which means
there are no consequences for being wrong. We get to observe some
general patterns and draw general conclusions – which then don’t
*exactly* apply to *any* real disease.

On the other hand, the conclusions we draw from these abstract models
can’t be applied blindly to particular situations on the ground. A
good example (which we’ll come to later) concerns
the conditions under which an epidemic can get established in a
population. One would want to be *very* careful in taking the results
of an abstract investigation of this phenomenon and then concluding
that an epidemic can’t occur in a specific population – very
careful that the model’s assumptions were respected, very careful that
the parameters were known, and so forth. Mistakes in situations like
this can mean that outbreaks get out of control, and people may die.

## Assumptions

The accuracy of a model depends on its **assumptions**,
and how well these match reality. This issue appears
in several guises. The model’s “mechanics” – the ways it fits
its various elements together – need to match the disease it’s
(purporting to be) a model of. It needs to identify the parameters
that control its evolution. These parameters need to match those of
the real disease.

All these are problematic at the best of times, but especially when dealing with a new disease that’s not been well-studied. How infectious is a disease? How long is a person sick for? Does the disease confer immunity on an individual who’s had it, and is that immunity total or partial, permanent or time-limited? All these factors introduce uncertainties into any conclusions we draw from modelling.

## Correctness

Whether we’re interested in concrete or abstract questions, we still
have the problem of **correctness**: does our model produce the
“right” answers? It might not, either because it has been built
incorrectly (has “bugs” in computer terms – but mathematical
equations have them too), or because we don’t know the values of some
of the parameters (especially problematic in the middle of an
outbreak, when measurement often takes a back seat to treatment), or
because there are aspects of the real world that we haven’t considered
but that affect the result (often the case for more abstract models).

We know quite a lot about building software, much of which applies to the building of computer models: unit testing, integration testing, clear documentation, source version control, and so on. With modelling we then face the additional problem of deciding whether the code we’ve built is fit for purpose.

Deciding what “fit” means is an interesting question in its own
right. It’s something we may only know retrospectively: did the
results that came out of the model match what happened on the ground?
We may not be able to measure exactly what *did* happen on the ground:
did we count all the fatalities, or were some missed, or
mis-diagnosed? For a more abstract model, how happy are we that our
simplifications don’t entirely divorce us from reality?

## Stochastic processes

There’s another problem.

Suppose you have the misfortune of becoming ill. For a fortnight you
are infectious, and there’s a chance that you’ll infect anyone you
meet. Now we know that you don’t infect *everyone* – no disease
(fortunately) is so contagious – so you infect a fraction of all
those you *could* have infected. We can’t usually predict this
exactly, but the exact details may matter: rather than infect Aunt
Carol, who’s a noted recluse who has no further opportunities to
infect anyone else, you instead happen to infect Cousin Charlotte
who’s a noted party animal and goes on to spread the illness
widely. So even if we know the general pattern of a disease, the exact
way it spreads is affected by chance factors.

A system like this is referred to as a **stochastic process**. Such
processes include an element of randomness in their very nature: it’s *not* a
bug, it’s a feature.

Now consider what this means for modelling. We can take exactly the
same situation – the same disease, the same population –
run the model twice, get two different answers – and have them *both*
be right! The way to think about this is that each run is a “possible
outcome” of the disease. There may be several possible outcomes, and
they may all be similar – or there may be radical
differences. (We’ll see an example of this in a later chapter.)

We often think that every problem has a “right” answer, but for stochastic processes this isn’t the case: there are many “right” answers. It’s attractive to think that we can simply “debug” our way out of trouble, but in fact we can’t. There may be randomness we can’t engineer away.

What to do? Actually, computer science is unusual in “normally” having
single answers. If you ask a biologist how long butterflies live for,
you don’t expect her to go out and observe the lifespan of *every
single butterfly* before answering. Instead you get a statistical
answer: an average and some variance. It’s the same for stochastic
processes (or models): we run the model several times (possibly
hundreds of times) for the same inputs, and collate the results.
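
The idea of running a stochastic model many times and collating the results can be sketched as follows. The model here is a deliberately simple, hypothetical one, not the model developed in this book: in each step every susceptible person independently risks infection from each currently infected person, and infected people stop being infectious after one step. The function name `run_outbreak`, its parameters, and their default values are all assumptions chosen for illustration.

```python
import random
import statistics


def run_outbreak(rng, population=200, initial_infected=1,
                 p_infect=0.01, max_steps=1000):
    """Simulate one possible outcome of a toy outbreak.

    Returns the total number of people who were ever infected.
    """
    susceptible = population - initial_infected
    infected = initial_infected
    total_infected = initial_infected
    for _step in range(max_steps):
        if infected == 0:
            break  # the outbreak has died out
        # Chance that a susceptible person is infected by at least
        # one of the currently infected people during this step
        p_any = 1.0 - (1.0 - p_infect) ** infected
        new_cases = sum(1 for _person in range(susceptible)
                        if rng.random() < p_any)
        susceptible -= new_cases
        total_infected += new_cases
        infected = new_cases  # infectious for exactly one step
    return total_infected


rng = random.Random()  # unseeded, so each execution differs
sizes = [run_outbreak(rng) for _run in range(500)]

print("First two runs:", sizes[0], sizes[1])
print("Mean outbreak size:", statistics.mean(sizes))
print("Variance:", statistics.variance(sizes))
```

Each entry in `sizes` is an equally valid outcome for the same inputs. The mean and variance summarise them, in the same way as the biologist's answer about butterflies, although a summary like this can hide the fact that individual runs may differ radically from one another.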

In a computer model, it’s often possible to reproduce even a stochastic
process exactly, because the “random numbers” we use are
actually only pseudo-random and so can be re-created. That can help in
the narrow sense of seeing whether the model produces the same results
given the same inputs *and* the same “random” numbers,
but it doesn’t help in the wider sense of capturing the behaviour of a system with
*inherent* randomness.
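
A small sketch of this reproducibility, using Python's standard `random` module: a generator created with the same seed yields the same sequence of "random" numbers every time. The helper name `draw_sequence` and the seed values are arbitrary choices made for illustration.

```python
import random


def draw_sequence(seed, n=5):
    """Draw n pseudo-random numbers from a generator with the given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


first = draw_sequence(seed=42)
second = draw_sequence(seed=42)   # same seed: same "random" numbers
other = draw_sequence(seed=43)    # different seed: a different sequence

print(first == second)  # True
print(first == other)
```

Fixing the seed makes a single run repeatable, which is useful for testing and debugging. It says nothing about how the outcomes vary across seeds, which is the question that matters for a stochastic process.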

## Managing our expectations

This all sounds like modelling is a horrible mess. But the situation isn’t hopeless. We just need to be careful.

The results we get from any model, of any kind,
are tentative and suggestive and can generate insight into the system
the model is seeking to represent, whether concretely or
abstractly. There will always be factors outwith the model’s
consideration. The results aren’t “true” in any exact sense. They need
to be interpreted by people who understand both
the phenomena *and* models *and* modelling. This will often lead to
the realisation that the model needs to be changed, or extended or
enriched, and sometimes even simplified and stripped-down, better to
answer the questions that are being posed.

When we quipped in the preface that “scientists don’t really do certainty”, it’s this that we had in mind.

> Science is sometimes criticised for pretending to explain everything, for thinking that it has an answer to every question. It’s a curious accusation. As every researcher working in every laboratory throughout the world knows, doing science means coming up hard against the limits of your ignorance on a daily basis – the innumerable things that you don’t know and can’t do. This is quite different from claiming to know everything. … But if we are certain of nothing, how can we possibly rely on what science tells us? The answer is simple. Science is not reliable because it provides certainty. It is reliable because it provides us with the best answers we have at present. Science is the most we know so far about the problems confronting us. It is precisely its openness, the fact that it constantly calls current knowledge into question, which guarantees that the answers it offers are the best so far available: if you find better answers, those new answers become science. … The answers given by science are not reliable because they are definitive. They are reliable because they are not definitive. They are reliable because they are the best answers available today. And they are the best we have because we don’t consider them to be definitive, but see them as open to improvement. It’s the awareness of our ignorance that gives science its reliability.

—Carlo Rovelli [18].

Modelling, like experimentation, is both integral to science and subject to it: both a tool and an object of study, to be approached sceptically and refined through time. The study of epidemics is an excellent example of this process, and we can progressively refine our models better to reflect our improving understanding.

## Questions for discussion

What can models tell us about real-world disease epidemics?

Suppose you were asked to advise political leaders on the basis of what a model predicts. Would you? What would you want them to know about the process of modelling?