Epidemic simulation

Now we have a model, we need a simulator on which to run it. In this chapter we’ll introduce the ideas behind simulating processes over networks, in particular the idea of discrete-event simulation. We’ll then – finally! – be ready to look at the main ways in which network science simulations are coded, and to run an epidemic model over a network.

Some notes about simulation

Simulation is an enormous topic in computer science, with a long and distinguished history. It’s easy to see why it’s so important: whenever we use computers to study natural processes (or indeed man-made or engineered processes) we’re taking a physical system, abstracting it into a computer model, and then building software that runs the model as if it were the real system running in the real world – and then drawing conclusions about how the real system will behave when subjected to the analogous real-world situation.

The process of model abstraction – of which compartmented models of disease are a prime example – is a process of simplification, of leaving out details in order to get to the essentials of the process we’re interested in. This reduction of detail is sometimes criticised by those outside the scientific community: if you leave out the details, how do you know that your model is really saying anything about the real-world phenomenon? And that’s a fair point. But simplification is essential if we’re to understand the core behaviour of processes and not be distracted by all the details.

Model quality

How do we know whether a model says anything meaningful? We need to verify and validate it – two software engineering concepts that are sometimes summarised as “did we build it right?” and “did we build the right thing?”.

By verification we mean examining the model to ensure that, to the best of our ability, the mathematics and code are faithful to the way we think the process operates. This examination might take any number of forms, from inspection of the code and maths by others, through the development of test suites to exercise the code and check it against known situations, to the use of the mathematically-based techniques of computer science formal methods. It’s easy to get code wrong, and incorrect code tells us less than nothing about the phenomena we’re interested in: never skimp on debugging, and never assume things are finally working completely correctly.

By validation we mean deciding whether the model does actually reflect the real world. This might take the form of creating a simple real-world experiment and performing it physically as well as in simulation, to see if the results match. Of course they never will match exactly, because in the process of simplification we’ll have removed some of the details that affect the physical process. In simulation, a pendulum on a friction-free mount will swing forever; in reality, it never will, because the mount will never actually be friction-free.

Discrete-event simulation

Since simulation has so much history, it’s unsurprising that there are myriad approaches to conducting simulations. Each choice has subtly different implications for the experiments and results obtained – often only really understood by those who’ve spent a lifetime with the given techniques.

In network science the simulations we typically use fall under the broad rubric of discrete-event simulation. What this means is that, to a simulator, the world is treated as a sequence of individually identifiable events happening through time. In the case of disease models, the events are individual nodes being infected or recovering: individual, discrete “happenings” described individually and executed independently. Of course one event affects subsequent ones – you can’t recover if you’ve never been infected in the first place – but that’s about the possible sequences of events that can occur, not a relationship between one event and another at the coding level.

You can see the sequencing of events at work in the compartmented model. An infection event happens at SI edges and has a local effect: change the susceptible node’s compartment, which in turn might generate more SI edges at which further infection events can occur. A recovery event happens at infected nodes, meaning it can only happen if an infection event previously happened there (or if the node was initially infected). The sequencing is implicit in the definition of the event loci, and of the events’ effects – even though there’s no explicit encoding within the events themselves of how they’ll be sequenced.
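To make this concrete, here is a minimal sketch of these two events as independent pieces of code acting on per-node state stored in a networkx graph. This is not epydemic’s implementation: the compartment labels and the 'compartment' node attribute are purely illustrative.

```python
import networkx

# illustrative compartment labels and node attribute (not epydemic's)
SUSCEPTIBLE, INFECTED, REMOVED = 'S', 'I', 'R'

def infect(g, e):
    '''Infection event at an SI edge: the susceptible endpoint becomes
    infected, which may create new SI edges for later infection events.'''
    (n, m) = e                                # n is assumed to be the susceptible endpoint
    g.nodes[n]['compartment'] = INFECTED

def recover(g, n):
    '''Recovery event at an infected node: the node is removed and takes
    no further part in the epidemic.'''
    g.nodes[n]['compartment'] = REMOVED
```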

Simulation time

A simulation occurs in simulated time, which is to say the time in the simulated world. This is typically different to real-world or wallclock time, which is how long the simulation takes to run on a computer. These two notions of time differ substantially. It’s easy to see why: most biological and physical processes take an eternity from the perspective of a modern computer. The progression of a disease in an individual might take days, and we seldom want to wait that long for results. Simulation time often therefore passes more quickly than wallclock time.

We might need some way to relate simulation time to “real world” time, for example to see how many days an epidemic will last. In that case we’ll need to develop ways to translate between simulated time and the “real” time of the phenomenon being studied. But often we don’t care about this level of realism, and are happy to work in a more abstract world.

There’s still another thing to consider, which is the issue of temporal resolution. Time, at least at the macro scale, is a continuous quantity, represented as a real number. A continuous time simulation represents time in this way, and also typically assumes that only one event happens at each (simulated) moment. This may sound restrictive, but the idea is that events happen instantaneously, so two events never need happen at exactly the same time: we can always put some infinitesimal gap between them. In SIR, this means people go from being susceptible to being infected instantaneously; if two people are infected, one of them is always infected before the other.

Another way to view time is to think of it as divided up into discrete chunks: seconds, for example. Instead of modelling a continuous stream of events, each occurring at a different instant, we think about blocks of time in which a set of events occur. This is a discrete time simulation.
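The difference is easiest to see in how the simulation clock advances. The sketch below, with a single made-up event type and rate, is purely illustrative: the discrete-time loop steps the clock in fixed chunks and decides per chunk whether the event happens, while the continuous-time loop jumps the clock straight to the next event.

```python
import random

RATE = 0.1    # illustrative number of events per unit time

def discrete_time(t_max):
    '''Step time in whole chunks; in each chunk, decide whether the event occurs.'''
    t = 0
    while t < t_max:
        if random.random() < RATE:
            print(f'event in timestep {t}')
        t += 1

def continuous_time(t_max):
    '''Jump directly from one event to the next, with exponentially-distributed gaps.'''
    t = 0.0
    while t < t_max:
        t += random.expovariate(RATE)   # time until the next event
        print(f'event at time {t:.2f}')
```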

Which of these approaches is “right”? Neither – and that’s anyway the wrong question. They are both approximations of reality that we use to perform computational experiments. There are sometimes reasons to prefer one over the other, but often the choice is a matter of intellectual preference or coding convenience. At the risk of massive stereotyping, people with computer science backgrounds are often (at least initially) more comfortable with a discrete-time view, while people with a classical science background often find it easier to think about continuous time. (One reason for this may be that the mathematics taught in computer science programmes is typically overwhelmingly discrete and tends not to emphasise modelling with differential equations, which is where the continuous ideas come from.) There are good mathematical reasons to prefer continuous-time over discrete-time simulation, but both are available to you.

Stochasticity, or random factors

When working with random networks and stochastic processes there are additional complications due to the use of randomness. It is entirely possible that, just by a chance interaction, a disease on a network will die out. Run the same experiment on the same network with the same parameters – and you might get a disease that doesn’t die out, because the chance interaction didn’t happen this time.

Does this mean that such experiments aren’t repeatable? No! – but it does mean that we need to be careful, perform repetitions, and be sure that we understand the implications of the various random factors that affect the outcome of each experiment. We’ll have a lot more to say on this topic later.
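In practice this means running the same experiment many times and reporting the distribution of outcomes rather than a single run. A hedged sketch of the idea, assuming a hypothetical run_epidemic() function that performs one simulation and returns the final epidemic size:

```python
import random
import statistics

def repeat(run_epidemic, repetitions = 100):
    '''Repeat a stochastic experiment and summarise the spread of outcomes.
    run_epidemic is a hypothetical function performing one simulation and
    returning the final epidemic size.'''
    sizes = []
    for rep in range(repetitions):
        random.seed(rep)              # different, but reproducible, randomness per repetition
        sizes.append(run_epidemic())
    return (statistics.mean(sizes), statistics.stdev(sizes))
```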

What this discussion is getting at is that we need to be careful in going from the models we develop, their realisation in code, and their execution in simulation, to conclusions about the real world. We need to be sure that the conclusions we draw are supported by the simulations we’ve done, and that they match, to an appropriate degree, observations we can make about the real-world process we’re simulating.

The process and coding of simulation

Let’s look in overview at the process of discrete-event simulation, before we get into the coding details.

The basic process of simulation involves repeatedly deciding three things:

  1. when (in simulation time) does the next event occur?

  2. where in the network does it occur? and

  3. which event is it that occurs?

The event is then executed, and the process repeats – forever in principle, and in practice until some termination condition occurs. In network science we often use a termination condition of equilibrium, where the network has in some sense “stabilised” so we can look at its overall state. In SIR this might be when there are no infected nodes left in the network, since no further events are then possible.
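Expressed as code, the core loop might look something like the sketch below. The next_event() and at_equilibrium() methods are hypothetical placeholders for the three decisions and the termination test, not part of any real API.

```python
def simulate(process, g):
    '''Skeleton of a discrete-event simulation: repeatedly decide when the
    next event happens, where it happens, and which event it is; execute
    it; and stop once the process reaches equilibrium.'''
    t = 0.0
    while not process.at_equilibrium(g, t):
        (t, locus, event) = process.next_event(g, t)   # when, where, and which (hypothetical)
        event(g, locus)                                # execute the event's local effect
    return process.results(g)
```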

How are these three decisions made? The details are what differentiate the different methods of simulation. For our purposes, epydemic provides a framework for simulating epidemics on networks, with the decision-making either being coded directly or – more conveniently – being offloaded to a software encoding of a compartmented model. It’s this framework we’ll turn to next.

The structure of epydemic simulations

What is the appropriate software structure for a simulator? There probably is no right answer to this, but epydemic views simulations as being composed of two distinct parts:

  1. Processes that define how the process being simulated behaves in time; and

  2. Dynamics that define how the model is used to produce and execute the stream of events making up the simulated process.

A process defines methods for setting up, running, and tearing down the process for each experiment we run; for defining the rates at which different events occur; for the code that runs at each event; for collecting the results of the experiment; and for accessing and changing the network over which the process runs. There are all sorts of processes we can define, including the compartmented model process that executes compartmented disease models. (We’ll come onto these later.)

A dynamics by contrast determines which event runs next. This is the main division of labour between processes and dynamics: a process defines what events do, while the dynamics decides which events run and in what order. The bridge between these two responsibilities is provided by the statistical functions on the process API that return the rates at which events should occur. The dynamics knows nothing about how these rates are computed: they will typically change as a result of events being fired; similarly, it knows nothing about what the events do.
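A minimal sketch of that division of labour, with hypothetical method names (not epydemic’s actual API): the process exposes only the current rates of its events, and the dynamics chooses among them knowing nothing about what they do.

```python
import random

class Process:
    '''Owns the events: what they do, and the rates at which they occur.'''

    def eventRates(self):
        '''Return a list of (rate, event_function) pairs reflecting the
        current state of the network. (Hypothetical method name.)'''
        raise NotImplementedError

class Dynamics:
    '''Owns the scheduling: picks the next event using only the rates.'''

    def chooseEvent(self, process):
        '''Select an event with probability proportional to its rate.'''
        dist = process.eventRates()
        x = random.random() * sum(r for (r, _) in dist)
        for (r, ev) in dist:
            x -= r
            if x <= 0:
                return ev
        return dist[-1][1]    # guard against floating-point round-off
```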

There are two basic kinds of dynamics. Synchronous dynamics works in discrete time, and at each timestep decides whether each possible event occurs or not. Stochastic or Gillespie dynamics works in continuous time and is both more efficient and more statistically exact, at the cost of being perhaps less intuitive. Let’s look at the synchronous case first, even though it’s less commonly used in practice, and then tackle the stochastic case.
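As a preview of where we’re heading, here is roughly how the pieces fit together in epydemic: the same kind of SIR process can be handed to either dynamics. The parameter values below are arbitrary examples, and details of the API may differ slightly between epydemic versions.

```python
import networkx
import epydemic

# a random network over which to run the epidemic
g = networkx.gnp_random_graph(10000, 0.005)

# experimental parameters for the SIR process (arbitrary example values)
params = dict()
params[epydemic.SIR.P_INFECTED] = 0.01   # initial fraction of infected nodes
params[epydemic.SIR.P_INFECT] = 0.02     # infection probability along an SI edge
params[epydemic.SIR.P_REMOVE] = 0.002    # recovery probability of an infected node

# discrete-time (synchronous) simulation
e = epydemic.SynchronousDynamics(epydemic.SIR(), g)
rc_sync = e.set(params).run()

# continuous-time (stochastic, or Gillespie) simulation of the same process
e = epydemic.StochasticDynamics(epydemic.SIR(), g)
rc_stoch = e.set(params).run()
```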