Home » cncp » Synchronous simulation

Synchronous simulation

The simplest way to simulate an epidemic from a computing perspective is to adopt a discrete time model. We tie this together with our compartmented models from earlier to create a simulator for SIR.

(This is a chapter from Complex networks, complex processes.)

Synchronous simulation

We now have all the bits we need to build an epidemic simulation. In this chapter we'll develop a simulation of SIR in discrete time – that is to say, modelling the world at the granularity of some interval of time in which lots of independent events can occur. We'll show how this simulation is put together using the epydemic library, making use of the compartmented model we coded earlier. And we'll discuss some of the advantages of this approach – but also its limitations, whcih lead us into continuous-time suiimulation of the same model

Basic discrete-time simulation

Recall from our earlier discussion that discrete-event simulators have to make three key decisions:

  1. when (in simulation time) does the next event occur?,
  2. where in the network does it occur?, and
  3. which event is it that occurs?

A discrete-time simulation performs these decisions in a simulation loop that looks roughly as follows. At each timestep, the simulation collects all the places in which an event might occur (the "where" question). It then, for each of these places, decides whether the event occurs or not ("when") and, if it decides that it does, executes the event ("which"). It then moves to the next moment and repeats. Executing an event will typically change the places where future events can occur.

A discrete-time simulation is sometimes referred to as a synchronous simulation, because all the events in a given moment are performed in a batch.

Epidemic simulation in discrete time

Let's now build the code we need to create a synchronous simulation of an epidemic. We'll be making use of the epydemic library, and specifically its descriptions of compartmented disease models. Before we do that, however, we need to construct a general simulation framework that we can then specialise to perform the functions we need.

epydemic represents synchronous simulation using a small class hierarchy, and in this chapter we'll fill-out the part outlined in red in the following UML diagram:

(Actually what we'll describe is a slightly simpler version of epydemic for ease of explanation. But it captures all the main points, and we'll come back to the code when we need the more advanced features.)

The decomposition of the three classes is as follows. epydemic.Dynamics defines the basic functionality of a discrete-event simulation, mainly concerning the way we get events to execute. epydemic.SynchronousDynamics specialises this framework to run in synchronous time, collecting together all the events for a given timestep, but without specifying exactly where the events come from. epydemic.CompartmentedSynchronousDynamics then binds the source of events to a compartmented model. (We describe why we do it this way below.)

In [1]:
import networkx
import epydemic
import epyc
import math
import numpy
import pickle
from copy import copy

import pandas as pd

import matplotlib
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import matplotlib.pyplot as plt
import seaborn
/Users/sd/programming/cncp/cncp/lib/python2.7/site-packages/matplotlib/__init__.py:878: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))

Discrete-event simulation dynamics

Let's begin with the basic discrete-event dynamics:

In [2]:
class Dynamics(epyc.Experiment, object):
    '''A dynamical process over a network. This is the abstract base class
    for implementing different kinds of dynamics as computational experiments
    suitable for running under. Sub-classes provide synchronous and stochastic
    (Gillespie) simulation dynamics.'''

    # Additional metadata elements
    TIME = 'simulation_time'      #: Metadata element holding the logical simulation end-time.
    EVENTS = 'simulation_events'  #: Metadata element holding the number of events that happened.

    # the default maximum simulation time
    DEFAULT_MAX_TIME = 20000      #: Default maximum simulation time.
    def __init__( self, g = None ):
        '''Create a dynamics, optionally initialised to run on the given network.
        The network (if provided) is treated as a prototype that is copied before
        each individual simulation experiment.
        :param g: prototype network (optional)'''
        super(Dynamics, self).__init__()
        self._graphPrototype = g                 # prototype copied for each run
        self._graph = None                       # working copy of prototype
        self._maxTime = self.DEFAULT_MAX_TIME    # time allowed until equilibrium

    def network( self ):
        '''Return the network this dynamics is running over.

        :returns: the network'''
        return self._graph

    def setNetworkPrototype( self, g ):
        '''Set the network the dynamics will run over. This will be
        copied for each run of an individual experiment.

        :param g: the network'''
        self._graphPrototype = g

    def setMaximumTime( self, t ):
        '''Set the maximum default simulation time. The default is given
        by :attr:`DEFAULT_MAX_TIME`.

        param: t: the maximum time'''
        self._maxTime = t
    def at_equilibrium( self, t ):
        '''Test whether the model is an equilibrium. Override this method to provide
        alternative and/or faster simulations.
        :param t: the current simulation timestep
        :returns: True if we're done'''
        return (t >= self._maxTime)

    def setUp( self, params ): 
        '''Before each experiment, create a working copy of the prototype network.

        :param params: parameters of the experiment'''

        # perform the default setup
        super(Dynamics, self).setUp(params)

        # make a copy of the network prototype
        self._graph = self._graphPrototype.copy()

    def tearDown( self ):
        '''At the end of each experiment, throw away the copy.'''

        # perform the default tear-down
        super(Dynamics, self).tearDown()

        # throw away the worked-on model
        self._graph = None

    def eventDistribution( self, t ):
        '''Return the event distribution, a sequence of (l, p, f) triples
        where l is the :term:`locus` of the event, p is the probability of an
        event occurring, and f is the :term:`event function` called to make it
        happen. This method must be overridden in sub-classes.
        It is perfectly fine for an event to have a zero probability.

        :param t: current time
        :returns: the event distribution'''
        raise NotYetImplemented('eventDistribution()')               

We make the dynamics class a sub-class of epyc.Experiment. We haven't discussed epyc yet – and there's no need to right now – but it provides functions for running lots of repetitions of simulations with a single command. We'll make extensive use of this later when we scale-up simulations.

An epidemic simulation takes place over a network. We can provide a network either to the constructor or by calling setNetworkPrototpye(). This network is referred to as the prototype network. Every time we run the simulation, the prototype is copied into a working netyork that we then run the epidemic process over. This means we can repeatedly use the same network for different instances of the same process. The setUp() and tearDown() methods create and destroy the working copy.

We need to know when we should stop the simulation, and the most general answer to this is to have a maximum simulation time: that way we know we'll stop at some point. setMaximumTime() can be used to change this from the default value of 20000 timesteps; at_equilibrium() returns true if we have exceeded that time. Clearly we will often be able to do a better job of decoding whether a simulation has ended, in which case we should override this method.

Finally, we need a source of events. We get these in terms of a probability distribution that consists of a list of triples consisting of a list of places for an event to occur in the network, the probability of that event happening at any given element place, and the event function that we call when the event occurs. The eventDistribution() method returns the distribution for the given time, and for the moment is left undefined.

We should note what else this class doesn't provide: any way of actually selecting and executing events drawn from the distribution. For that we need to define a specific dynamics.

Synchronous dynamics

Any simulation dynamics has to answer the three questions we posed earlier: when does an event happen?, where in the network?, and which action is taken? Synchronous dynamics has simple answers to these questions. At each discrete timestep (when) it looks for all the places in the network where an event could occur (where), and choose whether or not an event occurs at each place according to the probabilities given to the events by the probability distribution (which).

Providing this dynamics is simply a matter of turning this into code:

In [3]:
class SynchronousDynamics(Dynamics):
    '''A dynamics that runs synchronously in discrete time, applying local
    rules to each node in the network. These are simple to understand and
    simple to code for many cases, but can be statistically inexact and slow
    for large systems.'''

    # additional metadata
    TIMESTEPS_WITH_EVENTS = 'timesteps_with_events'  #: Metadata element holding the number timesteps that actually had events occur within them

    def __init__( self, g = None ):
        '''Create a dynamics, optionally initialised to run on the given prototype
        :param g: prototype network to run over (optional)'''
        super(SynchronousDynamics, self).__init__(g)

    def do( self, params ):
        '''Synchronous dynamics.
        :param params: the parameters of the simulation
        :returns: a dict of experimental results'''
        # run the dynamics
        g = self.network()
        t = 0
        events = 0
        timestepEvents = 0
        while not self.at_equilibrium(t):            
            # retrieve all the events, their loci, probabilities, and event functions
            dist = self.eventDistribution(t)

            # run through all the events in the distribution
            nev = 0
            for (l, p, ef) in dist:
                if p > 0.0:
                    # run through every possible element on which this event may occur
                    for e in copy(l.elements()):
                        # test for occurrance of the event on this element
                        if numpy.random.random() <= p:
                            # yes, perform the event
                            ef(self, t, g, e)
                            # update the event count
                            nev = nev + 1

            # add the events to the count
            events = events + nev
            if nev > 0:
                # we had events happen in this timestep
                timestepEvents = timestepEvents + 1

            # advance to the next timestep
            t = t + 1

        # add some more metadata
        (self.metadata())[self.TIME] = t
        (self.metadata())[self.EVENTS] = events
        (self.metadata())[self.TIMESTEPS_WITH_EVENTS] = timestepEvents

        # report results
        rc = self.experimentalResults()
        return rc

That's it! – one method called do() that codes-up the simulation loop. While the simulation is not at equilibrium (as defined by the at_equilibrium() method inherited from Dynamics) we retrieve the event distribution. For each entry we run-through all the possible places for an event and select randomly whether the event actually happens. We do this by using the numpy.random.random() function, which returns a random number uniformyl distributed over the range $[0, 1]$. If this random number is less than the probability associated with the event, then we "fire" the event by calling the associated event function, passing it the dynamics, the current simulation time, the network over which the process is running, and the place where the event occurs (a node or an edge in the network). We keep track of the number of events we fire, and also keep track of the number of timesteps in which events are fired, which we'll use later when we think about the efficiency of this kind of simulation.

At the end of do() we package-up a short summary of the experiment as metadata: data about the way the simulation occured. We store this in a dict that we inherit from epyc.Experiment, accessed by the metadata() method. Finally we return our experimentalResults(), which is another method inherited from epyc.Experiment that we'll come back to in a moment.

We're still missing some details, though: SynchronousDynamics doesn't give us an event distribution, and doesn't give us any events.

Compartmented synchronous dynamics

The final step, then, is to tie the synchronous dynamics to a compartmented model. We simply delegate the methods we need implementations for off to the implementation we looked at earlier:

In [4]:
class CompartmentedSynchronousDynamics(SynchronousDynamics):
    '''A :term:`synchronous dynamics` running a compartmented model. The
    behaviour of the simulation is completely described within the model
    rather than here.'''
    def __init__( self, m, g = None ):
        '''Create a dynamics over the given disease model, optionally
        initialised to run on the given prototype network.
        :param m: the model
        :param g: prototype network to run over (optional)'''
        super(CompartmentedSynchronousDynamics, self).__init__(g)
        self._model = m

    def setUp( self, params ):
        '''Set up the experiment for a run. This performs the default action
        of copying the prototype network and then builds the model and
        uses it to initialise the nodes into the various compartments
        according to the parameters.

        :params params: the experimental parameters'''
        # perform the default setup
        super(CompartmentedSynchronousDynamics, self).setUp(params)

        # build the model

        # initialise the network from the model
        g = self.network()
        self._model.setUp(self, g, params)

    def eventDistribution( self, t ):
        '''Return the model's event distribution.

        :param t: current time
        :returns: the event distribution'''
        return self._model.eventDistribution(t)

    def experimentalResults( self ):
        '''Report the model's experimental results.

        :returns: the results as seen by the model'''
        return self._model.results(self.network())

Again, a very simple class. The constructor takes a compartmented model as a parameter. The setUp() method does the standard behaviour of building a copy of the network prototype, and then resets and builds the model and passes the working network to the model's setUp() method. eventDistribution() returns what the model says is the event distribution, which will also include implementations of the events. Finally, experimentalResults() returns a dict of the model's definition of what constitutes the important features of running that particular model.

But why is the code decomposed this way?

That's quite a lot of code, so let's pause and assess what we've built.

First of all we defined the basic structures of an epidemic process on a network: basically the ability to generate a working copy of a network several times, some definition of termination, and an abstract method for getting the event distribution. We then specialised this to provide continuous-time simulation dynamics which takk the distribution and applied it to all possible places where events could occur according totheir probabilities. Rather than then specifying the event distributions and events by sub-classing, we instead bound the missing elements to an object defining a compartmented model of disease, allowing that to provide the details.

Why this way? – why not just sub-class SynchronousDynamics to provide, for example, the events of SIR and their distribution? The answer is that SIR is a process that can run on several different simulation regimes as well as this one, notably the stochastic dynamics we'll look at later. If we defined SIR by sub-classing SynchronousDynamics, we'd then need to re-define it if we introduced another simulation dynamics: two definitions of the same process, which is an invitation to mistakes.

It's far better to define a single process in a single class and then re-use it, wnd this is what we've done in defining the CompartmentedModel class and sub-classing it to define SIR. This makes the simulation framework easier to use, but trickier to implement: the astute reader will have noticed that we didn't explain how CompartmentedModel works inside, and that's because it's a bit complicated. But it's also largely irrelevant in practice: you don't need to know how this particular piece of code works in order to use it for network science experiments. (If you're interested, you can look at the code in epydemic's github repo. But don't say you weren't warned.)

The message here is not that some simulation code is complicated, but rather that it's possible to localise that complexity where it can't do any harm. This keeps the user interface simpler and also means that we can now concentrate on the epidemics, not the code we use to simulate them.

Simulating SIR

Finally, at long last, let's run some code.

We have a compartmented model of SIR, and a synchronous discrete-time simulation framework, so let's run the former in the latter. We first need to define the parameters of our simulation, and for this experiment we'll use a small-ish ER network and some fairly nondescript SIR parameters:

In [5]:
# ER network parameters
N = 5000
kmean = 5
pEdge = (kmean + 0.0) / N

# SIR parameters
pInfected = 0.01
pInfect = 0.2
pRemove = 0.1

We can then create the network and the model, and bind them together with the simulation dynamics:

In [6]:
g = networkx.erdos_renyi_graph(N, pEdge)
m = epydemic.SIR()
sim = CompartmentedSynchronousDynamics(m, g)

To run the simulation we need to pass the parameters as a dict keyed by the names we used for them in the compartmented model description of SIR. We then set these as the parameters and run the simulation:

In [7]:
# create a parameters dict containing the disease parameters we want
params = dict()
params[epydemic.SIR.P_INFECTED] = pInfected
params[epydemic.SIR.P_INFECT] = pInfect
params[epydemic.SIR.P_REMOVE] = pRemove

# run the simulation
sync = sim.set(params).run()

# save the results for later
with open('sync.pickle', 'wb') as handle:
    pickle.dump(sync, handle)

The simulation returns a Python dict which as refer to as a results dict. It's structured in a very particular way, with three top-level keys:

In [8]:
['results', 'parameters', 'metadata']

The results key contains a dict of the experimental results that the simulation returned: it's "real" results, if you like:

In [9]:
{'compartments': {'I': 0, 'R': 4773, 'S': 227}, 'loci': {'I': 0, 'SI': 0}}

In this case the results are a dict of compartments and their sizes, and a dict of loci and their sizes. We can see that in this case there are no infected nodes left, and therefore no SI edges – and therefore no way the simulation can infect any more nodes.

The parameters key contains a dict of the parameters we passed to the simulation:

In [10]:
{'pInfect': 0.2, 'pInfected': 0.01, 'pRemove': 0.1}

So we have the experimental results and the simulation parameters that gave rise to the immediately to hand. Note that this isn't quite all the information we might need, as it doesn't include the size or link probability of the underlying network prototype we passed to the simulation.

Finally, the metadata key contains a dist of useful information about how the simulation progressed:

In [11]:
{'elapsed_time': 0.46164,
 'end_time': datetime.datetime(2017, 8, 16, 18, 29, 44, 509614),
 'experiment_time': 0.221504,
 'setup_time': 0.236137,
 'simulation_events': 9961,
 'simulation_time': 20000,
 'start_time': datetime.datetime(2017, 8, 16, 18, 29, 44, 47974),
 'status': True,
 'teardown_time': 0.003999,
 'timesteps_with_events': 74}

These values might be important in assessing how the simulation worked. For the time being, let's just draw attention to the difference between two values: the overall simulation time (20000 timesteps, the default), and the number of timesteps iun whichevents actually occurred. The former is way larger than the latter, suggesting that the simulation did an awful lot of ... well, nothing.

We can easily check whether we had an epidemic by checking the size of the largest outbreak, which in the case of an epidemic should scale linearly with N, the size of the network:

In [12]:
print "Epidemic covered {percent:.2f}% of the network".format(percent = ((sync['results']['compartments']['R'] + 0.0)/ N) * 100)
Epidemic covered 95.46% of the network

Issues with the synchronous approach

The synchronous approach to simulation is easy for a programmer to understand, which of course is an enormous advantage. But it does have some important disadvantages too. There are two issues in particular, one programmatic and one mathematical, that can be better addressed by using a slightly different approach.

Performance of synchronous simulation

The synchronous synamics we encoded above works by evaluating the process dynamics at each discrete timestep. This is an obvious approach, but one that begs two questions: how expensive is it to evaluate the dynamics at each step?; and, what proportion of timesteps do we evaluate the dynamics with no effect, because nothing changes?

To answer the first question we can look at the do() method on SynchronousDynamics. At each timestep it retrieves all the places where an event might occur, which we know from our definition of SIR is any SI edge (for infection events) and any infected node (for removal events). For each place, it draws a random number and then possibly calls an event function. The amount of work therefore depends on the sizes of the two loci for events, which will presumably swell as the epidemic progresses: we might assume that in an average timestep about half the nodes are infected, and some smaller proportion of the edges are SI: we can't say much more without a lot more information about the structure of the network. The loci change as events occur, which means that CompartmentedModel will have to ensure that it can efficiently track these changes (and indeed a lot of the code complexity addresses exactly this).

We alluded earlier to the answer to the second question. The result dict includes metadata that defines the number of timesteps and the number in which at least one event actually occurred. We can use these to determine the percentage of timesteps in which anything actually happened – and therefore calculate the "wasted" timesteps:

In [13]:
print "Of {n} cycles simulated, {no} ({percent:.2f}%) had no events ".format(n = sync['metadata']['simulation_time'],
                                                                             no = sync['metadata']['simulation_time'] - sync['metadata']['timesteps_with_events'],
                                                                             percent = (sync['metadata']['simulation_time'] - sync['metadata']['timesteps_with_events']) / (0.0 + sync['metadata']['simulation_time']) * 100)
Of 20000 cycles simulated, 19926 (99.63%) had no events 

Even though "nothing happened" in those timesteps, we still performed computation to determine it, since we probabilistically tried to pass infection from each infected node to each of its susceptible neighbours. For certain parameter combinations – large, dense networks with long-running but not very transmissible infection, for example – this can result in a lot of extraneous computation.

Statistical exactness

A slightly more significant problem is one of statistical exactness: the extent to which the simulation actually performs according to the probabilities. We won;t dig into this in too much detail, but the basic problem is simple to explain. In the do() method, for each possible event, we collect the possible places the event can happen and then decide whether the event actually happens there. There's a hidden assumption here that all these choices are independent of one another, but that's not quite the case. For example, if two infected nodes are connected to the same susceptible node – so there are two SI edges in the locus for infection events – then we have two chances to infect the susceptible node in the same timestep. If the first time happens to result in infection then the second one can't (by definition), making the actual rate of infections in a timestep varies just slightly from the expected value. Similarly, we may happen to run the removal events before the infection events, and so nodes infected in the timestep don'ty have any possibility of recovering in that same timestep – even if the probability of recovery were set very high.

If these sound like trivial issues, well they may well be. But they may not be, depending on the exact combinations of parameters and network structures we encounter. That's a risk it'd be better not to take, as it introduces patterns into the simulation results that aren't there the model descriptions, or indeed in any mathematical analysis we might make of them.

It might be that we're willing to accept these issues in the interests of simplicity: synchronous simulation is very easy to program and understand. But both the performance and the statistical exactness issues are caused by the same basic decision to use discrete time, and it turns out that we can address both by using a different simulation dynamics, one that works directly from the probability distributions in continuous time.

Leave a comment