High-degree numerical differentiation

High-degree numerical differentiation

Performing differentiation in a computer usually relies on numerical techniques. There are several well-known approaches to this, but all tend to suffer from numerical instability when applied many times to compute high-degree derivatives. This isn’t usually a problem, but it is when applied to differentiating the generating functions used to represent degree sequences and other elements of complex networks. We therefore need a better way of performing repeated differentiation.

It’s worth noting that this technique — Cauchy’s formula — has nothing to do with generating functions per se, and can be used in any setting that needs high-degree numerical differentiation of a function. It deserves to be better-known.

Context

Generating functions represent entire probability distributions as a single function, which can then be manipulated in various ways to perform actions on the underlying probabilities. A probability generating function is a formal power series representing a discrete probability distribution, where each term of the series specifies the probability of the random variable taking a given value. The most common example is the node degree distribution generated by:

$$ G_0(x) = \sum_k p_k \, x^k $$

where $p_k$ is the probability that a node chosen at random has degree $k$. The $x^k$ term is simply a formal “binder” that associates the coefficient with the value to which it refers.

There are two main ways to create such a generating function. In the first case (the discrete case) we know all the $p_k$ and simply chain them together with their “binders”. In the second (continuous) case we have an expression for the series itself, for example the Poisson-distributed degree distribution of an ER network is generated by

$$ G_0(x) = e^{\langle k \rangle (x - 1)} $$

One common operation to perform on this kind of distribution would be to extract the coefficient of a specific degree: what is the probability of choosing a node with degree 4, for example? This is by definition the coefficient of the term in $x^4$. In the discrete case we can simply read-off the appropriate coefficient; in the continuous case we cany differentiate $G_0(x)$ four times, normalise to get rid of the multiplicative factors introduced by differentiation, and then evaluate at $0$:

$$ p_4 = \frac{1}{4!} \bigg( \frac{d}{dx} \bigg)^4 G_0(x) \, \bigg|_{x = 0} $$

This method of differentiation works for any generating function, including those that have no closed-form solution, which suggests that we need to be able to perform the operation numerically.

Numerical differentiation using scipy

There are lots of software solutions for numerical differentiation. In Python, the scipy library includes a function that’s suitable for computing fairly low-degree derivatives.

To explore this approach, we’ll need to compare the calculations against real networks. Let’s define a function the extracts the degree histogram of a network, the number of nodes of each degree.

In [1]:
import math
import cmath
import numpy
import networkx
import scipy.misc
import mpmath

import matplotlib
%matplotlib inline
matplotlib.style.use("./book.mplstyle")
import matplotlib.pyplot as plt
In [2]:
def nk_g(g):
    '''Given a network g, return a function that returns the number of
    nodes of a particular degree.
    
    :param g: a network
    :returns: a function from degree to number of nodes'''
    degrees = dict()
    for (_, k) in g.degree:
        if k in degrees:
            degrees[k] += 1
        else:
            degrees[k] = 1
    
    def nk(k):
        '''Return the number of nodes with degree k.
        
        :param k: the degree
        :returns: the number of nodes of this degree
        '''
        if k in degrees:
            return degrees[k]
        else:
            return 0
        
    return nk

We can use this function to create the more familiar function that gives us the probability $p_k$ that a node chosen at random will have degree $k$.

In [3]:
def pk_g(g):
    '''Given a network, return a function that gives the fraction of
    nodes with a given degree. This is also the probability that a
    node chosen from the network at random will have degree k.
    
    :param g: a network
    :returns: a function from degree to fraction'''
    nk = nk_g(g)
    
    def pk(k):
        '''Return the fraction of nodes of degree k.
        
        :param k: the degree
        :returns: the fraction of nodes with degree k'''
        return nk(k) / g.order()
    
    return pk

This function constructs $p_k$ given an empirical network: it counts the node degrees that actually exist. But we know that the degree distribution of an ER network has a closed-form expression: the probability of encountering a node of degree $k$ is given by $p_k = \frac{1}{\langle k \rangle !} \langle k \rangle^k e^{-\langle k \rangle}$, which we can code directly.

In [4]:
def pk_ER(kmean):
    '''Return a function that computes p_k for a given k on an ER network.
    
    :param kmean: the mean degree of the network
    :returns: a function from degree to probability'''
    
    def pk(k):
        '''Return the probability of choosing a node of degree k.
        
        :param k: the degree
        :returnsd: the probability of picking a node of degree k'''
        return (math.pow(kmean, k) * math.exp(-kmean)) / math.factorial(k)
    
    return pk

We’re making use of higher-order functions to simplify the code later on. To get a function from degree to probability, we create a constructor function that takes all the necessary parameters and then hides them from later code. This means we can create a probability function for an explicit network, an ER network, or any other sort of network, and use it uniformly without having to carry the parameters that were used to create it into the later code. This is a useful tool for code re-use.

We now have two functions for extracting the degree distribution: one that extracts it empirically from a network, and one that constructs the distribution of an ER network theoretically. These two functions should coincide, which we can check for a sample network with $N = 10^4$ nodes and a mean degree $\langle k \rangle = 5$.

In [5]:
# network parameters
N = int(1e4)
kmean = 8

# create a random network with this topology
g = networkx.fast_gnp_random_graph(N, kmean / N)

fig = plt.figure(figsize=(3,3))

# theoretical values
ks = list(range(20))
theoretical = list(map(pk_ER(kmean), ks))

# empirical values
pk = pk_g(g)
empirical = list(map(pk, ks))

plt.plot(ks, theoretical, 'r-', label='Theoretical')
plt.plot(ks, empirical, 'go', label='Empirical')

plt.xlabel('$k$')
plt.ylabel('$p_k$')
plt.legend(loc='upper right')
plt.title(f'Degree distribution ($N = {N}, \\langle k \\rangle = {kmean}$)')

plt.show()
No description has been provided for this image

So far so good. We can also encapsulate the degree distribution as a generating function as defined above.

In [6]:
def G0_ER(kmean):
    '''Return the degree generating function for an ER network with mean degree kmean.
    
    :param kmean: the mean degree
    :returns: the generating function'''
    
    def G0(x):
        return cmath.exp(kmean * (x - 1))
    
    return G0

(You might have noticed that we used the cmath package to get exponentiation that includes complex numbers. The reason for this will become clear later.) The theory says that we can extract the $p_k$ values by repeated differentiation of this function, and we can use scipy to find the derivative of the appropriate order.

In [7]:
def pk_scipy(gf):
    '''Construct a function to extract the probability of finding a node
    with given degree by numerically differentiating the degree generating
    function using `scipy`.
    
    :param gf: the generating function
    :returns: the degree probability function'''
    
    # we need the real part of the generating funcxtion
    def gf_real(x):
        return gf(x).real
    
    def pk(k):
        return scipy.misc.derivative(gf_real, 0.0, n=k,
                                     dx=1e-2,
                                     order=k + 1 + (0 if k % 2 == 0 else 1)) / math.factorial(k)
    
    return pk

If we now plot the theoretical values against the results extracted from the generating function, they should again coincide.

In [8]:
fig = plt.figure(figsize=(3,3))

# theoretical values
ks = list(range(20))
theoretical = list(map(pk_ER(kmean), ks))

# extracted values
pk = pk_scipy(G0_ER(kmean))
extracted = list(map(pk, ks))

plt.plot(ks, theoretical, 'r-', label='Theoretical')
plt.plot(ks, extracted, 'go', label='Extracted')

plt.xlabel('$k$')
plt.ylabel('$p_k$')
plt.ylim([0, 0.15])
plt.legend(loc='upper right')
plt.title(f'Degree distribution ($N = {N}, \\langle k \\rangle = {kmean}$)')

plt.show()
No description has been provided for this image

It was going well up to about $k=12$, but then something went seriously wrong: the extracted values don’t match. We can see this even more clearly if we try the same technique against a powerlaw-with-cutoff network.

In [9]:
def pk_PLC(exponent, cutoff):
    '''Return a function that computes p_k for a powerlaw network
    with the given exponent and cutoff.
    
    :param exponent: the exponent of the distribution
    :param cutoff: the cutoff
    :returns: a function from degree to probability'''
    
    # the normalising constant, which will be a real number
    # despite the possibility of polylog being complex
    C = complex(1 / mpmath.polylog(exponent, math.exp(-1 / cutoff))).real
    
    def pk(k):
        '''Return the probability of choosing a node of degree k >= 1.
        
        :param k: the degree
        :returns: the probability of picking a node of degree k'''
        return C * math.pow(k, -exponent) * math.exp(-k / cutoff)
    
    return pk

def G0_PLC(exponent, cutoff):
    '''Return the generating function for a powerlaw distribution with the
    given exponent and cutoff.
    
    :param exponent: the exponent
    :param cutoff: the cutoff
    :returns: the generating function'''
    
    def G0(x):
        return mpmath.polylog(exponent, x * math.exp(-1 / cutoff)) / mpmath.polylog(exponent, math.exp(-1 / cutoff))
    
    return G0
In [10]:
# PLC network parameters
exponent = 2.0
cutoff = 20

fig = plt.figure(figsize=(3,3))

# theoretical values
ks = list(range(1, 60))
theoretical = list(map(pk_PLC(exponent, cutoff), ks))

# extracted values
pk = pk_scipy(G0_PLC(exponent, cutoff))
extracted = list(map(pk, ks))

plt.plot(ks, theoretical, 'r-', label='Theoretical')
plt.plot(ks, extracted, 'go', label='Extracted')

plt.xlabel('$k$')
plt.ylabel('$p_k$')
plt.yscale('log')
plt.ylim([1e-5, 1])
plt.legend(loc='upper right')
plt.title(f'Degree distribution ($\\tau = {exponent}, \kappa = {cutoff}$)')

plt.show()
No description has been provided for this image

Again, we lose results at about $k = 10$.

What’s happening? The problem is the numerical instability of the approach used for numerical differentiation, which starts to break down for high-order derivatives. Since we might have networks with degree significantly greater than 10, this approach to manipulating generating functions isn’t going to be practical.

Numerical differentiation using Cauchy’s formula

There is, however, another approach that involves significantly more sophisticated mathematics. This is suggested in a throwaway comment by Newman about using Cauchy’s integral formula

$$ f^{(n)}(z) = \frac{n!}{2 \pi i} \oint_C \frac{f(w)}{(w - z)^{n + 1}} \, dw $$

This approach will give us the tools we need, and so may be worth understanding despite the fearsome-looking mathematics.

The general approach is simple to state. To compute the $n$’th derivative of a real-valued function $f$, we make an excursion into the complex plane, choosing a closed contour $C$ and computing a path integral around this path. Under certain conditions to do with the function having no poles within $C$, this approach computes high-order derivatives without succumbing to numerical instability.

We need to turn the formula into something executable. We will do this in stages.

Stage 1: Defining the contour

We first need the contour $C$. By “contour” we simply mean a closed loop in the complex plane that encloses the poiint at which we want to perform the differentiation. The simplest appropriate contour is therefore a circle around the point in the complex plane where we want to evaluate the derivative.

We can describe a circle around the origin as a parameterised curve $w = r e^{i\theta}$ where $r$ is the radius of the circle and $0 < \theta < 2\pi$. The circle around a point $z$ is simply this circle shifted to that point, so $w = r e^{i\theta} + z$.

Step 2: Integrating round the contour

We now need to integrae around this countour. We change the path integral around $C$, $\oint_C$, into an ordinary integral on the range of $\theta$, $\int_0^{2\pi}$, changing the variable of integration accordingly. Essentially this changes the arbitrary contour into a circle that we “walk around” by moving through $2 \pi$ radians. To change variable we perform

$$ \oint_C f(w) \, dw = \int_l^h f(\theta) \frac{dw}{d\theta} d\theta $$

Taking $w$ from above, $\frac{dw}{d\theta} = r i e^{i\theta}$, so

$$ \oint_C \frac{f(w)}{(w - z)^{n + 1}} \, dw = \int_0^{2 \pi} \frac{f(r e^{i \theta} + z)}{(r e^{i \theta} + z - z)^{n + 1}} \, r i e^{i \theta} \, d\theta = \int_0^{2 \pi} \frac{f(r e^{i \theta} + z)}{(r e^{i \theta})^n} \, i \, d\theta $$

and so

$$ f^{(n)}(z) = \frac{n!}{2 \pi} \int_0^{2 \pi} \frac{f(r e^{i \theta} + z)}{(r e^{i \theta})^n} \, d\theta $$

with the complex unit $i$ within the integrand being taken out to cancel the $i$ in the constant factor.

Step 3: Simplifying

We could code-up this integral directly, but we can simplify slightly more by observing that we can make $\theta = 2 \pi x$ and let $0 < x < 1$. Changing variable of integration again, we have $\frac{d\theta}{dx} = 2 \pi$ and so

$$ f^{(n)}(z) = \frac{n!}{2 \pi} \int_0^{2 \pi} \frac{f(r e^{i \theta} + z)}{(r e^{i \theta})^n} \, d\theta = \frac{n!}{2 \pi} \int_0^1 \frac{f(r e^{2 \pi x i} + z)}{(r e^{2 \pi x i})^n} \, 2 \pi \, dx = n! \int_0^1 \frac{f(r e^{2 \pi x i} + z)}{(r e^{2 \pi x i})^n} \, dx $$

That gets rid of the constant $\frac{1}{2 \pi}$ factor.

Step 4: Discretising the integral

The expression we now have is an integral on a continuous range $[0, 1]$. To evaluate this integral numrically we need to discretise it so that we compute the integrand at specific points. To do this we create the Riemann sum of the integral where we evaluate it at $m$ equally-spaced points on the range, which of course correspond to $m$ equally-spaced points around the circular contour in the complex plane.

Let $\delta x = \frac{1}{m}$ (where the numerator comes from knowing we have a unit interval) be the step size along the interval. Then

$$ \int_0^1 f(z) \, dz \approx \sum_{j = 1}^m f(z + j \delta x) \, \delta x $$

where the approximation becomes exact in the limit of $m \rightarrow \infty$. We can re-arrange this expression slightly to make it simpler to evaluate, by observing that

$$ \sum_{j = 1}^m f(z + j \delta x) \, \delta x = \sum_{j = 1}^m f(z + j \delta x) \, \frac{1}{m} = \frac{1}{m} \sum_{j = 1}^m f(z + j \delta x) = mean \bigg[ f(z + j \delta x) \bigg]_{j = 1}^m $$

So if we take a small length of the unit interval, we can simply take the average of the integrand computed at these points to get an approximation of the overall integral. The smaller the length the finer the approximation, but the more calculations we need to do.

Coding in Python

We can now code this expression directly in Python. We first construct the array of points along the unit interval. We then compute the mean of the integrand at these points to get the approximate integral, and multiply the result by $n!$.

In [11]:
def differentiate(f, z, n=1, r=1.0, dx=1e-2):
    '''Compute the n'th derivative of f at z using a Cauchy contour
    integral around a circle of radius r in the complex plane.

    :param f: function of one (complex) variable
    :param z: point at which to find the derivative
    :param n: order of derivative (defaults to 1)
    :param r: radius of contour (defaults to 1.0)
    :param dx: step size (defaults to 1e-2)'''

    # make sure the function vectorises
    if not isinstance(f, numpy.vectorize):
        f = numpy.vectorize(f)

    # compute the array of points at which to compute the integrand
    x = r * numpy.exp(2j * numpy.pi * numpy.arange(0, 1, dx))
    
    return math.factorial(n) * numpy.mean(f(z + x) / x**n)

(We use numpys vectorised functions to let us apply a function pointwise to an array, and numpys exponentiation function to handle complex exponentials. We also fix some defaults, such as using a unit circle of radius 1 as the contour and a step size that gives us 100 points to compute.)

Now we can check whether this approach lets us extract the higher-order derivatives we need. The values of $p_k$ extracted in theory for the network, and the values extracted by differentiating the generating function, should coincide.

In [12]:
def pk_cauchy(gf):
    '''Construct a function to extract the probability of finding a node
    with given degree by numerically differentiating the degree generating
    function using Cauchy's formula.
    
    :param gf: the generating function
    :returns: the degree probability function'''
    
    def pk(k):
        return complex(differentiate(gf, 0, k)).real / math.factorial(k)
        
    return pk
In [13]:
# PLC network parameters
exponent = 2.0
cutoff = 20

fig = plt.figure(figsize=(3,3))

# theoretical values
ks = list(range(1, 60))
theoretical = list(map(pk_PLC(exponent, cutoff), ks))

# extracted values
pk = pk_cauchy(G0_PLC(exponent, cutoff))
extracted = list(map(pk, ks))

plt.plot(ks, theoretical, 'r-', label='Theoretical')
plt.plot(ks, extracted, 'go', label='Extracted')

plt.xlabel('$k$')
plt.ylabel('$p_k$')
plt.yscale('log')
plt.ylim([1e-5, 1])
plt.legend(loc='upper right')
plt.title(f'Degree distribution ($\\tau = {exponent}, \kappa = {cutoff}$)')

plt.show()
No description has been provided for this image

Much better! Using the Cauchy formula to compute the derivatives allows us to go up to far higher orders than using more direct methods. The mathematics looks formidable, but can be translated step-by-step into something that can easily be coded directly. The only thing that still requires care is to ensure that the generating function is coded to work with complex numbers so that it can be evaluated around the countour in the complex plane. This can generally be accomplished transparently by using either numpy or cmath functions for exponentiation and so on.

Around the World in Eighty Games: From Tarot to Tic-Tac-Toe, Catan to Chutes and Ladders, a Mathematician Unlocks the Secrets of the World’s Greatest Games

Marcus du Sautoy

A world tour of the world’s board (and other) games. All the usual ones are here, plus others that I guarantee will be new to everyone. The games of Africa remain under-known.

“Unlocking the secrets” is an overblown description of how maths comes into the story. There’s a focus on game theory and decision theory, as one would expect, with a lot of consideration of symmetry (du Sautoy’s own research area) and some autobiographical details too. There’s too little maths in many ways, but perhaps too much for a non-mathematician.

The most interesting application comes from silence: two people are asked the same question three times, but it is answerable only on the third asking (the first two having provided information by the fact that neither could answer previously). Very clever.

Two entirely imaginary games also put in an appearence: Azad from The Player of Games, and the (unnamed) game from The Glass Bead Game. Both are conceived as world-spanning games that model an entire culture’s activities and concerns – but unfortunately aren’t described in enough detail in their source books to admit much analysis. It’s fascinating to think what such an analysis might look like, though.

There is as usual a focus on whether a game has been “solved”, as in whether an optimal strategy is known to exist. Many do, of course, and for others such a Go there are “solutions” using machine learning so that a machine can win – but can’t explain how, which is very unsatisfactory. du Sautoy is clearly exicited by the applications of machine learing to games and other mathematical applications: an excitement I, as a computer scientist, don’t really share.

Having said that, this is an interesting complement to other science-driven games books such as Seven Games: A Human History, with more depth to the undelying maths.

4/5. Finished Sunday 24 December, 2023.

(Originally published on Goodreads.)

Programmatically editing a file from Emacs Lisp

Programmatically editing a file from Emacs Lisp

This is something I frequently want to do: open a named file, work on it programmatically using Lisp code, and save it back – all without user intervention. Like a lot of things in Emacs, it’s easy once you know how.

The trick is to create a new, named, buffer for the file to get its contents. This is done with find-file-noselect as opposed to the more usual find-file that’s usually bound to C-x C-f, and as its name suggests finds (opens) the file without bringing it to the user’s attention. For example,

    ;; open the file in its own buffer
    (with-current-buffer (find-file-noselect fn)

      ;; work on it as required, as the current buffer
      (goto-char (point-min))
      (search-forward "#+END_COMMENT" nil t)
      (beginning-of-line 2)
      (delete-region (point) (point-max))
      (newline 2)

      ;; save the results back
      (save-buffer))

(This example comes from my Emacs interface to the Nikola static site builder used to maintain this site.) The code fragment leaves the current buffer unchanged as far as the user (and the rest of the code) is concerned, and so doesn’t need to be protected by save-excursion or the like.

Estates: An Intimate History

Lynsey Hanley (2007)

A social history of segregated housing in Britain, how thw dream of a more equal approach died, and the consequences for those who lived through it.

Mass social (or council) housing was intended, in the immediate aftermath of the Second World War, to act as a social glue by mixing all sorts of people together so as to break down the walls of class that blighted British society. It failed: Aneurin Bevin’s vision was diluted, as were the building standards, resulting in housing that was both inferior to private stock and immediately identifiable as such, making it stigmatic to live there. The result has been to enhance the blight, erecting what Hanley refers to as “a wall in the head” of people whose self-image is poisoned by poor housing, poor neighbours, and poor (or indeed non-existent) services.

The wall in the head exists in many forms, and one doesn’t have to have grown up on a council estate to have one. But however tall it is, the wall is easy to build when young and hard to climb when older, and it’ll affect you all your life in some way or other. Housing and class have a lot to do with forming the wall.

It’s impossible to miss the connection with Between the World and Me, not least because of the catchy and often-repeated phrasing. Ta-Nehisi Coates was more concerned with physical violence rather than social class, but the parallels are there: different people from different circumstances have completely different perceptions of the country in which they live and what it’s supposed to do for them. I see this all the time with different student cohorts. It’s to some extent a matter of expectations of others, but also of the ability to see opportunities beyond the present. An inward-looking social environment absolutely stunts people’s vision of what’s possible for them.

The book ends on quite q positive note: the willingness of housing providers to renovate (rather than simply dispose of) social housing stock, and the availability of cash to do so. Whether this is sufficient is an open question, and certainly we still see developers skirting-round the requirement to provide “affordable” housing, in some cases literally walling-off the social from the private units, just as happened with the first estates built in the 1940s.

4/5. Finished Tuesday 19 December, 2023.

(Originally published on Goodreads.)

Technofeudalism: What Killed Capitalism

Yanis Varoufakis (2023)

A penetrating analysis of where we are and why.

Varoufakis has form with analysing current political affirs, of course: he wrote about his short time as Greek finance minister during the 2008 financial crisis in Adults in the Room: My Battle with Europe’s Deep Establishment. In this book he tackles a bigger subject: how has the economic system we are now experiencing diverged from the capitalism we’ve known for over a century? Varoufakis’ take is that the torrent of government money that followed the crisis, and increased during the pandemic, was used by the owners of internet-based cloud providers and major platforms – the “cloudalists” – to build-up their capital stock risk- and cost-free, and used this to grab (and then sell) the attention of everyone. Moreover it allowed them to charge rents to everyone else looking to sell on the internet, since network effects made their platforms unavoidable. This return to rentier economics is what Varoufakis terms technofeudalism, and claims this is a better term that the other terms such as late-stage capitalism or hypercapitalism that fail to capture the return to rents as the major source of corporate income – to the point that the cloudalists don’t need to be selling profitable commodities any more.

Varoufakis has a keen eye for anecdotes and corners of the economy in which to test his hypotheses, such as how General Motors turned into a hedge fund with a sideline in cars, or the irony of capitalism being defeated by … more efficient capitalism. But the major point is that technofeudalism isn’t capitalism, and emerged largely because of government largesse leading to unintended consequences followed by significant political capture to keep the new status quo in place and without meaningful regulation.

This is a book about economics, but there’s a computer science perspective that’s perhaps worth taking too. Cloud systems may be inefficient in economic terms, because they extract rents from all users and service providers. But technically they’re almost always more efficient than owning and running your own computers to provide services, as well as reducing costs and risks for start-ups and being a lot more flexible and (potentially) energy-efficient. That’s not to say that there’s no evidence of rentier behaviour: the cloud providers segment the market to charge premium rates for not-very-different services, which in turn changes technical decisions in ways that might not benefit service providers, and they’re notorious for making their charging structures as opaque as possible.

It’s the network effects in terms of access to users’ attention where the rents really become too significant to ignore and almost entirely negative in their effects. We’re also seeing the platforms seek regulation to stifle competition, notably in AI, which will re-inforce their ownership of the most important means of computation.

Another thing to consider is that the cloudalists’ rents make offerings progressively worsen. Cory Doctorow terms this process enshittification, where platform-based services inevitably try to suck-in all the surplus value and squeeze-out other stakeholders – turning them into what Varoufakis terms “cloud serfs” living on whatever pittance of value the platforms don’t deign to hoover-up.

So far so negative. But Varoufakis doesn’t content himself with merely describing the emergence of the current situation. As a former Communist his class-based analysis (which I have to say strikes a chord with me) leads him to try to define an alternative economy that recognises the realities of the techologies we live with. His prescription seems utopian and hard to realise, although not as hard (as he points out) as when trades unions were first formed, or when right-wing dictatorships were defeated – and he provides possible strategies and rallying cries for us to fight back against the return of feudalism in a more pernicious form than anyone ever expected.

5/5. Finished Saturday 9 December, 2023.

(Originally published on Goodreads.)