Computer science and the financial crisis

Many people expected a financial crisis; many also expected it to be caused by automated trading strategies driving share prices, As it turned out, that aspect of computing in finance didn't behave as badly as expected, and the problems arose in the relatively computing-free mortgage sector. Is there any way computing could have helped, or could help avoid future crises?

Over brunch earlier this week my wife Linda was worrying about the credit crunch. It's especially pressing since we're in Ireland at the moment, and the whole country is convulsed with the government's setting-up of the National Asset Management Agency (NAMA) to take control of a huge tranche of bad debts from the Irish banks. She asked me, "what can you do, as a computer scientist, to fix this?" Which kind of puts me on the spot thinking about a topic about which I know very little, but it got me thinking that there may be areas where programming languages and data-intensive computing might be relevant for the future, at least. So here goes....

The whole financial crisis has been horrendously complex, of course, so let's start by focusing on the area I know best: the Irish "toxic tiger" crash. This has essentially been caused by banks making loans to developers to finance housing and commercial property construction, both of which were encouraged through the tax system. Notably the crash was not caused by loans to sub-prime consumers and subsequent securitisation, and so is substantively different to the problems in the US (although triggered by them through tightening credit markets). The loans were collateralised on the present (or sometimes future, developed) value of the land -- and sometimes on "licenses to build" on land rather than the land itself -- often cross-collateralised with several institutions and developments, and financed by borrowings from the capital markets rather than from deposits (of which Ireland, as a country of 4M people, has rather little). As capital availability tightened across 2008-9 the property market stalled, the value of the land dropped (by 80% in some cases, and by 100% for the licenses to build), and the banks have been left with bad loans and capital shortcomings in the range of at least EUR60B which now the government, for reasons strongly suspected to be political and ideological rather than being necessarily in the best interests of the taxpayer, are taking onto the public balance sheet rather than allowing them to be carried by the banks' owners and bondholders. The crash has also, as might be expected, revealed enormous shortcomings in banks' risk management, their understanding of their own holdings and exposures, some unbelievably lax supervision by the authorities, and a lot of suspected corruption and corporate fraud.

(The above is a gross simplification, of course. The Irish Economy blog is the best source of details, being run by a large fraction of Ireland's leading academic economists who have been depressingly accurate as to how the crisis would unfold.)

So what can computer science say about this mess? To start with, there are several key points we can pull out of the above scenario:

  1. Modelling. Its hard to know what the effect of a given stressor would be on the system: what exactly happens if there's sector-wide unemployment, or a localised fall in purchasing?. Most banks seem to have conducted modelling only at the grossed statistical level.
  2. Instrument complexity. Different financial instruments respond to stimuli in different ways. A classic example is where unpaid interest payments are rolled-up into the principal of a mortgage, changing its behaviour completely. Parcelling-up these instruments makes their analysis next to impossible.
  3. Traceability. The same assets appear in different places without being detected up-front, which makes all the instruments involved significantly more risky and less valuable.
  4. Public trust. The "stress tests" conducted by regulators were conducted in secret without independent scrutiny, as have been the valuations applied to the bad loans. The public is therefore being asked to sign-off on a debt of which it knows nothing.
Clearly this is very complicated, and statistical modelling is only going to provide us with an overview. The simplifications needed to get the mathematics to work will be heroic, despite the power of the underlying analytic techniques.

So let's treat the problem as one of simulation rather than analysis.A mortgage is generally treated as data with a particular principal, interest rate, default rate and so on. It can however also be viewed as process, a computational object: at each accounting period (month) it takes in money (mortage payment) and consequently has a value going forward. There is a risk that the payment won't come in, which changes the risk profile and risk-weighted value of the mortgage. It will respond in a particular way, perhaps by re-scheduling payments (increasing the term of the mortgage), or trying to turn itself into a more liquid object like cash (foreclosing on the loan and selling the house), Foreclosure involves interacting with other objects that model the potential behaviour of buyers, to indicate how much cash the foreclosure brings in: in a downturn, the liklihood of payment default increases and the cash value of mortgages forclosed-upon similarly reduces.

The point is that there's relatively little human, banking intervention involved here: it's mostly computational. One could envisage a programming language for expressing the behaviours of mortgages, which defines their responses to different stimuli and defines an expected present value for the mortgage discounted by the risks of default, the amount recoverable by foreclosure, and so on.

So the first thing computer science can tell us about the financial crash is that the underlying instruments are essentially computational, and can be represented as code. This provides a reference model for the behaviour we should expect from a particular instrument exposed to particular stimuli, expressed clearly as code.

We can go a stage further. If loans are securitised -- packaged-up into another instrument whose value is derived from that of the underlying assets, like a futures contract -- then the value of the derivative can be computed from the values of the assets underlying it. Derivatives are often horrifically complicated, and their might be significant advantages to be had in expressing their behaviours as code.

How do we get the various risk factors? Typically this is done at a gross level across an entire population, but it need not be. We now live in an exabyte era. We can treat the details of the underlying asset as metadata on the code: the location of a house, its size and price history, the owners' jobs and so forth. We currently have this data held privately, and as statistical aggregates, but there's no reason why we can't have associate the actual details to each loan or instrument, and therefore to each derivative constructed from them. This after all is what linked data is all about. means that each financial instrument is inherently computational, and carries with it all the associated metadata. This little package is the loan, to all intents and purposes.

So the second thing computer science can tell us is that we can link instruments, assets and data together, and track between them, using the standard tools and standards of the semantic web. This means we can query them at a high semantic level, and us these queries to extract partitions of the data we're interested in examining further. There's no scientific reason that this can't be done across an entire market, not simply within a single institution.

The net result of the above is that, given this coding and linking, the financial system can be run in simulation. We can conduct stress tests at a far finer resolution by for example running semantic queries to extract a population of interest, making them all redundant (in simulation), and seeing what happens not only to their loans, but to the securitised products based on them -- since everythings just a computation. Multiple simulations can be run to explore different future scenarios, based on different risk weightings an so forth.

(This sounds like a lot of data, so let's treat the UK housing market as a Fermi problem and see if it's feasible. There are 60M people in the UK. Assume 2-person families on average, yielding 30M domestic houses to be mortgaged. Some fraction of these are mortgage-free, say one third, leaving 20M mortgages. The workforce is around 40M working in firms with an average of say 20 employees, each needing premises, yielding a further 2M commercial mortages. If each mortgage needs 10Kb of data to describe it, we have 22M objects requiring about 250Tb of data: a large but not excessive data set, especially when most objects execute the same operations and share a lot of common data: certainly well within the simulation capabilities of cloud computing. So we're not in computationally infeasible territory here.)

We can actually go a few stages further. Since we have a track-back from instrument to metadata, we can learn the risks over time by observing what happens to specific cohorts of borrowers exposed to different stimuli and stresses. Again, linked data lets us identify the patterns of behaviour happening in other data sets, such as regional unemployment (now directly available online in the UK and elsewhere). The more data goes online the easier it is to spot trends, and the easier and finer one can learn the behaviors the system is exhibiting. As well as running the simulation forwards to try to see what's coming, we can run it backwards to learn and refine parameters that can then be used to improve prediction.

Therefore the third think computer science can tell us is that the financial markets as a whole are potentially a rich source of machine learning and statistical inference, which can be conducted using standard techniques from the semantic web.

Furthermore, we can conduct simulations in the open. If banks have to represent their holdings as code, and have to link to (some of) the metadata associated with a loan, then regulators can run simulations and publish their results. There's a problem of commercial confidentiality, of course, but one can lock-down the fine detail of metadata if required (identifying owners by postcode and without names, for example). If each person, asset and loan has a unique identifier, it's easier to spot cross-collateralisations and other factors that weaken the values of instruments, without needing to be able to look inside the asset entirely. This exposes a bank's holdings in metadata terms -- residential loans in particular areas -- but that's probably no bad thing, given that the lack of information about securitise contributed to the fall.

This is the last thing computer science can tell us. Open-source development suggests that having more eyes on a problem reduces the number, scope and severity of bugs, and allows for re-use and re-purposing far beyond what might be expected a priori. For a market, more eyes means a better regulatory and investor understanding of banks' positions, and so (in principle) a more efficient market.

For all I know, banks already do a lot of this internally, but making it an open process could go a long way to restore confidence in both taxpayers and future investors. There's no time like a crash to try out new ideas.