Layered abstractions and Russian dolls
4/5. Finished 23 May 2012.
(Originally published on Goodreads.)
5 studentships in Computer Science at St Andrews
We have five more PhD scholarships tenable from September, for students interested in any of our areas of research. The details are available here. Please note that I'm not taking on a PhD student this year, so don't bother applying to work with me.
When not to return a library book
Beware of books bearing illnesses. I recently read Hindenburg: the wooden titan, John Wheeler-Bennett's biography of the presidency of the man who ushered Hitler into power. It's an interesting book, written just after Hindenburg's death and so before the Second World War and the realisation of exactly what had been brought forth. You can also find contemporary book reviews online which are likewise fascinating in what they don't know. It really brings home AJP Taylor's comment (in The struggle for mastery in Europe) that the hardest thing about the study of history is to remember that events now long in the past once lay in the future. What was unexpectedly fascinating was the bookplate in the front of the volume, published in 1937, that I borrowed from the university library. (I think it's a first edition.) The top half is fairly standard, but the bottom part describes something you wouldn't expect to find in a library book: "A person shall not return any public library book which he knows to have been exposed to infection from a notifiable disease ... [including] smallpox, cholera, diphtheria, ..." And so on. The list of diseases includes things against which we are now routinely vaccinated (diphtheria, tuberculosis), those we haven't seen in the UK in my lifetime (typhus), and those most people would now brush off with a course of antibiotics (flu, pneumonia). Except, of course, that at the time there was far less vaccination and no antibiotics at all, since their mass production didn't start until the 1940s. So as well as being an observation of a period in history from a point in time from which its significance was only poorly understood, this book is a time capsule from a period when diseases were a much bigger and broader threat than they are now.
The post-modernisation of programming languages
Do we now have some post-modern programming languages? Programming languages change all the time. It's never been the case that one could learn just one and then use it: programming is too rich and complicated for that to be a solution for the majority of programmers. However, we seem to be seeing a change in what we think of as programming languages and how they're used to build larger software systems. At the risk of gross over-simplification (and ridicule), let's divide languages into various "eras". In the beginning, the early languages were built around some explicit notion of the machine on which they were running. Assemblers were devoted to a single machine code. Higher-level languages like Forth abstracted away from the machine code but still exposed the guts of the machine, and in particular its data bus width and memory access mechanisms. The next step in abstraction -- to pre-modernism -- was to provide a machine that was still recognisably a von Neumann system but that hid the details of the underlying machine code and architecture. C and Pascal fall into this category: their operations and abstractions are still largely those of the physical machine, but at a sufficient remove to achieve portability and a usable programming model. But we still have a view of the machine's underlying limitations. Modern languages like Perl, Java and Haskell offer a model of computation, and especially a model of memory, that's considerably removed from the underlying machine and enormously simplifies the programmer's task, at least for "suitable" programs. (Interestingly enough, languages don't fall into eras in the sequence of their design or popularity. Lisp considerably pre-dates C but is definitely modern using the terminology above.) What's perhaps not obvious is that all these languages share a common design philosophy.
They are all based on a small set of primitive notions combined with some ways of combining those notions, to get compound abstractions that resemble their primitive underpinnings. In C we get a small number of base types, some compound structs and unions, and pointers, and most interesting compound structures one builds will contain pointers all over the place. In Java we can build classes that create and combine objects; in Haskell we get lists and higher-order functions as the basic building blocks. The point is that the basic concepts are general, as are the composition operators, and the larger capabilities are in some sense a consequence of these initial choices and characterise the programs it's easy (or hard) to write. This philosophy makes perfect sense, of course, which explains why it's so widespread. Programmers have to get to grips with a small set of basic concepts and composition methods. Everything else flows from this, and the extensions to the language -- libraries and the like -- will be the same as code one could write oneself: they're time-savers, but not fundamentally different from your own code. As a consequence the basic concepts have to be well-chosen to support the range of programs people will want to write. Some will be easier to write than others -- that's what makes the choice of language more than simply a matter of fashion -- but there's an acceptance that pretty much anything is writeable. However, we're now seeing languages emerge that aren't like this. Instead they're very much targeted at specific domains and problem sets. We've always had domain-specific languages (DSLs), of course, but the current crop seem to me to be rather different. They've abandoned the design philosophy of a small, basic set of abstractions and powerful composition, in favour of large basic abstractions and limited composition and extension. This significantly changes development, and the balance of power between the language designer and the programmer. 
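To make the contrast concrete, the traditional philosophy (a few general primitives plus general ways of combining them) can be sketched in a few lines. This is an illustrative sketch in Python rather than in any of the languages discussed above, and the names fold, compose, my_map, total and sum_of_squares are invented for the example.

```python
from functools import reduce

# Two general primitives: folding over a sequence and composing functions.
def fold(step, initial, items):
    """Combine items left to right, starting from initial."""
    return reduce(step, items, initial)

def compose(f, g):
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))

# "Library" functions written with nothing but those primitives.
# A user could have written them just as easily as the library author.
def my_map(f, items):
    return fold(lambda acc, x: acc + [f(x)], [], items)

def total(items):
    return fold(lambda acc, x: acc + x, 0, items)

def square(x):
    return x * x

# A compound abstraction that still looks like its underpinnings:
# it is just another function, built by composition.
sum_of_squares = compose(total, lambda items: my_map(square, items))
```

For instance, sum_of_squares([1, 2, 3]) evaluates to 14. Nothing in my_map or total is privileged: they are time-savers of exactly the kind described above, not features that only the language designer could have supplied.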
XSLT is the language that's brought this most to mind, and especially the transition from XSLT 1.0 to XSLT 2.0. First we have to defend the notion that XSLT is a programming language. I first thought of it as just a transformation system for documents, and so it is: but it's targeted at such a general set of applications that it takes on the flavour of a DSL. It's essentially a functional language with pattern-matching and recursion, whose main data structure is the XML document. It lets one write programs that are driven by these documents: another way to look at it is that it gives executable semantics to a set of XML tags, allowing them to perform computation. XSLT's design doesn't start with a small set of primitives. Instead it provides a collection of tags that can be used to perform the common document operations. If you want to number a list, for example, XSLT contains tags that support the common numbering formats: numbers, letters with brackets, and so forth. What, you wanted something else? Well, you can probably do it, but it'll be indescribably hard because you'll be writing a program in a language for which full computation is an afterthought. Instead, XSLT 2.0 adds new numbering formats that you can select using a helpful set of tags, and this is done within the XSLT processor, not in XSLT itself. Meaningful extension requires the intervention of the language designer. This is a major philosophical change, a post-modern approach to language design. Don't focus on getting the core abstractions right: instead, build a system -- any system -- that demonstrates initial functionality, and then extend it with new features to meet new demands regardless of how they fit. There's no overarching conceptual scheme to the system. Instead it's judged purely on the functions it provides, regardless of how they fit together.
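The numbering example can be sketched by analogy. This is Python, not XSLT, and it makes no attempt to reproduce XSLT's actual tags or format tokens: the function names and the table of formats are hypothetical, chosen only to show the difference between a closed set of built-in formats and a design where the user supplies the labelling function.

```python
# Closed design: the available formats are fixed by the "language designer".
# Format names here are hypothetical; letter labels only work for 1..26.
_BUILT_IN_FORMATS = {
    "1": lambda n: f"{n}.",
    "a": lambda n: f"{chr(ord('a') + n - 1)}.",
    "(a)": lambda n: f"({chr(ord('a') + n - 1)})",
}

def number_closed(items, fmt):
    """Number items using one of the built-in formats, or fail."""
    if fmt not in _BUILT_IN_FORMATS:
        # The user's only recourse is to wait for a new release.
        raise ValueError(f"unsupported format: {fmt!r}")
    label = _BUILT_IN_FORMATS[fmt]
    return [f"{label(i)} {item}" for i, item in enumerate(items, start=1)]

# Open design: one small primitive; any function from int to str will do.
def number_open(items, label):
    return [f"{label(i)} {item}" for i, item in enumerate(items, start=1)]

# A format the closed design lacks, written by the user
# (lower-case roman numerals, adequate for small positive n only).
def roman(n):
    out = ""
    for value, symbol in [(10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]:
        while n >= value:
            out += symbol
            n -= value
    return out + "."
```

With the closed version, number_closed(items, "(a)") is very convenient but asking for roman numerals raises an error; with the open version, number_open(items, roman) works without anyone touching the numbering code. The closed design is quicker to use for the common cases, which is the trade-off discussed in this post.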
(Larry Wall described Perl as post-modern, but I don't think he meant quite what I'm getting at here -- although there are similarities.) Post-modernism works for three reasons. Firstly, no-one is expecting to write very large amounts of code in these systems. They're intended for sub-systems, not for systems themselves, and so the need for generality is limited. Secondly, there's a premium on speed of development rather than on maintenance, which in turn puts a premium on getting some result out quickly rather than on ensuring that, in the future, we can get any result out we need. The sub-systems are intended to change on a short timescale, so maintenance and extensibility are perceived to be less of an issue than agility. Thirdly, a lot of these languages target people who aren't programmers -- or at least don't think of themselves as programmers -- who are focused on something other than the code. Post-modernism isn't wrong, and appears in middleware too. It's also important to realise the benefits: it makes it easier to put together larger systems, and increases the ambition of projects one can undertake compared to having to build so much from scratch. But it may be misguided. The people who spent the 1960s writing all the COBOL code that's still running today never thought it'd still be being maintained -- and the consequences of that have led to code that's now impossible to change. Simple solutions have a tendency to grow into more complex ones -- "just one more feature and we're done" -- which can push the costs of inappropriate choices out along the project lifecycle. More importantly, for researchers post-modernism is a seductive siren call that lets you step away from tackling some difficult choices. Instead of picking a small set of concepts, choose any set and then extend them, in any way, to build up your system. I think this obscures some insights one might otherwise gain from simplifying one's set of concepts.
I think it's also worth remembering that increasing the size and number of abstractions isn't without cost: not for the computer or the compiler, but for the programmer. The more programmers have to remember, and in particular the less well the things to be remembered fit together as a conceptual whole, the harder it is to use the system to its fullest advantage. People instead stay with small sub-sets of the language or -- worse -- settle for programs producing results that aren't those they want, just to stay within their language comfort zone. This sort of simplification is a step backwards, and we need to be careful of the consequences.