Evolving programming languages

Most programming languages have fixed definitions and hard boundaries. In thinking about building software for domains we don’t understand very well, a case can be made for a more relaxed, evolutionary approach to language design. I’ve been thinking a lot about languages this week, for various reasons: mainly about the recurring theme of what are the right programming structures for systems driven by sensors, whether they’re pervasive systems or sensor networks. In either case, the structures we’ve evolved for dealing with desktop and server systems don’t feel like they’re the right abstractions to effectively take things forward. A favourite example is the if statement: first decide whether a condition is true or false, and execute one piece of code or another depending on which it is. In a sensor-driven system we often can’t make this determination cleanly because of noise and uncertainty — and if we can, it’s often only probably true, and only for a particular period. So are if statements (and while loops and the like) actually appropriate constructs, when we can’t make the decisions definitively? Whatever you think of this example (and plenty of people hate it) there are certainly differences between what we want to do between traditional and highly sensorised systems, and consequently how we program them. The question is, how do we work out what the right structures are? Actually, the question is broader than this. It should be: how do we improve our ability to develop languages that match the needs of particular computational and conceptual domains? Domain-specific languages (DSLs) have a tangled history in computer science, pitched between those who like the idea and those who prefer their programming languages general-purpose and re-usable across a range of domains. There are strong arguments on both sides: general-purpose languages are more productive to learn and are often more mature, but can be less efficient and more cumbersome to apply; DSLs mean learning another language that may not last long and will probably have far weaker support, but can be enormously more productive and well-targeted in use. In some ways, though, the similarities between traditional languages and DSLs are very strong. As a general rule both will have syntax and semantics defined up-front: they won’t be experimental in the sense of allowing experimentation within the language itself. If we don’t know what we’re building, does it make sense to be this definite? There are alternatives. One that I’m quite keen on is the idea of extensible virtual machines, where the primitives of a language are left “soft” to be extended as required. This style has several advantages. Firstly, it encourages experimentation by not forcing a strong division of concepts between the language we write (the target language) and the language this is implemented in (the host language): the two co-evolve. Secondly, it allows extensions to be as efficient as “base” elements, assuming we can reduce the cost of building new elements appropriately low. Thirdly, it allows multiple paradigms and approaches to co-exist within the same system, since they can share some elements while having other that differ. Another related feature is the ability to modify the compiler: that is, don’t fix the syntax or the way in which its handled. So as well as making the low level soft, we also make the high level soft. The advantage here is two-fold. Firstly, we can modify the forms of expression we allow to capture concepts precisely. A good example would be the ability to add concurrency control to a language: the low-level might use semaphores, but programing might demand monitors or some other construct. Modifying the high-level form of the language allows these constructs to be added if required — and ignored if not. This actually leads to the second advantage, that we can avoid features we don’t want to be available, for example not providing general recursion for languages that need to complete all operations in a finite time. This is something that’s surprisingly uncommon in language design despite being common in teaching programming: leaving stuff out can have a major simplifying effect. Some people argue that syntax modification is unnecessary in a language that’s sufficiently expressive, for example Haskell. I don’t agree. The counter-example is actually in Haskell itself, in the form of the do block syntactic sugar for simplifying monadic computations. This had to be in the language to make it in any way usable, which implied a change of definition, and the monad designers couldn’t add it without the involvement of the language “owners”, even though the construct is really just a re-formulation and generalisation of one common in other languages. There are certainly other areas in which such sugaring would be desirable to make the forms of expression simpler and more intuitive. The less well we understand a domain, the more likely this is to happen. Perhaps surprisingly, there are a couple of existing examples of systems that do pretty much what I’m suggesting. Forth is a canonical example (which explains my current work on Attila); Smalltalk is another, where the parser an run-time are almost completely exposed, although abstracted behind several layers of higher-level structure. Both the languages are quite old, have devoted followings, and weakly and dynamically typed — and may have a lot to teach us about how to develop languages for new domains. They share a design philosophy of allowing a language to evolve to meet new applications. In Forth, you don’t so much write applications as extend the language to meet the problem; in Smalltalk you develop a model of co-operating objects that provide direct-manipulation interaction through the GUI. In both cases the whole language, including the definition and control structures, is built in the language itself via bootstrapping and cross-compilation. Both languages are compiled, but in both cases the separation between run-time and compile-time are weak, in the sense that the compiler is by default available interactively. Interestingly this doesn’t stop you building “normal” compiled applications: cross-compile a system without including the compiler itself, a process that can still take advantage of any extensions added into the compiler without cluttering-up the compiled code. You’re unlikely to get strictly the best performance or memory footprint as you might with a mature C compiler, but you do get advantages in terms of expressiveness and experimentation which seem to outweigh these in a domain we don’t understand well. In particular, it means you can evolve the language quickly, easily, and within itself, to explore the design space more effectively and find out whether your wackier ideas are actually worth pursuing further.