The ditch project: a motivation

Citizen Sensing's first project is to perform environmental monitoring on a small scale with minimally-specified hardware.

Orchids, Co Sligo

We have three sets of goals for this project. Firstly, we want a realistic scientific project that's simple enough to accomplish over a summer. Secondly, we want to gain some experience with the technologies in the field: what works and what doesn't at this level of sophistication. Finally, we want to develop an initial software platform that can support further experimentation. To deal with these in order:

The science will be very simple. Some areas have climates that vary over very short distances, and these variations can affect the flora and fauna that live there. In particular, areas like depressions and ditches often have different temperature profiles from the surrounding areas, with less variation in temperature, which may suit some species. While this is clear in general, the specifics often aren't well-understood, so it's a worthwhile scientific exercise to measure temperature variation around a small area. Specifically, we'll seek to measure temperature variation around a ditch we have access to, in County Sligo in the West of Ireland.

The technology will be off-the-shelf hardware, using Arduinos for processing and XBee radios for communications. These are extremely cheap commodity items that haven't really been used for experimental environmental science, and it's by no means clear that they're appropriate for building a long-lived, robust sensor network.

The main software challenges include working with the hardware and -- more significantly -- managing the very limited power budget one gets with standard batteries and hardware. In building the software, we're aiming for a re-usable platform that can be packaged in a form so that the experiment can be reproduced or used as the basis of other experiments by other groups of scientists.

Is this a worthwhile project? From a scientific point of view it's clearly questionable, but that's not the point: this sort of experiment is representative of a wide range of environmental sensing, both citizen and professional, so making it work opens the door to a wider spectrum of investigations without getting bogged down initially in the details of a scientific challenge when the computing is so new and uncertain.

The blog for the project will let you follow all the gory details.

Actor systems

In computer science, an actor system is a way of building programs that are very concurrent and very amenable to scheduling and management.

The idea of an actor system goes back to the PhD work of Gul Agha. His actor model structures programs around a collection of simple agents called (unsurprisingly) actors. Each actor associates a mailbox with a behaviour. The mailbox receives messages from other actors or from outside the actor system. Messages are queued in mailboxes until processed one at a time by the associated behaviour.

The behaviour itself is a piece of code that, when run to process a message, performs bounded computation on the message's contents, which may involve sending messages to other actors and creating other actors (and their mailboxes). The boundedness of the computation is important: an actor is guaranteed to run for a finite amount of time before completing, and so cannot (for example) perform an unbounded loop. An actor's last action before terminating is to nominate a replacement behaviour for its mailbox, which may be the current behaviour or some new behaviour. (A null behaviour that did nothing in response to a message would essentially delete the actor.)

The complexity of the system is clearly going to come from how the behaviours are selected and scheduled. The model says very little about scheduling, leaving the implementation to decide when to process messages (by running the behaviour of the associated actor). A single-threaded implementation might repeatedly select a mailbox at random, check whether it contained messages and, if so, process one. A multi-threaded implementation could have one thread per mailbox running behaviours as messages arrive. There are plenty of other possibilities in between: the point is that an actor program doesn't control the concurrency, it simply induces it by the way it creates actors and sends messages.

A system without unbounded loops can't express general computation, but actor systems do allow unbounded computation: they simply force the programmer to create it using communicating actors. An actor wanting to loop forever could, for example, receive a message, perform some processing, send another message to itself (its own mailbox), and then nominate itself as its own replacement behaviour, which would then receive the self-sent message, and so forth.

If the actor model sounds restrictive, that's because it is deliberately designed that way. Its strength is that it is immune from deadlock, since the finite behaviours cannot become stuck indefinitely. This doesn't preclude the possibility of livelock if the system busily processes messages without actually making progress. However, the boundedness of behaviours means that the scheduler is always guaranteed to get control back on a regular basis, which means that there is always the possibility of an actor being able to run, making actor systems immune to starvation.

It's easy to build something that looks roughly like an actor system in a general-purpose programming language -- and usually pretty much impossible to build something that is actually an actor system. This is because a general-purpose programming language will allow behaviours that include unbounded loops, so you can't guarantee that a behaviour will terminate, and so you lose one of the major features of actor systems: their deadlock-freedom. With suitable programmer care, however, you can build an actor system quite easily, deploying however much concurrency is appropriate for the application and platform the system runs on.
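To make this concrete, here's a minimal sketch of such a system in Python. The class and function names, and the single-threaded random scheduler, are mine rather than those of any standard actor library, and the boundedness of behaviours is enforced only by programmer convention: it's an illustration of the shape of the model, not a production implementation. The countdown actor at the end "loops" by messaging itself and nominating itself as its own replacement, exactly as described above.

    # A minimal, illustrative actor system: mailboxes, behaviours, and a
    # single-threaded scheduler that picks a non-empty mailbox at random.

    import random
    from collections import deque

    class System:
        def __init__(self):
            self.mailboxes = {}    # actor id -> queue of pending messages
            self.behaviours = {}   # actor id -> current behaviour
            self.next_id = 0

        def create(self, behaviour):
            """Create an actor with an empty mailbox and an initial behaviour."""
            actor = self.next_id
            self.next_id += 1
            self.mailboxes[actor] = deque()
            self.behaviours[actor] = behaviour
            return actor

        def send(self, actor, message):
            """Queue a message on an actor's mailbox."""
            self.mailboxes[actor].append(message)

        def run(self):
            """Repeatedly pick a non-empty mailbox at random and run the
            associated behaviour on one message, until nothing is pending."""
            while any(self.mailboxes.values()):
                actor = random.choice([a for a, mb in self.mailboxes.items() if mb])
                message = self.mailboxes[actor].popleft()
                # the behaviour returns the replacement behaviour for the mailbox
                self.behaviours[actor] = self.behaviours[actor](self, actor, message)

    # A behaviour that "loops" by sending a message to itself and nominating
    # itself as its own replacement.
    def countdown(system, self_id, n):
        print(n)
        if n > 0:
            system.send(self_id, n - 1)   # continue the loop
        return countdown                  # replacement behaviour

    system = System()
    counter = system.create(countdown)
    system.send(counter, 3)
    system.run()                          # prints 3, 2, 1, 0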

Studentship available in Newcastle upon Tyne

My colleagues in Newcastle (where I did my first degree) have an MRes/PhD position available in computational behaviour analysis.

Together with colleagues from Agriculture, the School of Computing Science in Newcastle has recently been active in the field of computational behaviour analysis in livestock animals. They have a small grant on which they are currently doing some pilot work related to food safety and animal wellbeing, which has now given rise to funding for an integrated MRes/PhD studentship to work in this field.

The group wants to work on a system for the automatic (indoor) tracking of farm pigs and on modelling their behaviour (as individuals and in the group), with the goal of early detection of changes that might be linked to certain diseases. The project involves some very strong partners from, and connections to, agriculture, as well as access to farms and industry. The topic is extremely relevant for UK industry and has wider societal impact (food safety, animal welfare). They are looking for strong CS candidates with a background in machine learning, signal processing, or computer vision, and an interest in working in a multi-disciplinary team on a challenging and very interesting topic with substantial societal and economic relevance.

The funding covers four years of fees and maintenance. UK home students (and EU citizens who have been resident for at least three years) are eligible for the full funding; other EU students essentially for 50%. Throughout the first year the student will be enrolled in an MRes programme, with courses and time to get familiar with the topic, which is a fantastic opportunity for students to obtain an interdisciplinary skill-set.

More details can be found here. Application deadline is the 5th April 2013.

Being a student with open courseware

The move towards massive open on-line courses (MOOCs) has the potential fundamentally to change the practice and experience of university. But what does it feel like to study on a MOOC-delivered course?

No-one in academia can have been indifferent to the influence the internet has had on their teaching and research over the past decade, but MOOCs are a significant development nonetheless. They represent an attempt to "unbundle" education from the traditional providers, reducing barriers to entry and costs to students while facilitating alternative styles of learning and attendance not always supported in existing institutions. It's impossible to generalise about how universities deliver their material, but it's fair to say that the ability to study a bespoke programme at one's own speed and from whatever location is convenient is a significant departure from the usual three-to-four-year resident degree programme. (See, for example, Salman Khan. What college could be like. Communications of the ACM 56(1). January 2013.) Like most other research-led institutions, St Andrews is exploring on-line learning and MOOCs -- in our case as part of the FutureLearn initiative. But planning a radical style-change in teaching doesn't really work if it's purely "from the top", as it's the experience "from the bottom", as a student, that will dictate whether an approach succeeds or fails.

So I signed up to take a MOOC module. We're only in Week 3 (of 12) at present, but it's worth making some early observations.

To avoid embarrassing the providers, I won't identify the lecturer or module. Suffice to say it's being delivered on one of the three major platforms for MOOCs; is delivered by a tenured academic at one of the world's top universities; is a science subject, not a humanity; and is not in computer science or mathematics, to avoid the risk of my subject expertise getting in the way.

The experience of signing up is as simple and straightforward as one would expect from a modern web provider, with each module being given its own home page containing links to all the videoed lecture material, practicals, class tests, and a discussion forum. Clearly a lot of work has gone into preparing all the material: around 30 hours of video (with closed captioning) plus materials for formative and summative assessment: it's a substantial investment of time by all concerned.

Interestingly, this particular module is being run by the originating university as an experiment, to see how best they can deploy on-line learning. The student cohort includes both on-line students and students resident on campus for a "traditional" programme -- with the former outnumbering the latter by two orders of magnitude. Both sets are taking the same classes and assessments, but the resident students can also avail of a lecture each week "on demand" on any subject they feel needs more attention. This could be seen as a safety net in case the on-line experience is less than satisfactory, but also as a feedback loop to refine the module for next time: without this, it'd be harder to work out which parts needed more explanation, as always happens.

The main course material is divided into units with about two hours of video material each, split into convenient sections which can be watched on-line or downloaded. The partitioning is convenient both for students and (I suspect) for the lecturer, who can record them in shorter "takes" to simplify correction and re-recording. The videos are typically either the lecturer as a "talking head" against a plain background, or a slide set onto which the lecturer writes additional notes as he talks. (Sometimes a single video piece includes both styles.) Having handwriting appear works better than might be expected: it keeps the human connection, and is less mechanical and more engaging than purely printed slides would be. Some sections also include in-line multiple-choice questions for students to answer to check their understanding during the lecture.

Having a lecture on video has a number of advantages, not least the ability to pause and repeat arbitrarily. I found this useful for note-taking, in that I could listen to a section, pause it to write summary notes, and not miss what the lecturer said next. A more experienced student note-taker might be able to multi-task enough for this not to matter. Note-taking is important for its own sake, of course, as it activates other parts of your brain than listening alone and makes the material much more engaging and likely to stick. Repetition has been less useful, which may just be a reflection of the level of the material so far, and a non-native English-speaker would probably find it invaluable.

The module assessment includes some labs, which to get the whole class involved had to occur at set times: 9pm UK time was bad enough, but of course Indian and Chinese students had it much worse! However, the lab software didn't work: I think it simply couldn't handle the demand of the student numbers, although it was hard to tell for sure from my outsider's viewpoint. The lecturer ended up dropping the entire lab component and changing the rubric, which seemed substantially easier than the same task would have been in St Andrews, where it would have raised questions about the fairness of assessment. In this case it probably reflects teething troubles, but it does highlight some of the concerns that have been raised about the standard of examination (and consequently the value of certification) from MOOC-delivered courses: does the module actually still make sense without the labs with which it was designed?

The remaining assessment consists of problem sets and an examination. All of these have deadlines, but with a sliding scale of lateness penalties (up to a fortnight) within which students can still submit. This seems reasonable for an on-line module, especially when one has no visibility of, or expectation about, students' personal circumstances. The marking overhead for assessment will be substantial, however, as the problem sets aren't all multiple-choice -- although they do show signs of being designed for large-scale grading. Coming from a university with small classes (fewer than a hundred is the norm, fewer than twenty not exceptional), this would require a significant change in my style of assessment -- although to be fair, no university is set up for class sizes in the tens of thousands that could easily attend a MOOC. This remains a major academic and technical challenge.

The discussion forum was a particularly interesting experience. There are a number of teaching assistants as well as the lecturer monitoring the discussions, answering and guiding questions, and students get credit for their engagement in discussions. Some students were noticeably more active, offering a range of different experiences and levels of expertise. Some clearly know enormously more about the subject matter than has been taught, possibly students of this area trying to supplement their experience with modules from another -- perhaps more prestigious -- university. Others, it rapidly became clear, are actually academics doing exactly what I'm doing: attending the class for interest, but really studying the MOOC and its experience. As I said, MOOCs are not an area any institution is willing to let pass by.

Overall the experience is broadly positive, and I'm not concerned about any of the technological challenges posed: they're simply those of any large, distributed computer system. But I think at least four questions arise about how to do this sort of learning well.

Firstly, it's clear that there's an immense amount of work involved at the back end of any MOOC, in terms of its design and preparation but also in terms of timetabling, support and -- especially -- assessment. Unless one is willing to accept multiple-choice questions or other approaches that can be mechanised it's hard to see how this can be avoided, and it's at a scale that few if any first-tier institutions have experience with.

Secondly, the feeling of a class community is notably absent, and discussion boards can't make up for face-to-face interaction and live discussion. Especially in a small institution like St Andrews, the student experience -- of a small, closely-knit class in daily close contact with lecturers, demonstrators, and the wider student body -- is a major attraction, one that is highlighted by our teaching fellows and in all our student surveys, and one that would be hard to match in a distance setting. Perhaps this is simply academic snobbery: the cost of such an experience may be prohibitive for a progressively larger fraction of potential students. On the other hand, its pedagogical advantages are unquestionable. If we simply accept the difference, we run the risk of consigning a large group of students to a study experience we know to be sub-standard -- and indeed have designed to be such. Are there ways to provide an equivalent, if not better, experience? This has to be a major topic for future research, and has technological and sociological dimensions.

Thirdly, distance interrupts the feedback loops that usually exist between a lecturer and their material, and between lecturer and class. In a local setting, feedback is generally immediate and clear: you can tell pretty much instantly if a class isn't engaged with or understanding the material, and adjust accordingly. In a distance setting, feedback is indirect and time-delayed, moderated by students' willingness to criticise or complain about their lecturers, which I would expect to complicate the refinement of a module that usually happens. Again, designing a better experience is probably not an insoluble problem, but it is an urgent one.

Finally, there's the issue of coherence in a system in which one can make arbitrary module choices. I'm a great believer in broad curricula and plenty of options, but one often wants to get a degree in something, and this requires structuring, either for professional or academic reasons. Without such structure, certification of what the student has learned, and how well, becomes a lot more difficult. But even beyond this, there's the problem of an embarrassment of choice, and of not knowing what one should study to accomplish whatever learning goal one has. This is a function often provided by advisers, counsellors and other staff within a university, who can help students understand what to study and why they're going to university in the first place. Their function is easily forgotten in distance learning, but seems to me to be vital for successful learning outcomes and eventual student satisfaction. It certainly suggests that the notion of distance learning needs to be broadened significantly beyond the content if it's to offer a real alternative to a "normal" degree: indeed, the content sometimes feels almost irrelevant, given the avalanche of material available on the web outside the MOOC universe. Curation, direction and socialisation feel like far more dominant challenges.

When I decided to conduct this experiment, I thought it would draw me into the technology of MOOCs. It didn't: the technology is well done, but unexceptional. The content is great, but again no more impressive than any number of other sources on the internet. (Admittedly a sample size of one is hardly conclusive….) The sociology, however, is another matter: there's clearly a need to study the ways in which people experience learning through computers, at a distance, and in distributed groups across which one wants to establish a cohort effect, and how the universities can leverage their expertise in this into the on-line world. It'll be interesting to see what happens next.

Hypothetical adventures in your chosen field

A discussion on the different styles of PhD thesis that might be appropriate for different students and research areas.

I was talking to one of my students about the "shape" of their PhD, and how we might present the work they're doing in the final thesis. This particular student is working on the science of complex coupled networks, which is a very new area of network science -- itself a pretty new field. The exact details of her work aren't important: the important point is that it's an area that's so poorly understood that it's hard to ask meaningful detailed questions about it.

A PhD is a training in research: the work done in the course of one doesn't usually change the world, and -- a fact that's often forgotten -- it isn't supposed to. The goal of a PhD is to demonstrate that the candidate can conceive of, plan, conduct, evaluate and communicate a programme of research, sustained over typically three or four years, that adds something to human knowledge. Clearly there are many ways one can demonstrate this, and the question we were asking was: what are the variations? How do we choose which is appropriate?

A typical thesis is a monograph, written specially to describe the research, its process and results. In this it's very different from a scientific paper, which typically focuses on a single result and often elides a lot of the motivation and process. Publications along the way always help, of course: they let a reader (or examiner) know that the work being presented has been reviewed by others in the field, which always gives a certain amount of confidence that it contains worthwhile results. (Few things make a PhD examiner's heart sink more than a thesis with no associated publications. It means you're on your own in terms of assessing every aspect of the work.) The monograph needs a "shape", a skeleton upon which to hang the research, as well as clearly-enumerated contributions to knowledge and an understanding of its own boundaries.

There are however a couple of different styles of skeletal structure.

The hypothesis. This is my favourite, truth be told. The work is encapsulated in a question, and the rest of the thesis seeks to answer this question. The answer might be yes, no, or maybe, all of which can be valid and can result in the award of a degree. The research then proves the hypothesis, disproves it, narrows it down as incomplete, or uncovers extra structure that would require substantially more work to explore -- or some combination of these.

There are several reasons to like this style of thesis. Firstly, the hypothesis is usually something concrete within the field of study. (It may seem hopelessly abstract to anyone outside the field, of course.) That makes it easy for an examiner or other reader to relate to: the question makes sense to them, as something one would want to know the answer to. Secondly -- and more importantly -- you have a pretty clear goal and consequently a pretty good notion of when you're finished. A PhD can almost always go on for ever, of course, but having a tight hypothesis means that you have a reference against which to test the work and decide whether it's sufficient to convince the reader of your conclusions.

Against this we may contrast the second type of thesis:

Adventures in your chosen field. In this kind of thesis, there is no hypothesis. Instead, the thesis marks out an area of interest and explores it, with a view to understanding or characterising some or all of it but without any driving question that demands an answer. Unlike the hypothesis, this kind of thesis isn't really "for" anything: it's intended as an exploration, a search for features -- and indeed possibly for hypotheses to be tested in the future. But this style of thesis is perfect for some areas, like mathematics and the more abstract end of computer science, where something new needs to be explored for a while to see if it makes sense and what it has to offer. (It's also found in non-monographical theses, where some universities allow candidates to submit a portfolio of published papers in place of the traditional approach. I must say I always find theses like these quite disjointed and fundamentally dissatisfying: there's a lot to be said for a purpose-written, rounded monograph as a conclusion to a programme of research.)

One would probably have to admit that an "adventures" thesis is a little more challenging to examine, since it doesn't motivate its own intention as clearly as a "hypothetical" thesis. It's going to be harder for the candidate to decide when it's ready, too, since one can always do just one more search to find something else of interest. In that sense, an "adventurer" needs to be more disciplined both in deciding when enough is enough and in deciding what, of the many possible choices, constitute the interesting pathways to explore.

I don't think either style is inherently preferable. It depends entirely on the subject matter, and one could easily fall into the mistake of trying to construct a hypothesis for work that was really an "adventure", or indeed not constructing one for a thesis that really needed one to motivate and structure it. There's a clear difference between an exploratory thesis and a rambling one: the exploration needs structure and coherence.

It's also perhaps worth considering the psychologies and research styles of different students. Some students need concrete goals: to build something, to test something, to prove or disprove a conjecture. These people clearly benefit from doing "hypothetical" theses. Other students have an interest in an area rather than in any specific outcome, and so might prefer an "adventure". There's no point in forcing one into the other, any more than one can get a theoretically-inclined student motivated by coding (and the associated debugging): it's just not what floats their boat, and it's better that they know their own minds, strengths and interests than to spend three or four years engaged in a quest with which they're fundamentally misaligned.

Distinguished Dissertations competition 2013

The British Computer Society's distinguished PhD dissertations competition is now accepting nominations for 2013.

The Conference of Professors and Heads of Computing (CPHC), in conjunction with BCS and the BCS Academy of Computing, annually selects for publication the best British PhD/DPhil dissertations in computer science and publishes the winning dissertation and runner up submission on the BCS website.

Whenever possible, the prize winner receives his/her award at the prestigious BCS Roger Needham Lecture, which takes place annually at the Royal Society in the autumn.

After a rigorous review process involving international experts, the judging panel selects a small number of dissertations that are regarded as exemplary, and one overall winner from the submissions.

Over forty theses have been selected for publication since the scheme began in 1990.

More details, including links to some past winners, can be found on the competition web site. The closing date for nominations is 1 April 2013.

Six 600th anniversary PhD scholarships

As part of St Andrews' 600th anniversary, the University is funding six scholarships in computer science (and all our other Schools, for that matter).

Details are on our web site. There's no special application process, and the studentships are open to all qualified applicants. You'd be well advised to research potential research supervisors and discuss things with them ahead of applying. Most staff -- including me -- will be accepting students under this scheme.

Chief scientific advisor for Ireland

I'm a little perplexed by the direction of the current discussion over Ireland's chief scientific advisor.

The need for scientific advice, as a driver of (and sanity check upon) evidence-based policy formation, has arguably never been greater. Certainly many of the challenges we face living in the 21st century are, directly or indirectly, influenced by a sound understanding of both the science and its limitations. This is why it's attractive for governments to have dedicated, independent scientific advisors.

We need first to be clear what a good chief scientific advisor isn't, and that is an oracle. The chief scientist will presumably be a trained, nationally and internationally respected practitioner who brings to the job experience in the practice of science but also in its wider analysis and impact. In many respects these are the qualities one looks for in a senior academic -- a full professor -- so it's unsurprising that chief scientists are often current or former (emeritus) professors at leading institutions. They will not be expert on all the areas of science required to address any particular policy question -- indeed, they might never be expert in the details pertaining to any question asked -- but they can act as a gateway to those who are expert, and collate and contrast the possibly conflicting views of different experts who might be consulted in each case. They provide a bridge in this sense between the world of research and the world of policy, and will need to be able to explain clearly the basis and consequences of what the science is saying.

Ireland has recently abandoned having a chief scientist as an independent role, and has instead elected to combine it with the role of Director of Science Foundation Ireland, the main advanced research funding agency. There are several stated reasons for this, most centring on the current resource constraints facing government spending.

I don't think this structure is really consistent with the role of chief scientist, nor with the principles of good governance.

To avoid any misunderstandings, let's be clear that this has nothing to do with the individuals involved: the current or former directors of SFI would be eminently suited to be a chief scientist in their own right. However, having the SFI director fill this additional role ex officio seems not to be best practice. The concern must be that the combined role cannot be independent by its very nature, in that the scientific direction of SFI may wholly or in part be involved in the policy decisions being made as a consequence of advice received. If this occurs, the chief scientist is then recommending actions for which the director must take responsibility, and the perception of a confusion of interest is inevitable if these roles are filled by the same individual. To repeat, the integrity of the office-holders is not the issue, but rather a governance structure that conflates the two roles of execution and oversight.

If resource constraints are really the issue, one might say that the chief scientist does not need to be independently employed to be independent in the appropriate sense. The chief scientific advisory roles in the UK, for example, are typically filled by academics on part-time release from their host institutions. They collate and offer scientific advice, possibly made better by the fact that they remain active researchers and so remain current on both the science and its practice, rather than being entirely re-located into the public service. (The chief scientific advisor for Scotland, for example, remains a computer science researcher at the University of Glasgow in addition to her advisory role.) The risk of confusion is significantly less in this structure, because a single academic in a single institution does not exert executive control or influence over wider funding decisions. Moreover the individual remains employed by (and in the career and pension structure of) their host university and is bought-out for part of their time, which reduces the costs significantly. It also means that one can adjust the time commitment as required.

I thought when the post was first created that it was unusual that the Irish chief scientist's post was full-time: requiring the (full-time) SFI director to find time in addition for the (presumably also full-time) duties of chief scientific advisor is expecting a lot.

It is of course vital for the government to be getting good science advice, so it's good that the chief scientist role is being kept in some form. But I think it would be preferable to think about the governance structure a little more, to avoid any possible perception of confusion whether or not such confusion exists in practice.

Parser-centric grammar complexity

We usually think about formal language grammars in terms of the complexity of the language, but it's also possible -- and potentially more useful, for a compiler-writer or other user -- to think about them from the perspective of the parser, which is slightly different.

I'm teaching grammars and parsing to a second-year class at the moment, which involves getting to grips with the ideas of formal languages, their representation, and using them to recognise ("parse") texts that conform to their rules. Simultaneously I'm also looking at programming language grammars and how they can be simplified. I've realised that these two activities don't quite line up the way I expected.

Formal languages are important for computer scientists, as they let us describe the texts allowed for valid programs, data files and the like. The understanding of formal languages derives from linguistics, chiefly the work of Noam Chomsky in recognising the hierarchy of languages induced by very small changes in the capabilities of the underlying grammar: regular languages, context-free and context-sensitive, and so forth. Regular and context-free languages have proven to be the most useful in practice, with everything else being a bit too complicated for most uses.

The typical use of grammars in (for example) compilers is that we define a syntax for the language we're trying to recognise (for example Java) using a (typically context-free) grammar. The grammar states precisely what constitutes a syntactically correct Java program and, conversely, rejects syntactically incorrect programs. This parsing process is typically purely syntactic, and says nothing about whether the program makes sense semantically. The result of parsing is usually a parse tree, a more computer-friendly form of the program that's then used to generate code after some further analysis and optimisation.

In this process we start from the grammar, which is generally quite complex: Java is a complicated system to learn. It's important to realise that many computer languages aren't like this at all: they're a lot simpler, and consequently can have parsers that are extremely simple, compact, and don't need much in the way of a formal grammar: you can build them from scratch, which would be pretty much impossible for a Java parser.

This leads to a different view of the complexity of languages, based on the difficulty of writing a parser for them -- which doesn't quite track the Chomsky hierarchy. Instead you end up with something like this:

Isolated. In languages like Forth, programs are composed of symbols separated by whitespace and the language says nothing about the ways in which they can be put together. Put another way, each symbol is recognised and responded to on its own rather than as part of a larger construction. In Java terms, that's like giving meaning to the { symbol independent of whether it's part of a class definition or a for loop.

It's easy to see how simple this is for the compiler-writer: we simply look up each symbol extracted from the input and execute or compile it, without reference to anything else. Clearly this only works to some extent: you only expect to see an else branch somewhere after a corresponding if statement, for example. In Forth this isn't a matter of grammar, but rather of compilation: there's an extra-grammatical mechanism for testing the sanity of the control constructs as part of their execution. While this may sound strange (and is, really) it means that it's easy to extend the syntax of the language -- because it doesn't really have one. Adding new control constructs means adding symbols to be parsed and defining the appropriate control structure to sanity-check their usage. The point is that we can simplify the parser well below the level that might be considered normal and still get a usable system.
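As a rough illustration (in Python rather than Forth, and with an invented word set), here's how little machinery an "isolated" interpreter needs: parsing is just splitting on whitespace, and each symbol is looked up and executed without reference to its neighbours.

    # A sketch of "isolated" parsing: each whitespace-separated symbol is
    # handled on its own. The words below are illustrative, not real Forth.

    def interpret(text):
        stack = []
        words = {
            '+':   lambda s: s.append(s.pop() + s.pop()),
            '*':   lambda s: s.append(s.pop() * s.pop()),
            'dup': lambda s: s.append(s[-1]),
            '.':   lambda s: print(s.pop()),
        }
        for symbol in text.split():          # "parsing" is just splitting on whitespace
            if symbol in words:
                words[symbol](stack)         # execute the word...
            else:
                stack.append(int(symbol))    # ...or treat it as a numeric literal
        return stack

    interpret("2 3 + dup * .")               # prints 25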

Incremental. The next step up is to allow the execution of a symbol to take control of the input stream and define for itself what it consumes. A good example of this is from Forth again, for parsing strings. The Forth word introducing a literal string is S", as in S" hello world". What happens with the hello world" part of the text isn't defined by a grammar, though: it's defined by S", which takes control of the parsing process and consumes the text it wants before returning control to the parser. (In this case it consumes all characters up to the closing quote.) Again this means we can define new language constructs, simply by having words that do their own parsing.

Interestingly enough, these parser words could themselves use a grammar that's different from that of the surrounding language -- possibly using a standard parser generator. The main parser takes a back seat while the parser word does its job, allowing for arbitrary extension of the syntax of the language. The disadvantage of course is that there's no central definition of what constitutes "a program" in the language -- but that's an advantage too, certainly for experimentation and extension. It's the ideas of dynamic languages extended to syntax, in a way.
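Here's a sketch of the same idea, again in Python rather than Forth and with the details invented for the example: a parser word (here spelled s", loosely modelled on Forth's S") takes over the input stream, consumes everything up to the closing quote, and then hands control back to the main loop.

    # A sketch of "incremental" parsing: a word's handler can take control of
    # the input stream and decide for itself how much text to consume.

    class Input:
        def __init__(self, text):
            self.text, self.pos = text, 0

        def next_word(self):
            """Return the next whitespace-delimited word, or None at the end."""
            while self.pos < len(self.text) and self.text[self.pos].isspace():
                self.pos += 1
            start = self.pos
            while self.pos < len(self.text) and not self.text[self.pos].isspace():
                self.pos += 1
            return self.text[start:self.pos] or None

        def up_to(self, delimiter):
            """Consume and return everything up to (and past) a delimiter."""
            end = self.text.index(delimiter, self.pos)
            chunk = self.text[self.pos:end]
            self.pos = end + len(delimiter)
            return chunk

    def string_word(inp, stack):
        # take over the input: skip the single delimiting space, then grab
        # everything up to the closing quote
        inp.pos += 1
        stack.append(inp.up_to('"'))

    def print_word(inp, stack):
        print(stack.pop())

    WORDS = {'s"': string_word, '.': print_word}

    def interpret(text):
        inp, stack = Input(text), []
        word = inp.next_word()
        while word is not None:
            WORDS[word](inp, stack)   # the word itself decides how much input to eat
            word = inp.next_word()

    interpret('s" hello world" .')    # prints: hello world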

Structured. Part of the subtlety of defining grammars is avoiding ambiguity, making sure that every program can be parsed in a well-defined and unique way. The simplest way to avoid ambiguity is to make everything structured and standard. Lisp and Scheme are the best illustrations of this. Every expression in the language takes the same form: an atom, or a list whose elements may be atoms or other lists. Lists are enclosed in brackets, so it's always possible to find the scope of any particular construction. It's extremely easy to write a parser for such a language, and the "grammar" fits onto about three lines of description.
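As an illustration of just how little is needed, here's a sketch of such a reader in Python. It ignores strings, quoting and error handling, but the whole parser amounts to a trivial tokeniser and one recursive function.

    # A sketch of a reader for a fully-bracketed, Lisp-like syntax: every
    # expression is an atom or a parenthesised list of expressions.

    def tokenise(text):
        return text.replace('(', ' ( ').replace(')', ' ) ').split()

    def parse(tokens):
        token = tokens.pop(0)
        if token == '(':
            expr = []
            while tokens[0] != ')':
                expr.append(parse(tokens))
            tokens.pop(0)                # drop the closing ')'
            return expr
        return token                     # an atom

    print(parse(tokenise("(define (square x) (* x x))")))
    # ['define', ['square', 'x'], ['*', 'x', 'x']]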

Interestingly, this sort of language is also highly extensible, as all constructs look the same. Adding a new control construct is straightforward as long as it follows the model, and again can be done extra-grammatically without defining new elements to the parser. One is more constrained than with the isolated or incremental models, but this constraint means that there's also more scope for controlled expansion. Scheme, for example, provides a macro facility that allows new syntactic forms, within the scope and style of the overall parser, that nevertheless behave differently to "normal" constructs: they can capture and manipulate fragments of program text and re-combine them into new structures using quasi-quoting and other mechanisms. One can provide these facilities quite uniformly without difficulty, and without explicit syntax. (This is even true for quasi-quoting, although there's usually some syntactic sugar to make it more usable.) The results will always "look like Lisp", even though they might behave rather differently: again, we limit the scope of what is dealt with grammatically and what is dealt with programmatically, and get results that are somewhat different to those we would get with the standard route to compiler construction.

This isn't to say that we should give up on grammars and go back to more primitive definitions of languages, but just to observe that grammars evolved to suit a particular purpose that needs to be continually checked for relevance. Most programmers find Java (for example) easier to read than Scheme, despite the latter's more straightforward and regular syntax: simplicity is in the eye of the beholder, not the parser. But a formal grammar may not be the best solution in situations where we want variable, loose and extensible syntax for whatever reason, so it's as well to be aware of the alternatives that make the problem one of programming rather than of parsing.