
Posts about programming

The next challenges for situation recognition

As a pervasive systems research community we're doing quite well at automatically identifying simple things happening in the world. What is the state of the art, and what are the next steps?

Pervasive computing is about letting computers see and respond to human activity. In healthcare applications, for example, this might involve monitoring an elderly person in their home and checking for signs of normality: doors opening, the fridge being accessed, the toilet flushing, the bedroom lights going on and off at the right times, and so on. A collection of simple sensors can provide the raw observational data, and we can monitor this stream of data for the "expected" behaviours. If we don't see them -- no movement for over two hours during daytime, for example -- then we can sound an alarm and alert a carer. Done correctly, this sort of system can literally be a life-saver, and it can also make all the difference for people with degenerative illnesses living in their own homes.
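As a crude sketch of the kind of rule involved (the types and the two-hour threshold here are purely illustrative, not taken from any real deployment), a single check of this sort might look like:

import Data.Time.Clock (UTCTime, diffUTCTime)

-- Raise an alert if no movement has been seen for over two hours during daytime.
shouldAlert :: UTCTime -> UTCTime -> Bool -> Bool
shouldAlert now lastMovement isDaytime =
  isDaytime && (now `diffUTCTime` lastMovement) > 2 * 60 * 60

The interesting problems start, of course, when the rules are less clear-cut than this.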

The science behind these systems is often referred to as activity and situation recognition, both of which are forms of context fusion. To deal with these in the wrong order: context fusion is the ability to take several streams of raw sensor (and other) data and use it to make inferences; activity recognition is the detection of simple actions in this data stream (lifting a cup, chopping with a knife); and situation recognition is the semantic interpretation of a high-level process (making tea, watching television, in a medical emergency). Having identified the situation we can then provide an appropriate behaviour for the system, which might involve changing the way the space is configured (dimming the lights, turning down the sound volume), providing information ("here's the recipe you chose for tonight") or taking some external action (calling for help). This sort of context-aware behaviour is the overall goal.

The state of the art in context fusion uses some sort of uncertain reasoning including machine learning and other techniques that are broadly in the domain of artificial intelligence. These are typically more complicated than the complex event processing techniques used in financial systems and the like, because they have to deal with significant noise in the data stream. (Ye, Dobson and McKeever. Situation recognition techniques in pervasive computing: a review. Pervasive and Mobile Computing. 2011.) The results are rather mixed, with a typical technique (a naive Bayesian classifier, for example) being able to identify some situations well and others far more poorly: there doesn't seem to be a uniformly "good" technique yet. Despite this we can now achieve 60-80% accuracy (by F-measure, a unified measure of the "goodness" of a classification technique) on simple activities and situations.
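(For concreteness, the F-measure is the harmonic mean of precision and recall; a minimal sketch of the calculation, with made-up counts, is:)

-- F-measure (F1) from counts of true positives, false positives and false negatives.
fMeasure :: Double -> Double -> Double -> Double
fMeasure tp fp fn = 2 * precision * recall / (precision + recall)
  where
    precision = tp / (tp + fp)   -- fraction of reported detections that were right
    recall    = tp / (tp + fn)   -- fraction of real occurrences that were detected

-- For example, fMeasure 60 20 20 == 0.75, in the middle of the range quoted above.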

That sounds good, but the next steps are going to be far harder.

To see what the next step is, consider that most of the systems explored have been evaluated under laboratory conditions. These allow fine control over the environment -- and that's precisely the problem. The next challenges for situation recognition come directly from the loss of control of what's being observed.

Let's break down what needs to happen. Firstly, we need to be able to describe situations in a way that lets us capture human processes. This is easy to do for another human but tricky for a computer: the precision with which we need to express computational tasks gets in the way.

For example we might describe the process of making lunch as retrieving the bread from the cupboard, the cheese and butter from the fridge, a plate from the rack, and then spreading the butter, cutting the cheese, and assembling the sandwich. That's a good enough description for a human, but most of the time isn't exactly what happens. One might retrieve the elements in a different order, or start making the sandwich (get the bread, spread the butter) only to remember the forgotten filling, and therefore go back to get the cheese, then re-start assembling the sandwich, and so forth. The point is that this isn't programming: people don't do what you expect them to do, and there are so many variations on the basic process that they seem to defy capture -- although no human observer would have the slightest difficulty in classifying what they were seeing. A first challenge is therefore a way of expressing the real-world processes and situations we want to recognise in a way that's robust to the things people actually do.

(Incidentally, this way of thinking about situations shows that it's the dual of traditional workflow. In a workflow system you specify a process and force the human agents to comply; in a situation description the humans do what they do and the computers try to keep up.)

The second challenge is that, even when captured, situations don't occur in isolation. We might define a second situation to control what happens when the person answers the phone: quiet the TV and stereo, maybe. But this situation could be happening at the same time as the lunch-making situation and will inter-penetrate with it. There are dozens of possible interactions: one might pause lunch to deal with the phone call, or one might continue making the sandwich while chatting, or some other combination. Again, fine for a human observer. But a computer trying to make sense of these happenings only has a limited sensor-driven view, and has to try to associate events with interpretations without knowing what it's seeing ahead of time. The fact that many things can happen simultaneously enormously complicates the challenge of identifying what's going on robustly, damaging what is often already quite a tenuous process. We therefore need techniques for describing situation compositions and interactions on top of the basic descriptions of the processes themselves.

The third challenge is also one of interaction, but this time involving multiple people. One person might be making lunch whilst another watches television, then the phone rings and one of them answers, realises the call is for the other, and passes it over. So as well as interpenetration we now have multiple agents generating sensor events, perhaps without being able to determine exactly which person caused which event. (A motion sensor sees movement: it doesn't see who's moving, and the individuals may not be tagged in such a way that they can be identified or even differentiated.) Real spaces involve multiple people, and this may place limits on the behaviours we can demonstrate. But at the very least we need to be able to describe processes involving multiple agents and to support simultaneous situations in the same or different populations.

So for me the next challenges of situation recognition boil down to how we describe what we're expecting to observe in a way that reflects the noise, complexity and concurrency of real-world conditions. Once we have these descriptions we can explore and improve the techniques we use to map from sensor data (itself noisy and hard to program with) to identified situations, and thence to behaviour. In many ways this is a real-world version of the concurrency theories and process algebras that were developed to describe concurrent computing processes: process languages brought into the real world, perhaps. This is the approach we're taking in a European-funded research project, SAPERE, in which we're hoping to understand how to engineer smart, context-aware systems on large and flexible scales.

Sensor and sense-ability

I'm talking at the BCS Edinburgh branch tonight about sensing and sensor-driven systems.

It has been an old maxim in computing that incorrect inputs can acceptably give rise to unacceptable outputs: "garbage in, garbage out".

This is ceasing to be true, and many classes of systems must behave predictably even in the face of inputs containing substantial garbage -- although researchers delicately use terms like "imprecise" or "unstructured" instead of "garbage".

In this talk we discuss some approaches to managing the problem of imprecise, inaccurate, untimely and partial inputs in the context of pervasive and sensor-driven systems, and suggest that we need to re-think radically the way we build software and represent decision-making in these environments.

Details of the venue are available here.

How to publish an unpopular book?

I've been thinking about writing a book. It won't be a popular success -- trust me -- but that raises the question of how I should publish it.

I've never written a book, although I've written a lot of papers and edited a couple of conference proceedings and other collections: writing is one of the things academics do for a living. But I've been thinking for a while about writing a book on how to build a programming language. This isn't something that JK Rowling (my neighbour in Morningside) needs to worry will eat into her royalties, obviously, but it's something that'd be of interest to a certain group of people. I've been teaching programming, compilers, language principles and the like for several years, and have sometimes shown classes how to build interpreters and compilers from the ground up. I'd like to share this with a wider audience, and show how the tools and techniques of languages can be used to make a whole class of problems easier and more fun to solve. There's something very liberating and exciting (to me, anyway) about understanding the tools of programming at their most raw, and about being able to change them to suit particular circumstances. It also brings a new perspective to things one might encounter in particular languages, one that can encourage experimentation and invention and the re-purposing of tools to new domains.

It's not the writing that's the problem: the problem is the publishing. Clearly, no matter how excited I get about these things, it's going to be a pretty minority interest: not even most computer scientists write compilers. So it's hardly going to be a best-seller. But that then raises an interesting question. Traditional book publishing is about getting visibility and distribution for your work, with a view to maximising circulation, impact and royalties. If there's no money to be had, and the target audience is by definition computer- and internet-aware, are there better ways of getting the same (or better) visibility, distribution and impact, and reaching the audience more effectively than one can by traditional means?

What, in the 21st century, is the most effective way to publish an unpopular book?

In one way the internet answers this question in an obvious way: put a file on a web server. But that still leaves the visibility and impact parts to be solved -- and there are half-a-dozen ways to make the text available on a web server, too. We can split the problem between these two parts, though: how to write and represent the text, and how to let people know it's there.

Distribution and format

Web site. A lot of books have associated web sites, for errata and additional material, sometimes freely available and sometimes behind a paywall. Clearly one could put a whole book up on a site, as well as any associated software, with a suitable licence to allow for downloading and whatever copying seems permissible. I use this approach for this site: all the content is under a Creative Commons licence that allows non-commercial sharing with attribution, and allows re-mixing as long as the derived work is also shared under the same or similar terms.

Most web sites require that you be on-line to read them, although that's not necessarily the case for systems like TiddlyWiki that download in one file. And one can get all the benefits of non-linear browsing and re-purposing by using proper hypertext as opposed to PDF.

E-book. E-books have a lot to recommend them, especially their portability and download-ability. PDF is a popular format, but EPUB is probably a better choice: you get reflowing, hyperlinking and portability to small devices with no effort, in a way that's hard for a PDF.

Of course these formats aren't mutually exclusive, and one could easily come up with a writing system that can generate PDF, EPUB and indeed HTML from the same sources.

Blog. The above are still fairly traditional approaches, varying in terms of delivery medium. What about blogging a book, allowing it to evolve over time?

I can think of one immediate disadvantage, which would be the danger of a lack of flow and a disjointedness that comes from not wrapping-up a work as a single entity. But of course there are some significant advantages: there's no reason one couldn't write large chunks of text and then blog them over time, and refine the text using comments before generating an e-book or re-linking into a more conventional web site.

Wiki/group blog. If we accept the no money/lots of copying philosophy, then perhaps there's no reason to be precious about authorship either. A group blog or a wiki that encourages participation and updating might make sense: a sort of Wikipedia for programming languages, in which chapters can be commented on and edited by a community (if one forms). This might generate a work that's more complete than one I could write myself, if one got contributions from the appropriate, knowledgeable people. It could also degenerate into a farce without clear editing guidelines and proper curation: essentially the problems of a normal blog, writ large.

Wikis, and especially Wikipedia, often get trashed by academics. This isn't an opinion I completely share. At their best, wikis harness the best people with an interest in a subject. Their content needs protection from the stupid, vain, deluded, vicious and malicious, but none of that outweighs the benefits of having potentially every expert in the world contributing. A traditional encyclopaedia is not necessarily more reliable -- look up "anthropology" in an early Encyclopaedia Britannica to see how fallible expert opinion is to modern eyes -- and with care there's no reason why a wiki need be less exacting than a more traditional medium. (Encyclopaediae aren't a good way to learn a subject, for which you really need a structured and knowledgeable guide -- but that's another story.)

Visibility

Visibility subsumes impact, in a sense: if something is very visible and easily found, then its popularity is a direct measure of its significance in its community of interest. And if something is highly visible and still unpopular: well, that's just something to live with. We can split visibility between pull and push: people finding what they're looking for versus being told that there's something they might be interested in.

SEO. Search engine optimisation has evolved from being a valuable skill, through a commodity, to an arms race in which sites try to get search engines to notice them and search engines try not to be manipulated away from whatever they regard as their core metric for importance. (PageRank in the case of Google.) Most content managers have SEO packages built-in or available that can help.

Blogs. There are a number of great programming language blogs out there, through which one could solicit help and readers. If the internet does anything, it's to demonstrate that any small community of interest is globally large -- or at least large enough to keep most people happy. Even language hackers.

Software. For a book about writing languages, I suspect the most effective advertisement is the software that one can develop with the techniques described, or the tools one could use to follow them. The sincerest recommendation is for the software to be used, found useful, and improved by someone else, who's then willing to share their experiences back with the net.

Having written all of the above, I'm still not sure where it leaves me. I'd welcome any comments or descriptions of experiences before I start putting hand to keyboard in the wrong way. Who'd have thought it'd be so complicated? -- although I must say that having these sorts of choices is in itself a major draw, and a great indication of how the web's changing the world.

Monads: a language designer's perspective

Monads are one of the hottest topics in functional programming, and arguably simplify the construction of a whole class of systems. Which makes it surprising that they're so opaque and hard to understand for people whose main experience is in imperative or object-oriented languages.

There are a lot of explanations of, and tutorials on, monads, but most of them seem to take one of two perspectives: either start with a concrete example, usually in I/O handling, and work back, or start from the abstract mathematical formulation and work forwards. This sounds reasonable, but apparently neither works well in practice -- at least, judging from the comments one receives from intelligent and able programmers who happen not to have an extensive functional programming or abstract mathematical background. Such a core concept shouldn't be hard to explain, though, so I thought I'd try a different tack: monads from the perspective of language design.

In Pascal, C or Java, statements are separated (or terminated) by semicolons. This is usually regarded as a piece of syntax, but let's look at it slightly differently. Think of the semicolon as an operator that takes two program fragments and combines them together to form a bigger fragment. For example:

int x = 4;
int y = x * 3;
printf("%d", y);

We have three program fragments. The semicolon operator at the end of the first line takes the fragment on its left-hand side and combines it with the fragment on its right-hand side. Essentially the semicolon defines how the RHS is affected by the code on the LHS: in this case the RHS code is evaluated in an environment that includes a binding of variable x, effectively resolving what is otherwise a free variable. Similarly, the semicolon at the end of the second line causes the third line to be evaluated in an environment that includes y. The meaning of the semicolon is hard-wired into the language (C, in this case) and defines how code fragments are sequenced and their effects propagated.

Now from this perspective, a monad is a programmable semicolon. A monad allows the application programmer, rather than the language designer, to determine how a sequence of code is put together, and how one fragment affects those that come later.

Let's revert to Haskell. In a slightly simplified form, a monad is a type class with the following signature:

class Monad m where
  return :: a -> m a
  (>>=)  :: m a -> (a -> m b) -> m b

So a monad is a constructed type wrapping-up some underlying element type that defines two functions, return and (>>=). The first function injects an element of the element type into the monadic type. The second takes an element of the monadic type and a function that maps an element of that monad's element type to some other monadic type, and returns an element of this second monadic type.

The simplest example of a monad is Haskell's Maybe type, which represents either a value of some underlying element type or the absence of a value:

data Maybe a = Just a | Nothing

Maybe is an instance of Monad, simply by virtue of defining the two functions that the type class needs:

instance Monad Maybe where
  return a        = Just a
  (Just a) >>= f  = f a
  Nothing  >>= _  = Nothing

return injects an element of a into an element of Maybe a. (>>=) takes an element of Maybe a and a function from a to Maybe b. If the element of Maybe a it's passed is of the form Just a, it applies the function to the element value a. If, however, the element is Nothing, it returns Nothing without evaluating the function.

It's hard to see what this type has to do with sequencing, but bear with me. Haskell provides a do construction which gives rise to code like the following:

do v <- if b == 0 then Nothing else Just (a / b)
   return (26 / v)

Intuitively this looks like a sequence of code fragments, so we might infer that the conditional executes first and binds a value to v, and then the next line computes with that value -- which is in fact what happens, but with a twist. The way in which the fragments relate is not pre-defined by Haskell. Instead, the relationship between the fragments is determined by the monad whose values the fragments manipulate (usually expressed as the monad the code executes in). The do block is just syntactic sugar for a stylised use of the two monad functions. The example above expands to:

(if b == 0 then Nothing else Just (a / b)) >>= (\v -> return (26 / v))

So the do block is syntax that expands into user-defined code, depending on the monad that the expressions within it use. In this case, we execute the first expression and then compose it with the function on the right-hand side of the (>>=) operator. The definition says that, if the left-hand side value is Just a, the result is that we call the RHS passing the element value; if the LHS is Nothing, we return Nothing immediately. The result is that, if any code fragment in the computation returns Nothing, then the entire computation returns Nothing, since all subsequent compositions will immediately short-circuit: the Maybe type acts like a simple exception that escapes from the computation as soon as Nothing is encountered. So the monad structure introduces what's normally regarded as a control construct, entirely within the language. It's fairly easy to see that we could provide "real" exceptions by hanging an error code off the failure value. It's also fairly easy to see that the monad sequences the code fragments and aborts when one of them "fails". In C we can think of the same function being provided by the semicolon "operator", but with the crucial difference that it is the language, and not the programmer, that decides what happens, once and for all. Monads reify the control of sequencing into the language.
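As a rough sketch of that error-code idea, here's a Maybe-like type carrying a reason for failure, written against the simplified Monad signature above (the type and its name are invented for illustration; a modern Haskell compiler would also want Functor and Applicative instances):

data Result a = OK a | Failed String

instance Monad Result where
  return a         = OK a
  (OK a)     >>= f = f a
  (Failed e) >>= _ = Failed e

A Failed value now propagates through the rest of the computation just as Nothing did, but carries a description of what went wrong.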

To see how this can be made more general, let's think about another monad: the list type constructor. Again, to make lists into monads we need to define return and (>>=) with appropriate types. The obvious injection is to turn a singleton into a list:

instance Monad [] where
  return a = [a]

The definition of (>>=) is slightly more interesting: which function of type [a] -> (a -> [b]) -> [b] is appropriate? One could choose to select an element of the [a] list at random and apply the function to it, giving a list [b] -- a sort of non-deterministic application of a function to a set of possible arguments. (Actually this might be interesting in the context of programming with uncertainty, but that's another topic.) Another definition -- and the one that Haskell actually chooses -- is to apply the function to all the elements of [a], taking each a to a list [b], and then concatenating the resulting lists together to form one big list:

l >>= f = concat (map f l)

What happens to the code fragments in a do block now? The monad threads them together using the two basic functions. So if we have code such as:

do x <- [1..10]
   y <- [20..30]
   return (x, y)

What happens? The first and second fragments clearly define lists, but what about the third, which seems to define a pair? To see what happens, we need to consider all the fragments together. Remember, each fragment is combined with the next by applying concat (map f l). If we expand this out, we get:

concat (map (\x -> concat (map (\y -> return (x, y)) [20..30])) [1..10])
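This evaluates to the list of every (x, y) pair -- the Cartesian product of the two lists -- which is exactly what the corresponding list comprehension produces:

[(x, y) | x <- [1..10], y <- [20..30]]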

So to summarise, Haskell provides a do block syntax that expands to a nested sequence of monadic function calls. The actual functions used depend on the monadic type in the do block, so the programmer can define how the code fragments relate to one another. Common monads include some simple types, but also I/O operations and state bindings, allowing Haskell to perform operations that are typically regarded as imperative without losing its laziness. The Haskell tutorial explains the I/O syntax.
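(For a flavour of the I/O case, here's the standard sort of example: a do block in the IO monad, where the sequencing of the side-effecting actions is handled by the monad rather than by the evaluation order of the surrounding code.)

main :: IO ()
main = do
  putStrLn "What's your name?"
  name <- getLine
  putStrLn ("Hello, " ++ name)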

What can we say about monads from the perspective of language design? Essentially they reify sequencing, in a functional style. They only work as seamlessly as they do because of Haskell's flexible type system (allowing the definition of new monads), and also because of the do syntax: without the syntactic sugar, most monadic code is incomprehensible. The real power is that they allow some very complex functionality to be abstracted into functions and re-used. Consider the Maybe code we used earlier: without the "escape" provided by the Maybe monad, we'd have to guard each statement with a conditional to make sure there wasn't a Nothing returned at any point. This quickly gets tiresome and error-prone: the monad encapsulates and enforces the desired behaviour. When you realise that one can also compose monads using monad transformers, layering monadic behaviours on top of each other (albeit with some contortions to keep the type system happy), it becomes clear that this is a very powerful capability.
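To see the saving, compare the earlier Maybe example written with the guard spelled out by hand against the monadic version (the function names are just for illustration); the case analysis in the first is exactly the plumbing that (>>=) hides:

-- Guarding by hand: each step must check for Nothing explicitly.
safeCalc :: Double -> Double -> Maybe Double
safeCalc a b =
  case (if b == 0 then Nothing else Just (a / b)) of
    Nothing -> Nothing
    Just v  -> Just (26 / v)

-- The same computation in the Maybe monad.
safeCalc' :: Double -> Double -> Maybe Double
safeCalc' a b = do
  v <- if b == 0 then Nothing else Just (a / b)
  return (26 / v)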

I think one can also easily identify a few drawbacks, though. One that immediately springs to mind is that monads reify one construction, of the many that one might choose. A more general meta-language, like the use of meta-objects protocols or aspects, or structured language and compiler extensions, would allow even more flexibility. A second -- perhaps with wider impact -- is that one has to be intimately familiar with the monad being used before one has the slightest idea what a piece of code will do. The list example above is not obviously a list comprehension, until one recognises the "idiom" of the list monad. Thirdly, the choice of monadic function definitions isn't often canonical, so there can be a certain arbitrariness to the behaviour. It'd be interesting to consider generalisations of monads and language constructs to address these issues, but for the meantime one can use them to abstract a whole range of functionality in an interesting way. Good luck!

Contextual processes

Context-aware systems are intended to follow and augment user-led, real-world processes. These differ somewhat from traditional workflow processes, but share some characteristics. Might the techniques used to implement business processes via web service orchestration fit into the context-aware landscape too?

These ideas arose as a result of discussions at the PMMPS workshop at PERVASIVE 2010 in Helsinki. In particular, I was thinking about comments Aaron Quigley made in his keynote about the need to build platforms and development environments if we're to avoid forever building just demos. The separation of function from process seems to be working in the web world: might it work in the pervasive world too?

In building a pervasive system we need to integrate several components:

  1. A collection of sensors that allow us to observe users in the real world
  2. A user or situation model describing what the users are supposed to be doing, in terms of the possible observations we might make and inferences we might draw
  3. A context model that brings together sensor observations and situations, allowing us to infer the latter from a sequence of the former
  4. Some behaviour that's triggered depending on the situation we believe we're observing

Most current pervasive systems have quite simple versions of all these components. The number of sensors is often small -- sometimes only one or two, observing one user. The situation model is more properly an activity model in that it classifies a user's immediate current activity, independently of any other activity at another time. The context model encapsulates a mapping from sensors to activities, which then manifest themselves in activating or deactivating a single behaviour. Despite their simplicity, such systems can perform a lot of useful tasks.
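A rough sketch of that simple shape, with invented sensor and activity names, might look like the following; real systems differ mainly in how much uncertainty sits inside the classification step.

data SensorEvent = DoorOpened | FridgeOpened | MotionDetected | ToiletFlushed
  deriving (Eq, Show)

data Activity = PreparingFood | MovingAbout | Unknown
  deriving (Eq, Show)

-- A single-user mapping from recent sensor events to an activity...
classifyActivity :: [SensorEvent] -> Activity
classifyActivity evs
  | FridgeOpened `elem` evs   = PreparingFood
  | MotionDetected `elem` evs = MovingAbout
  | otherwise                 = Unknown

-- ...which then activates or deactivates a single behaviour.
behaviour :: Activity -> String
behaviour Unknown = "raise an alert"
behaviour _       = "do nothing"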

However, pervasive activities clearly live on a timeline: you leave home and then walk to work and then enter your office and then check your email, and so forth. One can treat these activities as independent, but that might lose continuity of behaviour, when what you want to do depends on the route by which you got to a particular situation. Alternatively we could treat the timeline as a process, and track the user's progress along it, in the manner of an office workflow.

Of course the problem is that users don't actually follow workflows like this -- or, rather, they tend to interleave actions, perform them in odd orders, leave bits out, drop one process and do another before picking-up the first (or not), and so on. So pervasive workflows aren't at all like "standard" office processes. They aren't discriminated from other workflows (and non-workflow activities) happening simultaneously in the same space, with the same people and resources involved. In some simple systems the workflow actually is "closed", for example computer theatre (Pinhanez, Mase and Bobick. Interval scripts: a design paradigm for story-based interactive systems. Proceedings of CHI'97. 1997.) -- but in most cases it's "open". So the question becomes, how do we describe "loose" workflows in which there is a sequence of activities, each one of which reinforces our confidence in later ones, but which contain noise and extraneous activities that interfere with the inferencing?

There are several formalisms for describing sequences of activities. The one that underlies Pinhanez' work mentioned above is Allen algebra (Allen and Ferguson. Actions and events in interval temporal logic. Journal of Logic and Computation 4(5), pp.531--579. 1994.) which provides a notation for specifying how intervals of time relate: an interval a occurs strictly before another b, for example, which in turn contains wholly within it another interval c. It's easy to see how such a process provides a model for how events from the world should be observed: if we see that b has ended, we can infer that c has ended also because we know that c is contained within b, and so forth. We can do this if we don't -- or can't -- directly observe the end of c. However, this implies that we can specify the relationships between intervals precisely. If we have multiple possible relationships the inferencing power degrades rapidly.
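A rough sketch of the flavour of this, treating intervals as (start, end) pairs over an abstract ordered time type (only two of the thirteen Allen relations are shown, and the names are mine):

type Interval t = (t, t)   -- (start, end)

before :: Ord t => Interval t -> Interval t -> Bool
(_, e1) `before` (s2, _) = e1 < s2               -- a ends strictly before b starts

during :: Ord t => Interval t -> Interval t -> Bool
(s1, e1) `during` (s2, e2) = s2 < s1 && e1 < e2  -- a lies wholly within b

-- Has an interval finished by time t? If c `during` b and b has ended by t,
-- then c has ended by t as well, even if we never observed c's end directly.
endedBy :: Ord t => t -> Interval t -> Bool
endedBy t (_, e) = e <= t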

Another way to look at things is to consider what "noise" means. In terms of the components we set out earlier, noise is the observation of events that don't relate to the process we're trying to observe. Suppose I'm trying to support a "going to work" process. If I'm walking to work and stop at a shop, for example, this doesn't interfere with my going to work -- it's "noise" in the sense of "something that happened that's non-contradictory of what we expected to see". On the other hand if, after leaving the shop, I go home again, that might be considered as "not noise", in the sense of "something that happened that contradicts the model we have of the process". As well as events that support a process, we also have events that contradict it, and events that provide no information.
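To make the distinction concrete, here's a toy classification of events against a hypothesised "going to work" process (the event names are invented for illustration):

data Event = LeaveHome | EnterShop | LeaveShop | ArriveAtWork | ArriveHome
  deriving (Eq, Show)

data Evidence = Supports | Contradicts | NoInformation
  deriving (Eq, Show)

-- Shop visits are noise; going home again contradicts the hypothesis.
goingToWork :: Event -> Evidence
goingToWork LeaveHome    = Supports
goingToWork ArriveAtWork = Supports
goingToWork ArriveHome   = Contradicts
goingToWork _            = NoInformation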

Human-centred processes are therefore stochastic, and we need a stochastic process formalism. I'm not aware of any that really fit the bill: process algebras seem too rigid. Markov processes are probably the closest, but they're really designed to capture frequencies with which paths are taken rather than detours and the like. Moreover we need to enrich the event space so that observations support or refute hypotheses as to which process is being followed and where we are in it. This is rather richer than is normal, where events are purely confirmatory. In essence what we have is process as hypothesis in which we try to confirm that this process is indeed happening, and where we are in it, using the event stream.

It's notable that we can describe a process separately from the probabilities that constrain how it's likely to evolve, though. That suggests to me that we might need an approach like BPEL, where we separate the description of the process from the actions we take as a result, and also from the ways in which we move the process forward. In other words, we have a description of what it means to go to work, expressed separately from how we confirm that this is what's being observed in terms of sensors and events, and separated further from what happens as a result of this process being followed. That sounds a lot easier than it is, because some events are confirmatory and some aren't. Furthermore we may have several processes that can be supported by observations up to a point and then diverge: going to work and going shopping are pretty similar until I go into a shop, and/or until I leave the shop and don't go to work. How do we handle this? We could enforce common-prefix behaviour, but that breaks the separation between process and action. We could insist on "undo" actions for "failed", no-longer-supported-by-the-observations processes, which severely complicates programming and might lead to interactions between different failed processes. Clearly there's something missing from our understanding of how to structure more complex, temporally elongated behaviours that'll need significant work to get right.