
The next challenges for situation recognition

As a pervasive systems research community we're doing quite well at automatically identifying simple things happening in the world. What is the state of the art, and what are the next steps?

Pervasive computing is about letting computers see and respond to human activity. In healthcare applications, for example, this might involve monitoring an elderly person in their home and checking for signs of normality: doors opening, the fridge being accessed, the toilet flushing, the bedroom lights going on and off at the right times, and so on. A collection of simple sensors can provide the raw observational data, and we can monitor this stream of data for the "expected" behaviours. If we don't see them -- no movement for over two hours during daytime, for example -- then we can sound an alarm and alert a carer. Done correctly this sort of system can literally be a life-saver, but also makes all the difference for people with degenerative illnesses living in their own homes.
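As a minimal sketch of the "no movement for over two hours during daytime" check, assume each sensor firing has been reduced to a timestamp and that any firing counts as activity. The function name, the daytime window and the sample times below are illustrative assumptions, not part of any particular deployed system.

```python
from datetime import datetime, timedelta


def find_inactivity_gaps(event_times, day_start, day_end,
                         max_gap=timedelta(hours=2)):
    """Return (start, end) pairs of daytime periods longer than max_gap
    in which no sensor event was observed."""
    # Keep only events inside the daytime window, in time order.
    in_window = sorted(t for t in event_times if day_start <= t <= day_end)
    # Treat the window boundaries as checkpoints so that silence at the
    # start or end of the day is also detected.
    checkpoints = [day_start] + in_window + [day_end]
    return [(a, b) for a, b in zip(checkpoints, checkpoints[1:])
            if b - a > max_gap]


# Hypothetical event times for one day (any sensor firing counts as activity).
day = datetime(2011, 3, 1)
events = [day.replace(hour=8, minute=30), day.replace(hour=9, minute=15),
          day.replace(hour=12, minute=0), day.replace(hour=12, minute=10)]
gaps = find_inactivity_gaps(events, day.replace(hour=8), day.replace(hour=20))
for start, end in gaps:
    print(f"ALERT: no activity between {start:%H:%M} and {end:%H:%M}")
```

This checks a finished day after the fact; a live monitor would presumably compare the time of the last event against the current time instead, and would have to cope with sensor noise that this sketch ignores.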

The science behind these systems is often referred to as activity and situation recognition, both of which are forms of context fusion. To deal with these in the wrong order: context fusion is the ability to take several streams of raw sensor (and other) data and use it to make inferences; activity recognition is the detection of simple actions in this data stream (lifting a cup, chopping with a knife); and situation recognition is the semantic interpretation of a high-level process (making tea, watching television, in a medical emergency). Having identified the situation we can then provide an appropriate behaviour for the system, which might involve changing the way the space is configured (dimming the lights, turning down the sound volume), providing information ("here's the recipe you chose for tonight") or taking some external action (calling for help). This sort of context-aware behaviour is the overall goal.

The state of the art in context fusion uses some sort of uncertain reasoning including machine learning and other techniques that are broadly in the domain of artificial intelligence. These are typically more complicated than the complex event processing techniques used in financial systems and the like, because they have to deal with significant noise in the data stream. (Ye, Dobson and McKeever. Situation recognition techniques in pervasive computing: a review. Pervasive and Mobile Computing. 2011.) The results are rather mixed, with a typical technique (a naive Bayesian classifier, for example) being able to identify some situations well and others far more poorly: there doesn't seem to be a uniformly "good" technique yet. Despite this we can now achieve 60-80% accuracy (by F-measure, a unified measure of the "goodness" of a classification technique) on simple activities and situations.
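For readers unfamiliar with the F-measure: it combines precision (how many of the situations reported were real) and recall (how many of the real situations were reported) into one number, their harmonic mean in the usual F1 form. A small sketch follows, with made-up counts chosen only to land in the range quoted above.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """F-measure from raw classification counts.

    beta=1.0 gives the usual F1 score, the harmonic mean of precision
    and recall; larger beta weights recall more heavily.
    """
    if true_positives == 0:
        return 0.0  # nothing correctly identified, so the score is zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only: 70 correct detections, 30 false alarms, 30 misses.
score = f_measure(true_positives=70, false_positives=30, false_negatives=30)
print(f"F1 = {score:.2f}")
```

Because it is a harmonic mean, a classifier cannot score well by being strong on only one of precision and recall.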

That sounds good, but the next steps are going to be far harder.

To see what the next step is, consider that most of the systems explored have been evaluated under laboratory conditions. These allow fine control over the environment -- and that's precisely the problem. The next challenges for situation recognition come directly from the loss of control of what's being observed.

Let's break down what needs to happen. Firstly, we need to be able to describe situations in a way that lets us capture human processes. This is easy to do to another human but tricky to a computer: the precision with which we need to express computational tasks gets in the way.

For example we might describe the process of making lunch as retrieving the bread from the cupboard, the cheese and butter from the fridge, a plate from the rack, and then spreading the butter, cutting the cheese, and assembling the sandwich. That's a good enough description for a human, but most of the time isn't exactly what happens. One might retrieve the elements in a different order, or start making the sandwich (get the bread, spread the butter) only to remember that you forgot the filling, and therefore go back to get the cheese, then re-start assembling the sandwich, and so forth. The point is that this isn't programming: people don't do what you expect them to do, and there are so many variations to the basic process that they seem to defy capture -- although no human observer would have the slightest difficulty in classifying what they were seeing. A first challenge is therefore a way of expressing the real-world processes and situations we want to recognise in a way that's robust to the things people actually do.
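To make the contrast with "programming" concrete, here is one deliberately naive way of describing the lunch situation so that ordering, repetition and unrelated events don't matter: treat the description as a set of expected events and measure how much of it has been seen. This is only an illustration of the idea, not a technique from the literature cited above, and the event names are invented.

```python
def situation_coverage(observed_events, required_events):
    """Fraction of a situation's required events seen so far.

    Order, repetition and unrelated events are all ignored, so the
    description tolerates people doing things in their own sequence.
    """
    required = set(required_events)
    seen = required & set(observed_events)
    return len(seen) / len(required)


# Hypothetical event names for the sandwich example.
MAKING_LUNCH = {"bread_out", "butter_out", "cheese_out", "plate_out", "knife_used"}

# Bread and butter first, then back to the fridge for the forgotten cheese.
observed = ["bread_out", "knife_used", "fridge_open", "cheese_out",
            "knife_used", "butter_out", "plate_out"]

print(situation_coverage(observed, MAKING_LUNCH))
print(situation_coverage(["bread_out", "plate_out"], MAKING_LUNCH))
```

A set like this is robust to reordering but throws away all the structure of the process, which is exactly the tension the first challenge describes: a useful description has to sit somewhere between a rigid script and an unordered bag of events.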

(Incidentally, this way of thinking about situations shows that it's the dual of traditional workflow. In a workflow system you specify a process and force the human agents to comply; in a situation description the humans do what they do and the computers try to keep up.)

The second challenge is that, even when captured, situations don't occur in isolation. We might define a second situation to control what happens when the person answers the phone: quiet the TV and stereo, maybe. But this situation could be happening at the same time as the lunch-making situation and will inter-penetrate with it. There are dozens of possible interactions: one might pause lunch to deal with the phone call, or one might continue making the sandwich while chatting, or some other combination. Again, fine for a human observer. But a computer trying to make sense of these happenings only has a limited sensor-driven view, and has to try to associate events with interpretations without knowing what it's seeing ahead of time. The fact that many things can happen simultaneously enormously complicates the challenge of identifying what's going on robustly, damaging what is often already quite a tenuous process. We therefore need techniques for describing situation compositions and interactions on top of the basic descriptions of the processes themselves.

The third challenge is also one of interaction, but this time involving multiple people. One person might be making lunch whilst another watches television, then the phone rings and one of them answers, realises the call is for the other, and passes it over. So as well as interpenetration we now have multiple agents generating sensor events, perhaps without being able to determine exactly which person caused which event. (A motion sensor sees movement: it doesn't see who's moving, and the individuals may not be tagged in such a way that they can be identified or even differentiated between.) Real spaces involve multiple people, and this may place limits on the behaviours we can demonstrate. But at the very least we need to be able to describe processes involving multiple agents and to support simultaneous situations in the same or different populations.

So for me the next challenges of situation recognition boil down to how we describe what we're expecting to observe in a way that reflects the noise, complexity and concurrency of real-world conditions. Once we have these descriptions we can explore and improve the techniques we use to map from sensor data (itself noisy and hard to program with) to identified situations, and thence to behaviour. In many ways this is a real-world version of the concurrency theories and process algebras that were developed to describe concurrent computing processes: process languages brought into the real world, perhaps. This is the approach we're taking in a European-funded research project, SAPERE, in which we're hoping to understand how to engineer smart, context-aware systems on large and flexible scales.

A very jolly cloud

On a whim yesterday I installed a new operating system on an old netbook. Looking at it makes me realise what we can do for the next wave of computing devices, even for things that are notionally "real" computers.

I've been hearing about Jolicloud for a while. It's been positioning itself as an operating system for devices that will be internet-connected most of the time: netbooks at the moment, moving to Android phones and tablets. It also claims to be ideal for recycling old machines, so I decided to install it on an Asus Eee 901 I had lying around.

The process of installation turns out to be ridiculously easy: download a CD image, download a USB-stick-creator utility, install the image on the stick and re-boot. Jolicloud runs off the stick while you try it out; you can keep it this way and leave the host machine untouched, or run the installer from inside Jolicloud to install it as the only OS or to dual-boot alongside whatever you've already got. This got my attention straight away: Jolicloud isn't demanding, and doesn't require you to do things in a particular way or order.

Once running (after a very fast boot sequence) Jolicloud presents a single-window start-up screen with a selection of apps to choose from. A lot of the usual suspects are there -- a web browser (Google Chrome), Facebook, Twitter, Gmail, a weather application and so forth -- and there's an app repository from which one can download many more. Sound familiar? It even looks a little like an iPhone or Android smartphone user interface.

I dug under the hood a little. The first thing to notice is that the interface is clearly designed for touchscreens. Everything happens with a single click, the elements are well-spaced and there's generally a "back" button whenever you go into a particular mode. It bears more than a passing resemblance to the iPhone and Android (although there are no gestures or multitouch, perhaps as a concession to the variety of hardware targets).

The second thing to realise is that Jolicloud is really just a skin onto Ubuntu Linux. Underneath there's a full Linux installation running X Windows, and so it can run any Linux application you care to install, as long as you're prepared to drop out of the Jolicloud look-and-feel into a more traditional set-up. You can download Wine, the Windows compatibility layer, and run Windows binaries if you wish. (I used this to get a Kindle reader installed.)

If this was all there was to Jolicloud I wouldn't be excited. What sets it apart is as suggested by the "cloud" part of the name. Jolicloud isn't intended as an operating system for single devices. Instead it's intended that everything a user does lives on the web ("in the cloud," in the more recent parlance -- not sure that adds anything...) and is simply accessed from one or more devices running Jolicloud. The start-up screen is actually a browser window (Chrome again) using HTML5 and JavaScript to provide a complete interface. The contents of the start-up screen are mainly hyperlinks to web services: the Facebook "app" just takes you to the Facebook web page, with a few small amendments done browser-side. A small number are links to local applications; most are little more than hyperlinks, or small wrappers around web pages.

The significance of this for users is the slickness it can offer. Installing software is essentially trivial, since most of the "apps" are actually just hyperlinks. That significantly lessens the opportunity for software clashes and installation problems, and the core set of services can be synchronised over the air. Of course you can install locally -- I installed Firefox -- but it's more awkward and involves dropping-down a level into the Linux GUI. The tendency will be to migrate services onto web sites and access them through a browser. The Jolicloud app repository encourages this trend: why do things the hard way when you can use a web-based version of a similar service? For data, Jolicloud comes complete with Dropbox to store files "in the cloud" as well.

Actually it's even better than that. The start screen is synced to a Jolicloud account, which means that if you have several Jolicloud devices your data and apps will always be the same, without any action on your part: all your devices will stay synchronised because there's not really all that much local data to synchronise. If you buy a new machine, everything will migrate automatically. In fact you can run your Jolicloud desktop from any machine running the latest Firefox, Internet Explorer or Chrome browser, because it's all just HTML and JavaScript: web standards that can run inside any appropriate browser environment. (It greys-out any locally-installed apps.)

I can see this sort of approach being a hit with many people -- the same people to whom the iPad appeals, I would guess, who mainly consume media and interact with web sites, don't ever really need local applications, and get frustrated when they can't share data that's on their local disc somewhere. Having a pretty standard OS distro underneath is an added benefit in terms of stability and flexibility. The simplicity is really appealing, and would be a major simplification for elderly people for example, for whom the management of a standard Windows-based PC is a bit of a challenge. (The management of a standard Windows-based PC is completely beyond me, actually, as a Mac and Unix person, but that's another story.)

But the other thing that Jolicloud demonstrates is both the strengths and weaknesses of the iPhone model. The strengths are clear the moment you see the interface: simple, clear, finger-sized and appropriate for touch interaction. The app store model (albeit without payment for Jolicloud) is a major simplification over traditional approaches to managing software, and it's now clear that it works on a "real" computer just as much as on a smartphone. Jolicloud can install Skype, for example, which is a "normal" application although typically viewed by users as some sort of web service. The abstraction of "everything in the cloud" works extremely well. (A more Jolicloud-like skin would make Skype even more attractive, but of course it's not open-source so that's difficult.)

Jolicloud also demonstrates that one can have such a model without closing-down the software environment. You don't have to vet all the applications, or store them centrally, or enforce particular usage and payment models: all those things are commercial choices, not technical ones, and close-down the experience of owning the device they manage. One can perhaps make a case for this for smartphones, but not so much for netbooks, laptops and other computers.

I'm not ready to give up my MacBook Air as my main mobile device, but then again I do more on the move than most people. As just a portable internet connection with a bigger screen than a smartphone, a Jolicloud netbook will be hard to beat. There are other contenders in the space, of course, but they're either not really available yet (Google Chrome OS) or limited to particular platforms (WebOS). I'd certainly recommend a look at Jolicloud for anyone in the market for such a system, and that includes a lot of people who might never otherwise consider getting off Windows on a laptop: I'm already working on my mother-in-law.

Disloyalty cards

I seem to be carrying a lot of loyalty cards at the moment. Given my complete lack of loyalty to any particular brand, is there some way to lose most of the cards and still get the benefits?

Most large stores have loyalty cards these days: swipe them at the till and receive some small but non-zero reward at a later date. There are all sorts of variations: Tesco UK has a very generous rewards programme; Starbucks' card is also a debit card that you can load-up with cash to pay for coffee; Costa has a more traditional "buy ten, get one free" approach; Waterstone's collects points to put towards later book purchases. (You can already see where a lot of my disposable income goes...)

What the store gets from this, of course, is market intelligence at the level of the individual and group. Tesco (to use one example) can use my loyalty card to match purchases to me, in a way that's otherwise impossible. Using this information they can tailor offers they make to me: give me cash vouchers to spend on anything, but also targeted vouchers for individual products that I don't normally buy, but which my purchasing history says I might, if offered. The supermarket can also group the purchases of people in a particular neighbourhood and use it to stock local stores in ways that appeal to the actual shoppers who frequent the place: if you doubt this, go to a downtown Tesco and then a suburban one, and compare the variation in stock.

The fact that loyalty cards are so common tells you how valuable stores find this kind of market intelligence, and it's purchased from customers for a very small price. We reveal an enormous amount about our lifestyles and activities from our everyday purchases. Once you have a really huge amount of data, you can extract information from it in ways that are by no means obvious. You can also be more speculative in how you process the data, since the penalties for being wrong are so low. A traditional pervasive computing system is real-time and has a high penalty for inaccuracy; a loyalty card operates on a slower timescale and only has opportunity costs for incorrect recommendations. Thought about from this perspective, I could even see ways of testing hypotheses you formed about individuals, and using this to refine the situation recognition algorithms: if their habits suggest they have a baby, offer them a really good deal for baby products and use their future actions to confirm or refute the hypothesis.

These approaches work both in the real world and online, of course. Amazon gets all your profile information without offering a loyalty card, simply by the way e-commerce works. Tesco online links e-shopping to real-world shopping, and so can compare your long-term trend purchases against your "impulse" (or just forgetfulness) buys in the local shop, and correlate this against your home address.

None of this is necessarily bad, of course. I like Amazon profiling me: they introduce me to books I might not otherwise read, and that's an unmitigated good thing (for me). But it's important to realise that these things are happening, and to understand both their good and bad effects.

If you care about your privacy, don't carry a loyalty card. On the other hand, if you value personalised service, carry a load of them and help them classify you.

It's also important to realise where the breaks in the coverage are. If Tesco knew I frequented Starbucks, for example, they might offer me a different kind of coffee to someone who doesn't. (Don't offer a coffee snob a deal on cheap instant.) There are lots of correlations between different areas of people's lives that could be exploited -- "exploited" in a nice way, at least in principle, but also profitably for the companies involved. What's needed is a way to tie together more aspects of individuals' lives, either as individuals or in bulk.

One way to do this might be to get rid of per-store loyalty cards and replace them with a single profiling card: a "disloyalty" card. The disloyalty card acts as a hub for collecting details of what an individual buys where. The individual stores signing-up to the card offer points according to their normal practices, but the points can be spent at any store that participates in the disloyalty card scheme. For the consumer, this means that they collect points faster and can therefore spend them on more valuable goods and services.

What's in it for the stores? More market intelligence, across an individual's entire purchasing profile. One could do this individually, to allow stores to see what customers buy in other stores and cross-sell to them. Alternatively one could anonymise the data at an individual level but allow correlations across groups: everyone who lives in Morningside, or everyone who buys coffee early in the morning on a week-day at Haymarket station. If the disloyalty card operator acted as a trusted third party, individual stores could even cross-sell to individuals without ever knowing to whom they were making the offer: that way the stores still "own" their own customers, but can access more data about them en masse, online and offline, across a range of businesses.

I'm not necessarily suggesting this would be a good thing for the individual or indeed for the operation of the retail market as a whole -- it provides enormous opportunities for profiling and severely and increasingly disenfranchises people who choose to sit outside the scheme -- but it's a commercial possibility that should be considered both by commercial interests and by regulators.

Like many ideas, there's precedent. Back in the 1970s we all used to collect Green Shield stamps, which were given out by a range of shops and could be redeemed collectively at special stores. (My grandfather, who was particularly methodical and assiduous in all he did, was amazingly effective at collecting his way to what he wanted in the Green Shield stores.) That scheme had the group-collecting aspect of disloyalty cards without the market intelligence aspect. It is the market intelligence that brings the whole idea into the 21st century and gives it a measurable monetary value for all concerned.

Hiring academics

Hiring anyone is always a risk, and hiring a new academic especially so given the nature of our contracts. So what's the best way to approach it?

What's brought this to mind is a recent discussion about the need to -- or indeed wisdom of -- interviewing new staff. The argument against interviewing is actually not all that uncommon. People tend to like and identify with -- and therefore hire -- people like themselves, and this can harm the diversity of a School when everyone shares a similar mindset. Another version of the same argument (that was applied in a place I used to work) says that you appoint the person who interviews best on the day, having shortlisted the five or so best CVs submitted regardless of research area.

I can't say I buy either version. In fact I would go the other way: you need a recruitment strategy that decides what strengths you want in your School -- whether that's new skills or building-up existing areas -- and then interview those people with the right academic fit and quality with a view to deciding who'll fit in with the School's culture and intentions.

My reasons for this are fairly simple. The best person academically isn't necessarily the best appointment. The argument that you employ the best researchers, in line with the need to generate as much world-class research as possible, is belied by the need to also provide great teaching (to attract the best students) and to engage in the impact that research has in the wider world. The idea that one would employ the best person on the day regardless of area strikes me as a non-strategy that would lead to fragmentation of expertise and an inability to collaborate internally. (It actually does the academic themselves no favours if they end up hired into a School with no-one to collaborate with.) Interviewing weeds-out the unsociable (and indeed the asocial) and lets one assess people on their personal as well as academic qualities. It's important to remember that academics typically have some form of tenure -- or at the very least are hard to fire -- and so one shouldn't underestimate the damage that hiring a twisted nay-sayer can do.

In case this isn't convincing, let's look at it another way. Suppose we recruit a new full professor. Suppose that they're about 45, and so have around 20 years to retirement. Assume further that they stay for that entire time and don't retire early or leave for other reasons. The average pre-tax salary for a full professor in the UK is around £70,000. So the direct salary cost of the appointment is of the order of £1,500,000. After that, the individual will retire and draw (for example) 1/3rd of salary for another 15 years. (Although this is paid for from an externally-administered pension fund, we can assume for our purposes that the costs of this fund come at least partially from university funds.) So the direct cost of that appointment doesn't leave much change out of £1,800,000.
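The back-of-the-envelope sum can be written out explicitly. This sketch uses only the figures given above and, like the text, ignores pay rises, inflation and employer overheads; the function and parameter names are just for illustration.

```python
def direct_cost_of_appointment(annual_salary, years_to_retirement,
                               pension_years, pension_divisor=3):
    """Rough direct cost: salary until retirement plus a pension paid as
    a fixed fraction (1/pension_divisor) of salary afterwards."""
    salary_cost = annual_salary * years_to_retirement
    pension_cost = annual_salary * pension_years / pension_divisor
    return salary_cost, pension_cost, salary_cost + pension_cost


# Figures from the text: £70,000 a year, 20 years in post,
# then a third of salary for 15 years of retirement.
salary_cost, pension_cost, total = direct_cost_of_appointment(70_000, 20, 15)
print(f"Salary:  £{salary_cost:,.0f}")
print(f"Pension: £{pension_cost:,.0f}")
print(f"Total:   £{total:,.0f}")
```

With these inputs the salary comes to £1,400,000 and the pension to £350,000, a total of £1,750,000, which is where the rounded figures in the text come from.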

(And that's just the direct costs, of course. There are also the opportunity costs of employing the wrong person, in terms of grants not won, students not motivated, reputations damaged and so forth. I have no idea how to calculate these, but I'm willing to believe they're of a similar order to the direct costs.)

So in appointing this individual, the interview panel is making a decision whose value to the university is of the order of £2,000,000, and probably substantially more. How much time and care would you take before you spent that much?

My experience has been mixed. A couple of places I interviewed had candidates in front of the interview committee (of four people, in one case) for a fifteen-minute presentation and a one-hour interview: seventy-five minutes of face time to make what was quite literally a million-pound decision. By contrast I was in St Andrews for three days and met what felt like half the university including staff, teaching fellows, students, postdocs, administrators and others.

I think the idea that a CV is all that matters is based on the fallacy that the future will necessarily be like the past. I'm a contrarian in these things: if I interview someone for a job I don't care what they've done in the past, except to the extent that it's a guide to what they're going to do in the future. What you're trying to decide in making a hiring decision is someone's future value. Taken to its logical conclusion, what you ideally want to do is to identify people early who are going to be professors early -- and hire them, now! What you shouldn't do is only consider people with great pasts, because you get little or no value from that if it isn't carried forward. You want to catch the good guys early, and then you get all the value of the work put on their CVs going forward. You also get to benefit from the motivational power of promotion, which for many people will spur them to prove themselves.

Clearly there's a degree of unacceptable risk inherent in this, which we basically mitigate by employing people as junior academics. But this only works for the young guns if the institution's internal promotion scheme is efficient and will reward people quickly for their successes. Otherwise the young guns will look elsewhere, for an institution playing the long game and willing to take a chance on them -- and will do so with a better CV, that you've helped them build by hiring them in the first place. In the current climate institutions can't afford this, so optimising hiring and promotions is becoming increasingly critical for a university's continued success.

Why I research

I came across George Orwell's 1946 essay Why I Write. Some of his reasons for writing resonate with some of my reasons for doing research.

I wasn't expecting that. Research involves writing, of course, and reading too -- enormous amounts of both -- but it's not the same kind of writing as a writer writes. While it's good to have a clear style in science, it's the clarity of the science and not the turn of phrase that's the most prized element of scientific writing.

Orwell, writing in 1946 (which is post Animal Farm, incidentally), sets out the forces that led him to become a writer. Specifically, he identifies "four great motives for writing" that will drive each writer in different proportion depending on who he is, when and what he's writing about. Reading these, it struck me how close they are to the reasons I (and I suspect others) have for doing research:

Sheer egoism. That's one any academic has to agree with. A desire to be thought clever, talked about, and be remembered. For some, in some fields, there's a seduction towards popular commentary. For others it's a more academic draw, to be the centrepoint of conferences or to be head-hunted into bigger and better (and better-paid) jobs. As long as the pleasure of recognition doesn't get in the way of the science I don't think it does any harm, although we always need to keep in mind that it's really all about the research and not about the researcher -- a little memento mori for the honest scientist.

Aesthetic enthusiasm. It might surprise some people to hear about aesthetics applied to science, but it's certainly an aspect of the experience: the pleasure in thoughts well-set-down, of an elegant and complete explanation of something. Especially in mathematics there's a clear and well-recognised link between simplicity and the perception of beauty, a feeling that the right approach is the one that captures all the essence of a problem as simply as possible. It's surprising how often such simplicity works, and how often it leads directly to unexpected complexity and interactions.

Historical impulse. Orwell uses this in the sense of "a desire to see things as they are, to find out true facts and store them up for the use of posterity." In sciences I think one can also see in this a desire to bring together strands of past work (your own and others'), and again to make a clean and elegant synthesis of whatever field one works in. I think it also encompasses the need we have to share science, to have papers in durable venues like journals as well as in live conference events.

Political purpose. The desire to push the world in a particular direction. Scientists are often accused of being remote from the real world, although a lot of what we do is increasingly influencing public policy. This element of writing also led Orwell to the most famous line of the whole essay: "The opinion that art should have nothing to do with politics is itself a political attitude." Few senior scientists would disagree nowadays.

It's surprising to me that a writer and a scientist could agree on their driving purposes to this extent. I suspect it's simply because of the need for creativity that underlies both spheres, and the fact that both are essentially self-motivated and self-driven. You become a writer because it's who you are, not because of what you'll get out of it, although you also obviously hope that the rewards will flow somehow; similarly for science.

What I don't have, of course -- and I'm grateful for it -- is the equivalent of the trauma Orwell suffered in the Spanish Civil War, which shaped his subsequent writing and gave him a lifelong cause against totalitarianism. It certainly makes me want to read Homage to Catalonia again, though, and I'll read it with a new view on how it came about.