Simon Dobson (old posts, page 11)

Call for papers: D3Science

Papers are solicited for a workshop on data-intensive, distributed and dynamic e-science problems.

Workshop on D3Science - Call for Papers

(To be held with IEEE e-Science 2011) Monday, 5 December 2011, Stockholm, Sweden

This workshop is interested in data-intensive, distributed, and dynamic (D3) science. It will also focus on innovative approaches for scalability in the end-to-end real-time processing of scientific data. We refer to D3 applications as those that are data-intensive, that are or need to be fundamentally distributed, and that need to support and respond to data that may be non-persistent and is dynamically generated. We are also looking to bring researchers together to look at holistic, rather than piecewise, approaches to the end-to-end processing and managing of scientific data.

There has been a lot of effort in managing and distributing tasks where computation is dominant. Such applications have, after all, historically been the drivers of “grid” computing. There has been, however, relatively less effort on tasks where the computational load is matched by the data load, or even dominated by the data load. For such tasks to be able to operate at scale, there are conceptually simple run-time tradeoffs that need to be made, such as determining whether to move data to compute versus moving compute to data, or possibly regenerating data on-the-fly. Due to fluctuating resource availability and capabilities, as well as insufficient prior information about application requirements, such decisions must be made at run-time. Furthermore, resource, connectivity and/or storage constraints may require the data to be manipulated in-transit so that it is “made-right” for the consumer. Currently it is very difficult to implement these dynamic decisions or the underlying mechanisms in a general-purpose and scalable fashion. Although the increasing volumes and complexity of data will make many problems data-dominated, the computational requirements will still be high. In practice, data-intensive applications will encompass data-driven applications. For example, many data-driven applications will involve computational activities triggered as a consequence of independently created data; thus it is imperative for an application to be able to respond to unplanned changes in data load or content. Therefore, understanding how to support dynamic computations is a fundamental, but currently missing, element in data-intensive computing.

The D3Science workshop builds upon a three-year research theme on Distributed Programming Abstractions (DPA, http://wiki.esi.ac.uk/Distributed_Programming_Abstractions), which has held a series of related workshops including but not limited to e-Science 2008, EuroPar 2008 and the CLADE series, and the ongoing 3DPAS (http://wiki.esi.ac.uk/3DPAS) research theme funded by the NSF and UK EPSRC, which is holding one workshop in June 2011: the 3DAPAS workshop (https://sites.google.com/site/3dapas/). The workshop is intended to lead to a funding proposal for transcontinental collaboration, with contributors as potential members of the collaboration. As such, we are particularly interested in discussing both existing and future projects that are suitable for transcontinental collaboration.

Topics of interest include but are not limited to:

Case studies of development, deployment and execution of representative D3 applications, particularly projects suitable for transcontinental collaboration
Programming systems, abstractions, and models for D3 applications
Discussion of the common, minimally complete, characteristics of D3 applications
Major barriers to the development, deployment, and execution of D3 applications, and primary challenges of D3 applications at scale
Patterns that exist within D3 applications, and commonalities in the way such patterns are used
How programming models, abstractions and systems for data-intensive applications can be extended to support dynamic data applications
Tools, environments and programming support that exist to enable emerging distributed infrastructure to support the requirements of dynamic applications (including but not limited to streaming data and in-transit data analysis)
Data-intensive dynamic workflow and in-transit data manipulation
Adaptive/pervasive computing applications and systems
Abstractions and mechanisms for dynamic code deployment and “moving code to data”
Application drivers for end-to-end scientific data management
Runtime support for in-situ analysis
System support for high end workflows
Hybrid computing solutions for in-situ analysis
Technologies to enable multi-platform workflows

Submission instructions

Authors are invited to submit papers containing unpublished, original work (not under review elsewhere) of up to 8 pages of double column text using single spaced 10 point size on 8.5 x 11 inch pages, as per IEEE 8.5 x 11 manuscript guidelines. Templates are available:

http://www.ieee.org/web/publications/pubservices/confpub/AuthorTools/conferenceTemplates.html

Authors should submit a PDF or PostScript (level 2) file that will print on a PostScript printer. Papers conforming to the above guidelines can be submitted through the workshop’s paper submission system:

http://www.easychair.org/conferences/?conf=d3science

It is a requirement that at least one author of each accepted paper register for and attend the conference.

Important dates

17 July 2011 - submission date
23 August 2011 - decisions announced
23 September 2011 - final versions of papers due to IEEE for proceedings

Organizers

Daniel S. Katz, University of Chicago & Argonne National Laboratory, USA
Neil Chue Hong, University of Edinburgh, UK
Shantenu Jha, Rutgers University & Louisiana State University, USA
Omer Rana, Cardiff University, UK

PC members

Gagan Aggarwal, Ohio State University, USA
Deb Agarwal, Lawrence Berkeley National Lab, USA
Gabrielle Allen, Louisiana State University, USA
Malcolm Atkinson, University of Edinburgh, UK
Adam Barker, University of St Andrews, UK
Paolo Besana, University of Edinburgh, UK
Jon Blower, University of Reading, UK
Yun-He Chen-Burger, University of Edinburgh, UK
Simon Dobson, University of St Andrews, UK
Gilles Fedak, INRIA, France
Cécile Germain, University Paris Sud, France
Keith R. Jackson, Lawrence Berkeley National Lab, USA
Manish Parashar, Rutgers, USA
Abani Patra, University of Buffalo, USA
Yacine Rezgui, Cardiff University, UK
Yogesh Simmhan, University of Southern California, USA
Domenico Talia, University of Calabria, Italy
Paul Watson, Newcastle University, UK
Jon Weissman, University of Minnesota, USA

Pervasive healthcare

This week’s piece of shameless self-promotion: a book chapter on how pervasive computing and social media can change long-term healthcare.

My colleague Aaron Quigley and I were asked to contribute a chapter to a book put together by Jeremy Pitt as part of PerAda, the EU network of excellence in pervasive systems. We were asked to think about how pervasive computing and social media could change healthcare. This is something quite close to both our hearts (Aaron perhaps more so than me), as it’s one of the most dramatic examples of how pervasive computing can really make an impact on society.

There are plenty of examples of projects that attempt to provide high-tech solutions to the issues of independent living, some of which we’ve been closely involved with. For this work, though, we suggest that one of the most cost-effective contributions that technology can make might actually be centred around social media.

Isolation really is a killer, in a literal sense. A lot of research has indicated that social isolation is a massive risk factor in both physiological and psychological illnesses, and this is something that’s likely to get worse as populations age. Social media can help address this, especially in an age when older people have circles of older friends, and where these friends and family can be far more geographically dispersed than in former times. This isn’t to suggest that Twitter and Facebook are cures for all social ills, but rather that the services they might evolve into could be of enormous utility for older people. Not only do they provide traffic between people, they can be mined to determine whether users’ activities are changing over time, identify situations that can be supported, and so provide unintrusive medical feedback, as well as opening up massive issues of privacy and surveillance.

While today’s older generation are perhaps not fully engaged with social media, future generations undoubtedly will be, and it’s something to be encouraged.

Other authors, some of them leading names in the various aspects of pervasive systems, have contributed chapters about implicit interaction, privacy, trust, brain interfaces, power management, sustainability, and a range of other topics in accessible form. The book has a web site (of course), and is available for pre-order on Amazon.

Thanks to Jeremy for putting this together: it’s been a great opportunity to think more broadly than we often get to do as research scientists, and see how our work might help make the world more liveable-in.

Mainstreaming Smalltalk

Smalltalk’s influence has declined of late, at least in part because of the “all or nothing” architecture of the most influential distribution. We’ve got to the stage that we could change that.

Some programming languages achieve epic influence without necessarily achieving universal adoption. Lisp was phenomenally influential, but has remained in an AI niche; APL gave rise to notions of bulk processing and hidden parallelism without having much uptake outside finance. Smalltalk’s influence has been similar: it re-defined what it meant to be an interactive programming environment and laid the ground for many modern interface concepts, as well as giving a boost to object-oriented programming that prepared the way for C++ and Java.

So why does no-one use it any more? Partly it’s because of its very interactivity. Smalltalk, and more especially Squeak, its most prominent modern implementation, is unusual in basing software around complete interactive images rather than collections of source files. This isn’t inherently poor (images start immediately and encourage direct-manipulation design and interfacing) but it’s radically unfamiliar to programmers used to source code and version control. (Squeak provides version-controlled change sets to collect code outside an image, but that’s still an unfamiliar structure.)

But a more important issue is the all-or-nothing nature of Squeak in particular, and in fact of novel languages in general. If you want to use Squeak for a project, all the components really need to be in Squeak (although it can also use web services and other component techniques). If you want to integrate web pages, someone needs to write an HTML rendering component, HTTP back-end and the like; for the semantic web someone needs to write XML parsers, triple stores, reasoners and the like. You can’t just re-use the ones that’re already out there, at least not without violating the spirit of the system somewhat. That means it plays well at the periphery with other tools, but at its core needs most of the services to be written again. This is a huge buy-in. Similarly Squeak provides a great user interface for direct manipulation, but only within its own Squeak window, rendered differently and separately from other windows on the screen.

These aren’t issues inherent to Smalltalk (it’s perfectly possible to imagine a Smalltalk system that used “standard” rendering, and indeed they exist) but the “feel” of the language is rather more isolated than is common for modern systems used to integrating C libraries and the like. At bottom it’s not necessarily all that different to Java’s integration with the host operating system, but the style is very much more towards separation in pursuit of uniformity and expressive power.

This is a shame, because Smalltalk is in many ways a perfect development environment for novice programmers (and especially for children, who find it captivating), who are a vast source of programming innovation for small, focused services such as we find on the web and on smartphones.

So can we make Smalltalk more mainstream? Turn it into an attractive development platform for web services and mobile apps? Looking at some recent developments I think the answer is firmly yes, and without giving up on the interactivity that gives it its attraction. The key change is (unsurprisingly) the web, or more precisely the current crop of browsers that support JavaScript, style sheets, SVG, dynamic HTML and the like. The browser has now arrived at a point at which it can provide a complete GUI, complete with windows, moving and animated elements and the like, in a standard, platform-independent and (most importantly) cloud-friendly way.

What I have in mind is essentially implementing a VM for a graphical Smalltalk system, complete with interactive compiler and direct-manipulation editing, in JavaScript within a browser. The “image” is then the underlying XML document and its style sheet, which can be downloaded, shared and manipulated directly. The primitives are written in JavaScript, together with an engine to run Smalltalk code and objects. Objects are persisted by turning them into document elements and storing them in the DOM tree, which incidentally allows their look and feel to be customised quite simply. Crucially, it can also appeal to any current or emerging features that can appear in style sheets, the semantic web, JavaScript or embedded components: it’s mainstream with respect to the other technologies being developed.

Why use Smalltalk, and not JavaScript directly? I simply think that the understanding we gained from Smalltalk’s simplicity of programming model and embrace of direct manipulation is too valuable to lose. That’s not to say that it doesn’t need to be re-imagined for the web world, though. In fact, Smalltalk’s simplicity and interactivity are ideally suited to the development of front-ends, components and mobile apps, provided they play well with the other technologies those front-ends and apps need to use, and with a suitably low barrier to entry. It’s undoubtedly attractive to be able to combine local and remote components together as end-user programs, without the hassle of a traditional compile cycle, and then be able to publish those mash-ups directly to the web to be shared (and, if desired, modified) by anyone.

One of the things that’s always attracted me about Smalltalk (and Forth, and Scheme, and JavaScript to a lesser extent) is that the code lives completely within the dominant data structure: indeed, the code is just data in most cases, and can be manipulated using data-structure operations. This is very different from the separation you get between code and data in most other languages, and gives you a huge amount of expressive power.
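To make the "code is just data" point concrete, here is a minimal sketch in Python rather than Smalltalk or Scheme. The nested-list program format and the names `evaluate` and `swap_operator` are assumptions invented for illustration, not part of any existing system. The program is an ordinary list, so it can be rewritten with ordinary list operations before it is run.

```python
import operator

# Hypothetical mini-language: a program is a nested list such as
# ["+", 1, ["*", 2, 3]], so code is an ordinary data structure.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr):
    """Evaluate a nested-list expression; numbers evaluate to themselves."""
    if isinstance(expr, list):
        op, *args = expr
        values = [evaluate(a) for a in args]
        result = values[0]
        for v in values[1:]:
            result = OPS[op](result, v)
        return result
    return expr

def swap_operator(expr, old, new):
    """Rewrite a program using plain list operations: replace one operator."""
    if isinstance(expr, list):
        head = new if expr[0] == old else expr[0]
        return [head] + [swap_operator(a, old, new) for a in expr[1:]]
    return expr

program = ["+", 1, ["*", 2, 3]]
rewritten = swap_operator(program, "*", "+")  # a new program built from the old one
```

Here `evaluate(program)` and `evaluate(rewritten)` run the original and the transformed program respectively. Nothing distinguishes the transformation from any other list manipulation, which is the expressive power being described.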
Conversely, one of the things that always fails to attract me about these same languages is their lack of any static typing and consequent dependence on testing. Perhaps these two aspects necessarily go hand in hand, although I can’t think of an obvious reason why that should be.

I know purists will scream at this idea, but to me it seems to go along with ideas that Smalltalk’s co-inventor, Alan Kay, has expressed, especially with regard to the need to do away with closely-packaged applications and move towards a more fluid style of software:

The “no applications” idea first surfaced for me at PARC, when we realised that you really wanted to freely construct arbitrary combinations (and could do just that with (mostly media) objects). So, instead of going to a place that has specialised tools for doing just a few things, the idea was to be in an “open studio” and pull the resources you wanted to combine to you. This doesn’t mean to say that e.g. a text object shouldn’t be pretty capable — so it’s a mini app if you will — but that it and all the other objects that intermingle with each other should have very similar UIs and have their graphics aspect be essentially totally similar as far as the graphics system is concerned — and this goes also for user constructed objects. The media presentations I do in Squeak for my talks are good examples of the directions this should be taken for the future.

(Anyone who has seen one of Kay’s talks, as I did at the ECOOP conference in 2000, can attest to how stunningly engaging they are.) To which I would add that it’s equally important today that their data work together seamlessly too, and with the other tools that we’ll develop along the way.

The use of the browser as a desktop isn’t new, of course: it’s central to Google’s Chromium OS and to cloud-friendly Linux variants like Jolicloud. But it hasn’t really been used so much as a development environment, or as the host for a language that lives inside the web’s main document data structure. I’m not hung-up on it being Smalltalk (a direct-manipulation front-end to jQuery UI might be even better) but some form of highly interactive programming-in-the-web might be interesting to try.

Why we have code

Coding is an under-rated skill, even for non-programmers.

Computer science undergraduates spend a lot of time learning to program. While one can argue convincingly that computer science is about more than programming, it’s without doubt a central pillar of the craft: one can’t reasonably claim to be a computer scientist without demonstrating a strong ability to work with code. (What this says about the many senior computer science academics who can no longer program effectively is another story.) The reason is that it helps one to think about process, and some of the best illustrations of that come from teaching.

Firstly, why is code important? One can argue that programming languages and the discipline of code itself are two of the main contributions computer science has made to knowledge. (To this list I would add the fine structuring of big data and the improved understanding of human modes of interaction: the former is about programming, the latter an area in which the programming structures are still very weak.) They’re so important because they force an understanding of a process at its most basic level.

When you write computer software you’re effectively explaining a process to a computer in perfect detail. You often get a choice about the level of abstraction you choose. You can exploit the low-level details of the machine using assembler or C, or use the power of the machine to handle the low-level details and write in Haskell, Perl, or some other high-level language. But this doesn’t alter the need to express precisely all that the machine needs to know to complete the task at hand.

But that’s not all. Most software is intended to be used by someone other than the programmer, and generally will be written or maintained in part by more than one person, either directly as part of the programming team or indirectly through the use of third-party compilers and libraries. This implies that, as well as explaining a purpose to the computer, the code also has to explain a purpose to other programmers.

So code, and programming languages more generally, are about communication: from humans to machines, and to other humans. More importantly, code is the communication of process reduced to its purest form: there is no clearer way to understand how a process works than to read well-written, properly-abstracted code. I sometimes think (rather tenuously, I admit) this is an unexpected consequence of the halting problem, which essentially says that the simplest (and generally only) way to decide what a program does is to run it. The simplest way to understand a process is to express it as close to executable form as possible.

You think you know when you learn, are more sure when you can write, even more when you can teach, but certain only when you can program. Alan Perlis

There are caveats here, of course, the most important of which is that the code be well-written and properly abstracted: it needs to separate-out the details so that there’s a clear process description that calls into (but is separate from) the details of exactly what each stage of the process does. Code that doesn’t do this, for whatever reason, obfuscates rather than explains. A good programming education will aim to impart this skill of separation of concerns, and moreover will do so in a way that’s independent of the language being used.

Once you adopt this perspective, certain things that are otherwise slightly confusing become clear. Why do programmers always find documentation so awful? Because the code is a clearer explanation of what’s going on: it’s a fundamentally better description of process than natural language.

This comes through clearly when marking student assessments and exams. When faced with a question of the form “explain this algorithm”, some students try to explain it in words without reference to code, because they think explanation requires text. As indeed it does, but a better approach is to sketch the algorithm as code or pseudo-code and then explain with reference to that code, because the code is the clearest description it’s possible to have, and any explanation is just clearing up the details.

Some of the other consequences of the discipline of programming are slightly more surprising. Every few years some computer science academic will look at the messy, unstructured, ill-defined rules that govern the processes of a university (especially those around module choices and student assessment) and decide that they will be immensely improved by being written in Haskell/Prolog/Perl/whatever. Often they’ll actually go to the trouble of writing some or all of the rules in their code of choice, discover inconsistencies and ambiguities, and proclaim that the rules need to be re-written.

It never works out, not least because the typical university administrator has not the slightest clue what’s being proposed or why, but also because the process always highlights grey areas and boundary cases that can’t be codified. This could be seen as a failure, but can also be regarded as a success: coding successfully distinguishes between those parts of an organisation that are structured and those parts that require human judgement, and by doing so makes clear the limits of individual intervention and authority in the processes.

The important point is that, by thinking about a non-programming problem within a programming idiom, you clarify and simplify the problem and deepen your understanding of it. So programming has an impact not only on computers, but on everything to which one can bring a description of process; or, put another way, once you can describe processes easily and precisely you’re free to spend more time on the motivations and cultural factors that surround those processes without them dominating your thinking. Programmers think differently to other people, and often in a good way that should be encouraged and explored.
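As an illustration of the university-rules example, here is a minimal sketch of what writing such a rule as code might look like. Everything in it is hypothetical: the `Student` fields, the 60-credit threshold and the rule itself are invented for the example and do not describe any real regulations. What it shows is that the structured part of the rule becomes a plain condition, while the grey area is forced into the open as an explicit "refer to a human" outcome.

```python
from dataclasses import dataclass

@dataclass
class Student:
    # All fields are illustrative assumptions, not real regulations.
    credits_passed: int
    has_prerequisite: bool
    special_circumstances: bool = False

def may_take_module(student, credits_required=60):
    """Return 'yes', 'no', or 'refer' (the case needs human judgement)."""
    # Grey area: the written rule cannot say what counts as special
    # circumstances, so the code declines to decide and says so.
    if student.special_circumstances:
        return "refer"
    # Structured part: this is fully codifiable.
    if student.has_prerequisite and student.credits_passed >= credits_required:
        return "yes"
    return "no"

decisions = [
    may_take_module(Student(80, True)),
    may_take_module(Student(40, True)),
    may_take_module(Student(40, False, special_circumstances=True)),
]
```

The three calls cover a clear acceptance, a clear refusal and a case the code hands back to a person. The boundary between the second and third branches is the boundary between structure and judgement that the exercise makes visible.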

Evolving programming languages

Most programming languages have fixed definitions and hard boundaries. In thinking about building software for domains we don’t understand very well, a case can be made for a more relaxed, evolutionary approach to language design.

I’ve been thinking a lot about languages this week, for various reasons: mainly about the recurring theme of what are the right programming structures for systems driven by sensors, whether they’re pervasive systems or sensor networks. In either case, the structures we’ve evolved for dealing with desktop and server systems don’t feel like they’re the right abstractions to effectively take things forward.

A favourite example is the if statement: first decide whether a condition is true or false, and execute one piece of code or another depending on which it is. In a sensor-driven system we often can’t make this determination cleanly because of noise and uncertainty, and if we can, it’s often only probably true, and only for a particular period. So are if statements (and while loops and the like) actually appropriate constructs, when we can’t make the decisions definitively?

Whatever you think of this example (and plenty of people hate it) there are certainly differences between what we want to do in traditional and in highly sensorised systems, and consequently in how we program them. The question is, how do we work out what the right structures are?

Actually, the question is broader than this. It should be: how do we improve our ability to develop languages that match the needs of particular computational and conceptual domains?

Domain-specific languages (DSLs) have a tangled history in computer science, pitched between those who like the idea and those who prefer their programming languages general-purpose and re-usable across a range of domains. There are strong arguments on both sides: general-purpose languages are more productive to learn and are often more mature, but can be less efficient and more cumbersome to apply; DSLs mean learning another language that may not last long and will probably have far weaker support, but can be enormously more productive and well-targeted in use.

In some ways, though, the similarities between traditional languages and DSLs are very strong. As a general rule both will have syntax and semantics defined up-front: they won’t be experimental in the sense of allowing experimentation within the language itself. If we don’t know what we’re building, does it make sense to be this definite?

There are alternatives. One that I’m quite keen on is the idea of extensible virtual machines, where the primitives of a language are left “soft” to be extended as required. This style has several advantages. Firstly, it encourages experimentation by not forcing a strong division of concepts between the language we write (the target language) and the language this is implemented in (the host language): the two co-evolve. Secondly, it allows extensions to be as efficient as “base” elements, assuming we can make the cost of building new elements appropriately low. Thirdly, it allows multiple paradigms and approaches to co-exist within the same system, since they can share some elements while having others that differ.

Another related feature is the ability to modify the compiler: that is, don’t fix the syntax or the way in which it’s handled. So as well as making the low level soft, we also make the high level soft. The advantage here is two-fold. Firstly, we can modify the forms of expression we allow to capture concepts precisely. A good example would be the ability to add concurrency control to a language: the low-level might use semaphores, but programming might demand monitors or some other construct. Modifying the high-level form of the language allows these constructs to be added if required, and ignored if not. This actually leads to the second advantage, that we can avoid features we don’t want to be available, for example not providing general recursion for languages that need to complete all operations in a finite time. This is something that’s surprisingly uncommon in language design despite being common in teaching programming: leaving stuff out can have a major simplifying effect.

Some people argue that syntax modification is unnecessary in a language that’s sufficiently expressive, for example Haskell. I don’t agree. The counter-example is actually in Haskell itself, in the form of the do block syntactic sugar for simplifying monadic computations. This had to be in the language to make it in any way usable, which implied a change of definition, and the monad designers couldn’t add it without the involvement of the language “owners”, even though the construct is really just a re-formulation and generalisation of one common in other languages. There are certainly other areas in which such sugaring would be desirable to make the forms of expression simpler and more intuitive. The less well we understand a domain, the more likely this is to happen.

Perhaps surprisingly, there are a couple of existing examples of systems that do pretty much what I’m suggesting. Forth is a canonical example (which explains my current work on Attila); Smalltalk is another, where the parser and run-time are almost completely exposed, although abstracted behind several layers of higher-level structure. Both the languages are quite old, have devoted followings, and are weakly and dynamically typed, and may have a lot to teach us about how to develop languages for new domains. They share a design philosophy of allowing a language to evolve to meet new applications.
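The idea of "soft" primitives can be sketched in a few lines. What follows is a toy Forth-flavoured stack machine written in Python; it is an assumption-laden illustration, not Attila or any real Forth, and the class `SoftVM` and its methods are invented for the example. The point is only that the primitive table is an ordinary, open dictionary, so new words can be added either in the target language (from existing words) or in the host language (as functions).

```python
class SoftVM:
    """A toy stack machine whose primitives can be extended at run-time."""

    def __init__(self):
        self.stack = []
        # The primitive table is a plain dictionary, so it stays "soft".
        self.primitives = {
            "+": lambda vm: vm.stack.append(vm.stack.pop() + vm.stack.pop()),
            "*": lambda vm: vm.stack.append(vm.stack.pop() * vm.stack.pop()),
            "dup": lambda vm: vm.stack.append(vm.stack[-1]),
        }

    def define(self, name, words):
        """Add a word defined in the target language, from existing words."""
        body = words.split()
        self.primitives[name] = lambda vm: vm.run_words(body)

    def run_words(self, words):
        for word in words:
            if word in self.primitives:
                self.primitives[word](self)
            else:
                # Anything that is not a known word is an integer literal.
                self.stack.append(int(word))

    def run(self, source):
        self.run_words(source.split())
        return self.stack

vm = SoftVM()
vm.define("square", "dup *")                                         # target-language extension
vm.primitives["negate"] = lambda m: m.stack.append(-m.stack.pop())   # host-language extension
result = vm.run("3 square 4 + negate")
```

Once defined, `square` and `negate` are looked up and run exactly like `+` and `dup`. That uniformity is what allows extensions to be treated as "base" elements.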
In Forth, you don’t so much write applications as extend the language to meet the problem; in Smalltalk you develop a model of co-operating objects that provide direct-manipulation interaction through the GUI. In both cases the whole language, including the definition and control structures, is built in the language itself via bootstrapping and cross-compilation. Both languages are compiled, but in both cases the separation between run-time and compile-time is weak, in the sense that the compiler is by default available interactively. Interestingly this doesn’t stop you building “normal” compiled applications: cross-compile a system without including the compiler itself, a process that can still take advantage of any extensions added into the compiler without cluttering-up the compiled code.

You’re unlikely to get quite the performance or memory footprint that you might with a mature C compiler, but you do get advantages in terms of expressiveness and experimentation which seem to outweigh these in a domain we don’t understand well. In particular, it means you can evolve the language quickly, easily, and within itself, to explore the design space more effectively and find out whether your wackier ideas are actually worth pursuing further.
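One such wackier idea is the if statement under uncertainty raised at the start of this post. A rough prototype might look like the sketch below, written in Python as a stand-in for a real language extension. The `Belief` record, the `uncertain_if` function and the 0.8 threshold are all assumptions made for illustration. The conditional has three outcomes rather than two, and it refuses to commit when the evidence is weak or stale.

```python
import time
from dataclasses import dataclass

@dataclass
class Belief:
    """A sensed condition: probably true, and only for a while."""
    probability: float   # estimated probability that the condition holds
    observed_at: float   # timestamp of the observation, in seconds
    valid_for: float     # how long the observation can be trusted, in seconds

def uncertain_if(belief, then, otherwise, unsure, threshold=0.8, now=None):
    """Three-way conditional: act only when confident and the belief is fresh."""
    now = time.time() if now is None else now
    if now - belief.observed_at > belief.valid_for:
        return unsure()          # stale evidence: do not decide either way
    if belief.probability >= threshold:
        return then()
    if belief.probability <= 1 - threshold:
        return otherwise()
    return unsure()              # evidence too weak to commit

person_in_room = Belief(probability=0.9, observed_at=100.0, valid_for=30.0)
fresh = uncertain_if(person_in_room, lambda: "lights on", lambda: "lights off",
                     lambda: "wait", now=110.0)
stale = uncertain_if(person_in_room, lambda: "lights on", lambda: "lights off",
                     lambda: "wait", now=200.0)
```

With the same belief, the call at `now=110.0` takes the confident branch, while the call at `now=200.0` falls through to `unsure` because the observation has expired. Whether this is the right construct is exactly the kind of question an evolvable language would let you test cheaply.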
