Metrics and the Research Excellence Framework

Is the success of the UK and US in global university research rankings any more than a consequence of the choice of particular metrics? And does this explain the design of the funding and evaluation methodologies?
We’re currently at the start of the UK’s periodic Research Excellence Framework (REF) process, during which every School and Department in every university in the country is evaluated against every other, and against international benchmarks, to rate the quality of our research. As one might imagine, this isn’t the most popular process with the academics involved (especially those like me who are directors of research and so have to manage the return of our School’s submission). It certainly absorbs a lot of intellectual capacity.
In principle it shouldn’t: the REF is intended to generate a fair reflection of a School’s behaviour, to measure those activities that academics working at the cutting edge would do anyway, and to encourage people to adopt best practices by rewarding them in the assessment methodology. In reality, of course, the stakes are so high that the process can distort behaviour: people may, for example, target their published work at venues that will score well in the REF rather than at those that would give the work the best exposure in the appropriate community. In principle this sort of arbitrage is impossible, as the REF should value most highly those venues that are “best”: a sort of academic version of the efficient markets hypothesis. In reality there’s often a difference, or at least a perceived difference. In computer science, for example, we often use conferences in preference to journals for really new results because they get the word out faster into the community. Other sciences don’t do this, so we worry that conferences (which in other fields are quite peripheral to the publications process) won’t be judged as seriously even though they might be scientifically more valuable.
Managing our engagement with the REF process has made me think more widely about what the REF is trying to achieve, and how it’s achieving it. In particular, are the outcomes the REF encourages actually the ones we should be encouraging? If we look at universities across the world, one thing that stands out is the way that, in science and technology at any rate, the UK comes a close second to the US (and way above all other countries) in terms of publications, citations, patents and other concrete outputs of research, despite its smaller population and proportionally even smaller science budget. Despite the perpetual worries about a brain-drain of the best scientists to better-funded US Schools, the top UK institutions continue to do amazingly well.
The US and the UK also have strikingly similar funding models for universities. A small number of funding agencies, mostly national, disburse funding to Schools in particular fields according to priorities that are set at least notionally by the researchers (known in the UK as the Haldane Principle) but also, and increasingly, with regard to nationally-defined “strategic priorities” (usually in defence or security in the US). The agencies are constantly pressured about the value they return for taxpayers’ money (disproportionately pressured, given the relatively small sums involved) and so tend to adopt risk-reduction strategies such as following their money and increasing the funding of individuals and groups who have had previous funding and have not made a mess of it. This has three consequences: it becomes harder for junior staff to make their names on their own; it concentrates funding in a few institutions that are able to exploit their “famous names” to acquire follow-on funding; and, over time, it subtly encourages ambitious and talented people to gravitate to these well-funded institutions.
The medium-term result is the creation of larger research-intensive Schools that absorb a relatively large proportion of the available funding (and to a lesser extent talent). One can argue about whether this is a good outcome or not. Proponents would claim that it generates a “critical mass” of researchers who can work together to do better work. Critics would argue that larger Schools can rapidly become too large for comfortable interaction, and that the focus needed to grow can come at the expense of cross-disciplinary research and can discourage people from undertaking risky projects, since these might damage (or at least not advance) the School’s collective reputation.
Do these two processes, high impact and winner-takes-most funding, run together? Is one causative of the other? It seems to me that they’re actually both a consequence of a more basic decision: the choice of metrics for evaluation. And this in turn tells us where the REF fits in. The UK and US systems have chosen specific, measurable outcomes of research, in terms of a broadly-defined international “impact”. Given this choice, the funding model makes perfect sense: find groups that perform well under these metrics, fund them more (to get more good results) and help them to grow (to create more people who do well under the metrics). The system then feeds back on itself to improve the results the country gets under its chosen framework for evaluation.
This all seems so logical that it’s worth pointing out that there’s nothing inherently “right” about this choice of metrics, that there are alternatives, and that the current system is in conflict with other parts of universities’ missions. A noticeable feature of the US and UK systems is that they are increasingly two-tier, with research-led institutions on one side and more general teaching institutions doing little or no research on the other, even in fields that don’t actually benefit as much from economies of scale or critical masses of researchers (like mathematics). This means that students at the teaching institutions don’t get exposure to the research leaders, which is one of the main benefits of attending a research-led university. This is bound to have an impact on student satisfaction and achievement, and yet it is explicitly excluded from the REF and similar metrics. It’s interesting to note that, in the university rankings that include student-centred metrics, such as those performed in the UK by the Sunday Times and the Guardian, the sets of institutions at the top are shuffled compared to the research-only rankings. (St Andrews, for example, does enormously better in surveys that take account of the student experience than in those that focus purely on research. We would argue this is a good thing.)
If we accept the choice of metrics as given, then experience seems to suggest that the UK’s evaluation and funding structures are optimised to deliver success against them. This is hardly surprising. However, one could also choose different sets of equally defensible metrics and get radically different results. If one takes the view, for example, that the desirable metric is to expose as many students as possible to a range of world-class researchers, one could set up a system which eschews critical mass and instead distributes researchers evenly around a country’s institutions. This is in fact roughly what happens in France and Italy, and while I can’t say that this is a direct consequence of a deliberate choice of metrics, it’s certainly consistent with one.

“There is nothing so useless as doing efficiently that which should not be done at all.” (Peter Drucker)

Like many things, this is actually about economics: not in the narrow sense of money, but in the broad sense of how people respond to incentives. Rather than ask how well we do under REF metrics, we should perhaps ask what behaviours the REF metrics incentivise in academics and institutions, and whether these are the behaviours that are of best overall social benefit to the country. Certainly scientific excellence and impact are of vital importance: but one can also argue that the broadest possible exposure to excellence, and the motivational effect this can have on a wider class of students, is of equal or greater importance to the success and competitiveness of the country. The narrow, research-only metrics may simply be too narrow, and it’d be a mistake to optimise against them if in doing so — and achieving “a good result” according to them — we destroyed something of greater value that simply wasn’t being measured.