For academics we have the Research Excellence Framework (REF) exercise, which aims to measure the research performance of individual departments and whole universities, the National Student Survey (NSS) to measure the student experience, and numerous newspaper league tables. In particular, the REF (formerly the Research Assessment Exercise, RAE) is a very influential performance indicator, since it directly determines the distribution of one of the major funding sources for UK universities. It attracts a fair amount of comment as a result. See, for example, these informative posts calling the REF a time-wasting monster (Dorothy Bishop) and defending it (Andrew Derrington). Those articles focus predominantly on whether the REF is good value for money (since it costs tens of millions of pounds to run).
I see a different problem with the REF, and also with all the other performance measures currently in use, which is that we simply have no idea how well the measures quantify what we want to know. All scientists know that in order to declare one treatment 'better' than another, we need not only a measure of its performance, but also an understanding of how good that measure is. None of us, beyond about first-year undergraduate level, would dream of presenting a mean without an error bar. Nor would we quote values to more decimal places than we can actually measure. We would certainly want to explain in our journal articles the caveats that should be taken into account to prevent people from over-interpreting our data.
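To make that concrete, here is a minimal sketch of the reporting standard being described: a mean presented with its standard error, rounded to a sensible precision. The scores are invented for illustration, not real survey or REF data.

```python
import math

# Hypothetical scores for one department (invented data for illustration).
scores = [3.1, 3.4, 2.9, 3.6, 3.2, 3.0, 3.5, 3.3]

n = len(scores)
mean = sum(scores) / n

# Sample standard deviation (n - 1 denominator).
sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

# Standard error of the mean: the 'error bar' on our estimate.
sem = sd / math.sqrt(n)

# Report mean with its error bar, to a precision the data can support.
print(f"mean = {mean:.2f} ± {sem:.2f} (SEM)")
```

A league table, by contrast, publishes the equivalent of `mean` alone: a single number with no indication of how far it might move if the measurement were repeated.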
Yet the REF, the NSS, and all of the newspaper league tables seem to be exempt from the standards that any psychologist would consider basic good practice. So how do they 'measure up'? Are they reliable and precise? Do they even measure the thing they claim to [ed: 'valid', for the scientists in the readership]?
The authors of the posts above don't seem concerned about whether the REF is an accurate measure. Dorothy Bishop's blog asserts that the REF yields no surprises (which would make it pointless, but not inaccurate). Really? The top four or five might be obvious, but as you look down the list do you really find all those entries unsurprising? I don't want to name names, but I find it surprising how high some of those institutions appear to be, and similarly how low others are.
If I look at the tables of departments within my own field, rather than at the institutional ranks, I am very surprised indeed. I don't have any evidence that they are inaccurate or noisy other than my own experience (to quantify that we would need to conduct the exercise twice in quick succession with a new panel, which never happens). But I certainly have my doubts about whether a REF rank of 10 is actually different to a REF rank of 15, say, or even 20. I mean 'significantly different', to use the same bar that we set for our scientific reporting. The question turns out to be incredibly important given the financial impact. In the newspaper league tables, which are produced on much smaller budgets, my own department's ranking changes enormously year-on-year without any dramatic change to the department or course.
Ok, so these measures might be somewhat noisy, but do they at least measure the thing we want? That isn't clear either. In the case of the REF we have no measure to compare against other than our own personal judgements. And if the two disagree, I suppose we have to decide that "the REF was based on more data than my opinion", so it wins. In fact, possibly the reason Dorothy finds the tables unsurprising is that she has more experience (she certainly does). Without a gold standard, or any other metric at all for that matter, how do we decide whether the REF is measuring the 'right' thing? We don't. We just hope. And when you hear that some universities are drafting in specialists to craft the documents for them, while other departments leave the job to whoever happens to be their Director of Research, I suspect what we might be measuring is not how good an institution is at research, but how much they pay their public relations people. How well they convince people of their worth and, possibly, how willing they are to 'accentuate' their assets.
Andrew Derrington (aka Russell Dean) points out that the REF (and RAE) "seem to have earned a high degree of trust." I'm not sure who trusts it so much (possibly just the senior academic managers?), but even if it is trusted, we know that doesn't mean it's good, right? It could well be the simple problem that people put far too much faith in a number when given one. Even highly competent scientists.
I think it's far from clear that the REF is measuring what we want, and even less that it's doing so accurately. But I should add that I don't have a better idea. I'm not saying we should throw the baby out with the bathwater. Maybe I'm just saying be careful about trying to interpret what a baby says.
Hmm, possibly I over-extended with the baby metaphor.