Tuesday 29 May 2012

The problem with performance measures (e.g. REF and NSS)

Increasingly, in all walks of life and all fields of work, performance now has to be quantified; we must all be given a number that identifies how good we are at whatever we do.

For academics we have the Research Excellence Framework (REF) exercise, which aims to measure the research performance of individual departments and whole universities, the National Student Survey (NSS) to measure the student experience, and numerous newspaper league tables. In particular, the REF (formerly the Research Assessment Exercise, RAE) is a very influential performance indicator since it directly determines the distribution of one of the major funding sources for UK universities. It attracts a fair amount of comment as a result. See, for example, these informative posts calling the REF a time-wasting monster (Dorothy Bishop) and defending it (Andrew Derrington). Those articles have focused predominantly on whether the REF is good value for money (since it costs tens of millions of pounds to run).

I see a different problem with the REF, and also with all the other performance measures that are currently in use, which is that we simply have no idea how well the measures quantify what we want to know. All scientists know that in order to declare one treatment 'better' than another, we need not only a measure of its performance, but also an understanding of how good that measure is. None of us, beyond about first-year undergraduate level, would dream of presenting a mean without an error bar. We would also not give values beyond the number of decimal places that we can measure. We would certainly want to explain in our journal articles the caveats that should be taken into account to prevent people from over-interpreting our data.

Yet the REF, the NSS, and all of the newspaper league tables seem to be exempt from these things that all psychologists would consider good practice. So how do they 'measure up'? Are they reliable and precise? Do they even measure the thing they claim to [ed: 'valid', for the scientists in the readership]?

The authors of the posts above don't seem concerned about whether REF is an accurate measure. Dorothy Bishop's blog asserts that the REF yields no surprises (which would make it pointless but not inaccurate). Really? The top four or five might be obvious, but as you look down the list do you really find all those entries unsurprising? I don't want to name names, but I find it surprising how high some of those institutions appear to be, and similarly how low others are. 

If I look at the tables of departments within my own field, rather than at the institutional ranks, I am very surprised indeed. I don't have any evidence to show that they are inaccurate or noisy other than my own experience (to quantify that we would need to conduct the exercise twice in quick succession with a new panel, which never occurs). But I certainly have my doubts about whether a REF rank of 10 is actually different to a REF rank of 15, say, or even 20. I mean 'significantly different', to use the same bar that we set for our scientific reporting. The question turns out to be incredibly important given the financial impact. In the newspaper league tables, which are created on much smaller budgets, my own department's ranking changes enormously year-on-year without any dramatic changes to the department or course.

Ok, so these measures might be somewhat noisy, but do they at least measure the thing we want? That isn't clear either. In the case of the REF we have no measure with which to compare other than our own personal judgements. And if the two disagree I suppose we have to decide that "the REF was based on more data than my opinion" so it wins. In fact, possibly the reason that Dorothy finds the tables unsurprising is that she has more experience (she certainly does). Without a gold standard, or any other metric at all for that matter, how do we decide whether the REF is measuring the 'right' thing? We don't. We just hope. And when you hear that some universities are drafting in specialists to craft the documents for them, while other departments leave the job to whoever is their Director of Research, I suspect what we might be measuring is not how good an institution is at research, but how much they pay for their public relations people. How well they convince people of their worth and, possibly, how willing they are to 'accentuate' their assets.

Andrew Derrington (aka Russell Dean) points out that REF (and RAE) "seem to have earned a high degree of trust." I'm not sure who trusts it so much (possibly just the senior academic managers?) but even if it is trusted we know that doesn't mean it's good, right? It could well be the simple problem that people put far too much faith in a number when given one. Even highly competent scientists.

I think it's far from clear that the REF is measuring what we want, and even less that it's doing so accurately. But I should add that I don't have a better idea. I'm not saying we should throw the baby out with the bathwater. Maybe I'm just saying be careful about trying to interpret what a baby says.

Hmm, possibly I over-extended with the baby metaphor.

Saturday 19 May 2012

Python versus Matlab for neuroscience/psychology

A lot of people ask me why I use Python instead of Matlab, or which is easier/better to learn. Maybe it's time I provided a comparison for psychology/neuroscience types to decide which language is better for them. Note that, although I write a prominent Python package, this article is not aimed at trying to convert you. If Matlab works for you and makes you happy, that's great! Personally, when I switched to Python I never looked back, and this explains a little about why.

Overall

Overall Python is a more flexible language and easier to read, and for me those two things are really important. Many people don't care whether their code is readable or clear for the future. They want it just to work now. For me, being able to understand the code again in a year's time is really important, and learning a new language wasn't too hard.

A lot of the differences between Matlab and Python come down to two things: 
  1. Matlab has a commercial, proprietary development model whereas Python is open-source. I won't go into that aspect much in this post. Some time I'll write a separate post about why I personally prefer the open-source model (they each have their benefits).
  2. Matlab was designed to do maths but can be used more generally. Python was designed to be general but can be used for maths. That alters the way the languages work and the nature of the other users. That's also part of the reason that Python ships as part of Mac OS X and most Linux distributions. It's so generally useful it's made a part of the operating system.

Price and support

Price was certainly a part of my original decision to switch to Python. I was sick of setting up licenses, or getting blocked because the license server had too many users. Or needing to distribute processing to other machines, and discovering that they didn't all have the necessary (paid-for) toolboxes. But if it were just about the licensing I would have switched to Octave, a free alternative with almost identical syntax. The bigger issue I had was that I didn't actually like the language that much. Too many of the things that I felt should be core components of a language were bolt-on afterthoughts in Matlab.

Also at that point in time (2002-3) Mathworks was unsure if it would continue to support Apple Mac. I wanted to be able to choose what platform I used and not have that determined for me by Mathworks.

Generality

Ultimately there's little that Python can do that Matlab can't, and the converse is even more true. So why should it matter what they were originally designed for? Well, it does alter the decisions made by the programmers that built the systems. Matlab was designed to do maths and was extended to do much more; it was designed to be used by regular scientists, not by programmers. Python was designed as a general language that could also do maths. On the whole that means that Matlab scripts/packages work very well for moderately complex tasks but they don't scale up very easily. Python might take a little more effort to get going, but saves you headaches in the long run.

Concrete examples? OK, just a couple.

  1. How often in Matlab have you had an error message that made no sense, which turned out to be caused by two functions on your path having the same name, or because you'd assigned a variable to the name of a function, and now that function doesn't work? Matlab assumes that everything on your path should be available at all times, because the developers didn't expect people to have hundreds of thousands of different functions on their path. Fair enough; if you only have 500 functions then giving each one a unique name is reasonable. Python is designed to have much larger numbers of libraries and functions installed, and the idea that each should need a unique name quickly becomes unworkable. So it becomes important that the entire path isn't constantly available in the 'namespace'. So in Python, like most other programming languages, you need to manually import the libraries that you want to use (see the first sketch after this list). That means a couple of extra lines at the start of your script, but it also means you stand a better chance of avoiding name conflicts despite having a huge number of available functions in your libraries.
  2. Python was designed from the ground up to support object-oriented programming, with inheritance and dynamic updating of classes. For someone with programming experience those things are incredibly useful, allowing greater re-use of code and fewer bugs in large programs (see the second sketch after this list). For doing maths, object-oriented programming seems less important, so the concept was rather late to appear in Matlab, and the fact that it was bolted on as an afterthought shows.
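
To make the first of those points concrete, here's a tiny sketch of explicit imports (my own toy example, not taken from any particular package):
import math                  # bring the whole module in, under its own name
from os import path          # or pull in just the piece you need

floor = 'wooden'             # a variable called 'floor'...
print(math.floor(2.7))       # ...doesn't clobber math.floor, which lives in its own namespace
print(path.join('data', 'subj01.csv'))   # and no clash with anything else called 'join'

And for the second point, a minimal sketch of a class hierarchy (the names Stimulus and Grating are made up purely for illustration):
class Stimulus(object):
    def __init__(self, name):
        self.name = name
    def describe(self):
        return "a stimulus called %s" % self.name

class Grating(Stimulus):                 # Grating inherits everything from Stimulus
    def __init__(self, name, spatial_freq):
        Stimulus.__init__(self, name)
        self.spatial_freq = spatial_freq
    def describe(self):                  # ...and overrides just the bits it needs
        return "a grating (%.1f c/deg) called %s" % (self.spatial_freq, self.name)

print(Grating('probe', 4.0).describe())  # a grating (4.0 c/deg) called probe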

Powerful syntax

I don't think there's any question that Python's syntax is superior to Matlab's. Some aspects might take some getting used to (e.g. the fact that indices start at zero, or that correct indentation is a requirement). But in the end it has a huge number of features. Here are just a couple to give you the idea.

Fantastic string handling. Imagine being able to do things like this in Matlab:
>>> a='hello'
>>> b=' world'
>>> a+b #combine two strings? just add them!
'hello world'
>>> (a+b).title() #title is a method of all string objects
'Hello World'
>>> a==b #why would you want to write strcmp?!
False
>>> a>b
True
>>> str1="Strings can be surrounded by single or double quotes"
>>> str2='"Wow" and I can include the other type in the string?!'
(For other string-handling possibilities see the Python tutorial).

How about the fact that arguments to functions can be passed by name rather than by position in the argument list? So if you only want to set the 1st and 8th arguments, just use their names and the other args will take their default values. Sweet! To see this in action see http://docs.python.org/tutorial/controlflow.html#keyword-arguments
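
A quick illustration (draw_text and all of its arguments are made up just for the example):
def draw_text(text, pos=(0, 0), colour='black', font='Arial', height=12,
              bold=False, italic=False, opacity=1.0):
    print(text, pos, colour, opacity)

draw_text('hello')                   # every argument takes its default value
draw_text('hello', opacity=0.5)      # set just the 8th argument, by name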

Many things like these are easy in Python but considerably less readable in Matlab. Maybe they aren't important to you, but when you have very large scripts they can become a huge time-saver.

Available libraries

In science there are lots of Matlab users, which is great for sharing a script. What many people don't realise, though, is that, overall, Python has many more users. So when you need help with, say, sound handling or importing some new file format, you are much more likely to find a ready-made library available for Python. That was another reason for me originally switching to Python; in early 2003 it already had a fully functional wrapper for OpenGL, so I could use hardware-accelerated graphics directly from my scripts.

When I decided to build an editor and experiment builder GUI for PsychoPy I could do it all within Python, with relatively little effort, from existing Python libraries (e.g. wxPython). I can't imagine doing all that in Matlab (although much of it would be technically possible, it would be extremely painful).
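
To give a flavour, the classic minimal wxPython application is only a handful of lines (a sketch, assuming wxPython is installed):
import wx                      # third-party GUI toolkit

app = wx.App(False)            # create the application (without redirecting stdout)
frame = wx.Frame(None, title="Hello from Python")
frame.Show()
app.MainLoop()                 # hand control to the GUI event loop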

When Microsoft changed the format of Excel files, soon enough there was a Python library (openpyxl) to read and write them, because an enthusiast went and created it. In Matlab you still can't do that on a Mac, because Mathworks hasn't yet added the support.
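
Reading a sheet with openpyxl looks something like this (a sketch with a made-up filename; the exact calls have shifted a little between versions):
from openpyxl import load_workbook       # pure Python, so it works just the same on a Mac

wb = load_workbook('results.xlsx')       # 'results.xlsx' is just a placeholder name
ws = wb.active                           # the first (active) worksheet
for row in ws.iter_rows():
    print([cell.value for cell in row])  # one list of cell values per row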

Ultimately

It is because of Python that I was able to write PsychoPy, and it's why other programmers have jumped on board the project. The clean, easy syntax and the huge array of libraries allow normal people to write pretty professional applications.

Thursday 8 March 2012

Are you a Gisbie?

Do you know those people that always start a conversation by telling you how busy they are? I don't mean occasionally. I mean the people that always seem to think they're impossibly busy.

"God I'm So Busy" people. Gisbies.

Think of some of the gisbies you know. What do you think of them? I've come to realise that I don't respect them. I tend to see them as less competent than my other non-gisbie colleagues. What I'm not sure about is why they annoy me so.

I think it's probably simply that I've noticed a correlation. The people in my life I've been most impressed by, those that excel in some way or another, never seem to complain about their time. But surely they are actually the busiest people I know; they're usually running departments or institutes or accomplishing ridiculous things in their spare time.

Gisbies, on the other hand, don't seem to be the people that are getting places and running the show. I actually suspect that the reason they're always telling me how busy they are is because they aren't so productive and they're worried people will think they aren't working. So they want to point out to everyone just how much they do.

And then they start implying that they do more than me. Are they really sure about that? I've got quite a few things going on that take up a lot of time. It's just that I'm not going around complaining about it. OK, sometimes I do, but only when my plate is unusually full. Not always.

And then, to make matters worse, it seems like the gisbie actually delays my work by standing in my office telling me how busy they are!

In reality, I actually don't know what makes people into gisbies or why I dislike them. Probably a mixture of factors. But try not to tell me that you're too busy. I mean, not every time we meet.

Thursday 16 February 2012

An online repository for sharing experiments?

Have you ever read a psychology/neuroscience journal article and wondered if the information the authors had given you in the methods section was really sufficient for you to replicate the study?

Have you ever wanted to start a study with a new piece of software or something outside your normal method, and wished there was some existing experiment code that you could adapt for your needs?

A couple of people on the PsychoPy users list have suggested that it would be good to have a place to upload experimental code and materials to share.

It would serve a few purposes:
  • makes a study genuinely replicable, because you would be able to fetch the actual experiment that the authors used
  • publicises an experiment that you've run, because people could browse the repository looking for experiments they find interesting
  • provides a starting point for new users of a piece of software to build an experiment
The first goal can actually also be met by uploading your experiment to your own lab web pages, but that solution doesn't address the second and third points.

The repository would be agnostic to the subject of the study, and to the software used to run it. You would upload all the materials needed to run it (code, image files etc), tag which software package it was written for (PsychoPy, E-Prime, Presentation, Psychtoolbox etc...), provide a summary of what results should be expected and a reference to the paper showing the original (if published). Then you provide keywords about the topic that the experiment addresses so that people can browse or search for the experiment. Users might search by topic, keyword or software package to find experiments to learn from or replicate.
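
Just to make that concrete, the record attached to each upload might be as simple as something like this (a sketch; all the field names and values are hypothetical):
experiment = {
    'title': 'a short descriptive title for the study',
    'software': 'PsychoPy',      # or E-Prime, Presentation, Psychtoolbox, ...
    'materials': ['task script', 'condition files', 'images/sounds'],
    'expected_results': 'a one-line summary of what the data should show',
    'reference': 'the published paper, if there is one',
    'keywords': ['topic', 'paradigm', 'stimulus type'],
}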

Potential issues

A few people have raised concerns about the idea:

  • Will it lead people to run studies that they don't actually understand? For example, see this post on eagle-eyed-autism describing a study going badly wrong because the authors had borrowed code and hadn't really understood it. Is the answer to make sure it's very difficult to run studies, so that scientists really have to know what they're doing in order to manage? That seems more than a little arrogant.
  • Will errors in studies propagate more? If a study has an error, when another lab writes it from scratch the error will likely not be repeated, but if they borrow and tweak the code, the bug could propagate. I think the benefit is that more eyes can potentially examine the experiment, which should reduce the propagation of bugs.
  • Why should someone else simply take the experiment that I spent hours writing? To me this one just seems blatantly at odds with the aims and philosophy of science. But I guess some people will feel territorial like that.
  • People would never use such a site (unless forced) because they will be too embarrassed by the quality of their code, which was, after all, designed to work without necessarily being elegant. I'm fairly sympathetic to this (although I've obviously shared many thousands of lines of my own code). But some people will be brave enough to expose their work fully, especially if it was generated by something like E-Prime or PsychoPy Builder, where the need to actually write code is reduced.

The idea is definitely growing on me, although I don't currently have the time to build the site, nor the funding to pay someone to build it.

I'm keen to hear more views. So feel free to comment below. Hopefully the idea will also be discussed as part of a satellite event on open-science at the Vision Sciences Society conference this May.