Tuesday 29 May 2012

The problem with performance measures (e.g. REF and NSS)

Increasingly, in all walks of life and all fields of work, performance has to be quantified; we must all be given a number that identifies how good we are at whatever we do.

For academics we have the Research Excellence Framework (REF) exercise, which aims to measure the research performance of individual departments and whole universities, the National Student Survey (NSS) to measure the student experience, and numerous newspaper league tables. In particular, the REF (formerly the Research Assessment Exercise, RAE) is a very influential performance indicator since it directly determines the distribution of one of the major funding sources for UK universities. It attracts a fair amount of comment as a result. See, for example, these informative posts calling the REF a time-wasting monster (Dorothy Bishop) and defending it (Andrew Derrington). Those articles have focused predominantly on whether the REF is good value for money (since it costs tens of millions of pounds to run).

I see a different problem with the REF, and also with all the other performance measures that are currently in use, which is that we simply have no idea how well the measures quantify what we want to know. All scientists know that in order to declare one treatment 'better' than another, we need not only a measure of its performance, but also an understanding of how good that measure is. None of us, beyond about first-year undergraduate level, would dream of presenting a mean without an error bar. We would also not give values beyond the number of decimal places that we can measure. We would certainly want to explain in our journal articles the caveats that should be taken into account to prevent people from over-interpreting our data.

Yet the REF, the NSS, and all of the newspaper league tables seem to be exempt from these things that all psychologists would consider good practice. So how do they 'measure up'? Are they reliable and precise? Do they even measure the thing they claim to [ed: 'valid', for the scientists in the readership]?

The authors of the posts above don't seem concerned about whether REF is an accurate measure. Dorothy Bishop's blog asserts that the REF yields no surprises (which would make it pointless but not inaccurate). Really? The top four or five might be obvious, but as you look down the list do you really find all those entries unsurprising? I don't want to name names, but I find it surprising how high some of those institutions appear to be, and similarly how low others are. 

If I look at the tables of departments within my own field, rather than at the institutional ranks, I am very surprised indeed. I don't have any evidence to show that they are inaccurate or noisy other than my own experience (to quantify that we would need to conduct the exercise twice in quick succession with a new panel, which never occurs). But I certainly have my doubts about whether a REF rank of 10 is actually different to a REF rank of 15, say, or even 20. I mean 'significantly different', to use the same bar that we set for our scientific reporting. The question turns out to be incredibly important given the financial impact. In the newspaper league tables, which are created on much smaller budgets, my own department's ranking changes enormously year-on-year without any dramatic changes to the department or course.

Ok, so these measures might be somewhat noisy, but do they at least measure the thing we want? That isn't clear either. In the case of the REF we have no measure with which to compare other than our own personal judgements. And if the two disagree I suppose we have to decide that "the REF was based on more data than my opinion" so it wins. In fact, possibly the reason that Dorothy finds the tables unsurprising is that she has more experience (she certainly does). Without a gold standard, or any other metric at all for that matter, how do we decide whether the REF is measuring the 'right' thing? We don't. We just hope. And when you hear that some universities are drafting in specialists to craft the documents for them, while other departments leave the job to whoever is their Director of Research, I suspect what we might be measuring is not how good an institution is at research, but how much they pay for their public relations people. How well they convince people of their worth and, possibly, how willing they are to 'accentuate' their assets.

Andrew Derrington (aka Russell Dean) points out that REF (and RAE) "seem to have earned a high degree of trust." I'm not sure who trusts it so much (possibly just the senior academic managers?) but even if it is trusted we know that doesn't mean it's good, right? It could well be the simple problem that people put far too much faith in a number when given one. Even highly competent scientists.

I think it's far from clear that the REF is measuring what we want, and even less that it's doing so accurately. But I should add that I don't have a better idea. I'm not saying we should throw the baby out with the bathwater. Maybe I'm just saying be careful about trying to interpret what a baby says.

Hmm, possibly I over-extended with the baby metaphor.

Saturday 19 May 2012

Python versus Matlab for neuroscience/psychology

A lot of people ask me why I use Python instead of Matlab, or which is easier/better to learn. Maybe it's time I provided a comparison for psychology/neuroscience types to decide which language is better for them. Note that, although I write a prominent Python package, this article is not aimed at trying to convert you. If Matlab works for you and makes you happy, that's great! Personally, when I switched to Python I never looked back, and this post explains a little about why.

Overall

Overall Python is a more flexible language and easier to read, and for me those two things are really important. Many people don't care whether their code is readable or clear for the future. They want it just to work now. For me, being able to understand the code again in a year's time is really important, and learning a new language wasn't too hard.

A lot of the differences between Matlab and Python come down to two things: 
  1. Matlab has a commercial, proprietary development model whereas Python is open-source. I won't go into that aspect much in this post; at some point I'll write a separate post about why I personally prefer the open-source model (each has its benefits).
  2. Matlab was designed to do maths but can be used more generally. Python was designed to be general but can be used for maths. That alters the way the languages work and the nature of the other users. That's also part of the reason that Python ships as part of Mac OS X and most Linux distributions. It's so generally useful it's made a part of the operating system.

Price and support

Price was certainly a part of my original decision to switch to Python. I was sick of setting up licenses, or getting blocked because the license server had too many users. Or needing to distribute processing to other machines, and discovering that they didn't all have the necessary (paid-for) toolboxes. But if it were just about the licensing I would have switched to Octave, a free alternative with almost identical syntax. The bigger issue I had was that I didn't actually like the language that much. Too many of the things that I felt should be core components of a language were bolt-on afterthoughts in Matlab.

Also at that point in time (2002-3) Mathworks was unsure if it would continue to support Apple Mac. I wanted to be able to choose what platform I used and not have that determined for me by Mathworks.

Generality

Ultimately there's little that Python can do that Matlab can't, and the converse is even more true. So why should it matter what they were originally designed for? Well, it does alter the decisions made by the programmers that built the systems. Matlab was designed to do maths and was extended to do much more; it was designed to be used by regular scientists, not by programmers. Python was designed as a general language that could also do maths. On the whole that means that Matlab scripts/packages work very well for moderately complex tasks but they don't scale up very easily. Python might take a little more effort to get going, but saves you headaches in the long run.

Concrete examples? OK, just a couple.

  1. How often in Matlab have you had an error message that made no sense, which turned out to be caused by two functions on your path having the same name, or by your assigning a variable to the name of a function so that the function no longer works? Matlab assumes that everything on your path should be available at all times, because the developers didn't expect people to have hundreds of thousands of different functions on their path. Fair enough; if you only have 500 functions then giving each one a unique name is reasonable. Python is designed to have much larger numbers of libraries and functions installed, and the idea that each should need a unique name quickly becomes unworkable. So it becomes important that the entire path isn't constantly available in the 'namespace': in Python, as in most other programming languages, you manually import the libraries that you want to use. That means a couple of extra lines at the start of your script, but it also means you stand a better chance of avoiding name conflicts despite having a huge number of available functions in your libraries (see the first sketch after this list).
  2. Python was designed from the ground up to support object-oriented programming, with inheritance and dynamic updating of classes. For someone with experience in programming, those things are incredibly useful, allowing greater re-use of code and fewer bugs in large programs. For doing maths, object-oriented programming seems less important, so the concept was rather late to appear in Matlab, and the fact that it was bolted on as an afterthought shows (see the second sketch after this list).
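
To illustrate the first point, here is a minimal sketch of explicit imports keeping same-named functions apart (using math and numpy purely as an example; both happen to define a function called sqrt):
>>> import math
>>> import numpy
>>> math.sqrt(2)  # the standard library version, for scalars
1.4142135623730951
>>> numpy.sqrt([1, 4, 9])  # numpy's version works element-wise; no name clash
array([ 1.,  2.,  3.])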
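
And a toy sketch of the second point; the class names (Trial, VisualTrial) are invented purely for illustration:
>>> class Trial(object):
...     def __init__(self, duration):
...         self.duration = duration
...     def describe(self):
...         return 'Trial lasting %.1f s' % self.duration
...
>>> class VisualTrial(Trial):   # inherits __init__ and everything else from Trial
...     def describe(self):     # override only the method that needs to change
...         return Trial.describe(self) + ' (visual)'
...
>>> VisualTrial(2.0).describe()
'Trial lasting 2.0 s (visual)'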

Powerful syntax

I don't think there's any question that Python's syntax is superior to Matlab's. Some aspects might take some getting used to (e.g. the fact that indices start at zero, or that correct indentation is a requirement). But in the end it has a huge number of features. Here are just a couple to give you the idea.

Fantastic string handling. Imagine being able to do things like this in Matlab:
>>> a='hello'
>>> b=' world'
>>> a+b #combine two strings? just add them!
'hello world'
>>> (a+b).title() #title is a method of all string objects
'Hello World'
>>> a==b #why would you want to write strcmp?!
False
>>> a>b
True
>>> str1="Strings can be surrounded by single or double quotes"
>>> str2='"Wow" and I can include the other type in the string?!'
(For other string-handling possibilities see the Python tutorial).

How about the fact that arguments to functions can be specified by name rather than by their position in the argument list? So if you only want to set the 1st and 8th arguments, just use their names and the other arguments will take their default values. Sweet! To see this in action see http://docs.python.org/tutorial/controlflow.html#keyword-arguments
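
For instance, here's a tiny sketch (the function and its argument names are made up for illustration, not taken from any particular library):
>>> def show_text(text, size=1.0, color='white', duration=0.5, units='deg'):
...     return (text, size, color, duration, units)
...
>>> show_text('hello', duration=2.0)  # set one argument by name; the rest keep their defaults
('hello', 1.0, 'white', 2.0, 'deg')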

Many things that are easy in Python are considerably less readable in Matlab. Maybe they aren't important to you, but when you have very large scripts they can become a huge time-saver.

Available libraries

In science there are lots of Matlab users, which is great for sharing scripts. But what many people don't realise is that, overall, Python has many more users. So when you need help with, say, sound handling or importing some new file format, you are much more likely to find a ready-made library available for Python. That was another reason for me originally switching to Python; in early 2003 it already had a fully functional wrapper for OpenGL, so I could use hardware-accelerated graphics directly from my scripts.

When I decided to build an editor and experiment-builder GUI for PsychoPy I could do it all within Python, with relatively little effort, from existing Python libraries (e.g. wxPython). I can't imagine doing all that in Matlab (although much of it would be technically possible, it would be extremely painful).

When Microsoft changed the format of Excel files, soon enough there was a Python library (openpyxl) to read and write them, because an enthusiast went and created it. In Matlab you still can't do that on a Mac, because Mathworks hasn't yet added it.
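
As a rough sketch of what that looks like (the file name and data here are made up, and the exact calls depend on your openpyxl version):
>>> from openpyxl import Workbook, load_workbook
>>> wb = Workbook()
>>> ws = wb.active
>>> ws.append(['condition', 'reaction_time'])   # a header row
>>> ws.append(['high_contrast', 0.432])         # one row of made-up data
>>> wb.save('results.xlsx')                     # works the same on Mac, Linux and Windows
>>> load_workbook('results.xlsx').active['B2'].value
0.432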

Ultimately

It is because of Python that I was able to write PsychoPy, and it's why other programmers have jumped on board the project. The clean, easy syntax and the huge array of libraries allow normal people to write pretty professional applications.