# Questionable Statistics

My four years in the Statistics Department have been characterised by a continual feeling that I don’t know enough about statistics. That said, I can’t help but notice how statistics are used in non-specialist contexts, and of course there are times when eyebrows might reasonably be raised. Andrew Gelman (who gave a very enjoyable departmental seminar on this topic in May) has a notable blog critiquing poor statistical methodology, especially in social science; and the blogosphere is packed with compilations of egregious statistical oxymorons (“All schools *must* be above average” etc).

I’m aiming for more of a middle ground here. I’ve picked three examples of articles I found interesting at some level over the past few months. In each of them, some kind of data presentation or experiment design arose, and in all cases, I think it resulted in more questions rather than more answers, not really in a good way, since in all cases I did actually want to know more about what was going on.

Extreme rounding errors

This Guardian article about inequality in career trajectories for teachers in schools serving more advantaged and more deprived areas raises a lot of interesting questions, and actually proposes answers to several of them. The problem is the double bar chart halfway down.

First a superficial moan. Rounding to the nearest integer is fine, indeed sensible in most contexts, but not when you are treating small numbers. Comparing the extreme bars, we could have 11.5 v 12.5 or we could have 10.5 v 13.5. The headlines would probably read “N% more teachers leave disadvantaged schools”. In the first case N is 9, in the second it’s 29. So it’s not really a superficial moan. Data is supposed to offer a measure of the effect under consideration, and a discussion of how treat this effect requires a veneer of reliability about the size of the effect. We don’t just need to be asking “how can we solve this”; we also need to be asking “is this worth solving?” I’m not saying the answer is no, but based on this graph alone the answer is not ‘definitely yes’. [1]

A question to ask in the context of this report is “why do more teachers leave the profession from schools in deprived areas?” But I think most people could speculate a broadly sensible answer to this. The data raises the more subtle question “why are the rates of leaving the profession so similar across all types of school?” Essentially I think this survey is a negative result – the effect just isn’t as strong as I’d have suspected, nor I imagine the authors nor the Guardian education editorship.

By contrast, even with the amateurish presentation, the rates of moving school clearly are significant and worth talking about based on this survey. Significant data demands further information though. Once an effect is significant, we need to know more details, like to what extent the teachers are moving to schools in the same band or to mostly towards the left end of the graph. Again though, there’s no comment on the magnitude of this effect. Assuming the rate is given per year [2], then among teachers who do not leave the profession, the average tenure of a given job is more than ten years, even in the most deprived category. Maybe this is all very heavy-tailed, and so the average isn’t a good thing to be using. But this doesn’t seem like an unreasonable number to me? If the claim is that this is a major problem for the teaching profession, then we should be told what the numbers are in vaguely comparable industries.

The final thing is pretty bad. Three paragraphs under the graph, it is claimed that the rate of leaving is 70% higher in most- than least-deprived categories. We decided that N was somewhere between 9 and 29. But not 70. I can only assume that this came from the ‘moving’ category instead…

[1] I’m assuming the sample pool was large enough that the effect is statistically significant, because there isn’t a direct link to the source data or even the source report unfortunately.

[2] and is it plausible that this is rate per year? Does the profession really experience ten percent annual turnover? Is it even plausible that the scale is the same? Maybe rate of leaving really is twice rate of moving school, and I have no idea about anything, but this seems questionable.

Many, many categories

The bank Halifax commissions an annual survey about pocket money, and this year’s received widespread coverage. My attention was caught by this article on BBC news. The short summary is that they surveyed about 1,200 children from about 600 families (so in many cases a direct comparison between siblings could have been possible), and it sounds like they restricted to the age range 8-15, and asked how much pocket money they received, and whether they wanted more.

The clickbait summary was that boys receive about 13% more than girls. While it doesn’t give the exact numbers, it also mentions that more boys than girls thought they deserved more money. A psychologist conjectures that this latter effect might explain the former effect, and indeed this seems plausible. The problem is that elsewhere in the article, it says that in the corresponding survey in 2015, boys receive on average only 2% more than girls.

We therefore have to consider a couple of things. 1) Is it plausible that the actual average pocket money rates (among the whole population) have fluctuated in such a gender-dependent way in 12 months? 2) If not, then can we still say something useful, even if our confidence in the methodology of the survey is reduced?

I think the answer to the first question is definitely ‘no’. So we come to the second question. Firstly, how could this have arisen? Well, there’s no indication how the children were chosen, but equally it’s hard to think of a way of choosing them that would increase artificially the boys’ pocket money. In this last sentence, I’m deliberately suggesting the null hypothesis that boys and girls are the same. This is misleading. For now, our null hypothesis is that 2015 should be the same as 2016. Indeed, it’s equally plausible that this year’s summary data was ‘more correct’, and so we are asking why the difference was excessively compressed in 2015.

This is entirely a matter of effect size, and it’s to the article’s credit that so much information is given further down the page that we can actually make the following remarks quantitatively. At the bottom, they show the average pay (across genders) in different regions. There is a lot of variation here. The rate of pay in East Anglia is only 60% what it is in London. At some level, I’m surprised it isn’t even higher. The article tells us that children between the ages of eight and fifteen were surveyed. By what rate did your pocket money change in that interval? Surely it doubled at the very least? Weekly pocket money at age eight is for sweets and an occasional packet of Pokemon cards (substitute 21st century equivalent freely…). Teenagers need to pay for outings and clothes etc, so there’s no reason at all why this should be comparable. For the purposes of definiteness in the next calculation, let’s say that fifteen-year-olds get paid four times as much as eight-year-olds on average, which I think is rather conservative.

So here’s a plausible reason for the fluctuations. Suppose between 2015 and 2016, the survey substituted 10 eight-year-old boys from the Midlands for 10 fifteen-year-old boys from London. The expectation change in the average amongst the 600 boys surveyed is

$\frac{10}{600} \times (\sim 6.5) \times (4 \times (0.6)^{-1}) \approx 0.7,$

in pounds, which is on the order of the actual change from 2015 to 2016. Choosing the participants in a non-uniform way seems like too basic a mistake to make, but mild fluctuations in the distribution of ages and locations, as well as the occasional outlier (they want to use the mean, so one oligarch’s daughter could make a noticeable difference in a sample size of 600) seems more plausible as an explanation to me than a general change in the population rates. Choosing participants in a uniform way is just hard when there are loads and loads of categories.

I’m not really saying that this survey is bad – they have to report what they found, and without a lot more information, I have no idea whether this year’s is more or less plausible than last year’s. But if you add the phrase “especially in 2016” to everything the psychologist says, it suddenly seems a bit ridiculous. So it would be worth making the remark that even if this effect is statistically significant, that doesn’t mean the effect size is large relative to lots of other less controversial effect sizes visible in the data.

Comparing precious metals

I’ve recently returning from this year’s International Mathematical Olympiad, and now we are well into the swing of its sweatier namesake in Rio. At both events, there have been many opportunities to observe how different people with different ambitions and levels of extroversion display different levels of pleasure at the same medal outcomes. In the light of this, I was drawn to this article, which has been getting quite a lot of traction online.

The basic premise is simple. Silver medallists are less happy than bronze medallists (when – unlike at the IMO – there is only one of each medal), and it’s not hard to come up with a cheap narrative: silver-winners are disappointed to have missed out on gold; bronze-winners are glad to have got a medal at all. This is all pretty convincing, so maybe there’s no need actually to collect any data, especially since asking a crowd of undergraduates to rate faces on the 1-10 agony-ecstacy scale sounds like a bit of an effort. Let’s read a couple of sentences, describing a follow-up study on judo medallists at the Athens Olympics:

Altogether, they found that thirteen of the fourteen gold medal winners smiled immediately after they completed their winning match, while eighteen of the twenty-six bronze medalists smiled. However, none of the silver medalists smiled immediately after their match ended.

You might wonder how come there are almost twice as many bronzes as golds. Well, very full details about the judo repechage structure can be found here, but the key information is that all bronze-medallists won their final match, as, naturally, did the gold-medallists. The silver-medallists lost their final match, ie the gold-medal final. So this study is literally reporting that highly-driven competitive sportsmen are happier straight after they win, than straight after they lose. This doesn’t have much to do with medals, and isn’t very exciting, in my opinion. I would, however, be interested to know what was eating the one gold medallist who didn’t smile immediately.

This isn’t really a gripe about the statistics, more about the writing. They are testing for an effect which seems highly plausible, but is branded as surprising. The study for which they give more methodological details in fact seems to be testing for a related effect, which is not just plausible, but essentially self-evident. So naturally I want to know more about the original study, which is the only place left for there to be any interesting effects, for example if they looked at athletics events which don’t have a binary elimination structure. But we aren’t told enough to find these interesting effects, if they exist. How annoying. The only thing left is to think about is my bronze medal at IMO 2008. They said eight years would be enough for the wounds to heal but I’m still not smiling.