The debate has been hotting up this week in the discussion of the Scottish independence referendum, and with it has come the inevitable barrage of questionable arguments, questionable statistics and questionable logic. Nonetheless, at least lots of people seem to be questioning all of the above fairly carefully. Instead I want to question some of the logic and philosophy of a BBC report, which discusses a change to the grade divisions for GCSEs in some subjects, to be introduced from 2017. The article can be found here.
First a caveat. Picking logical holes in arguments for educational reform is like taking candy from a baby in the sense that it’s normally quite easy, and not usually entertaining or instructive for either party. This article is at least a long way short of the infamous comment from someone at the Institute of Physics that “nearly half of [schools] are actually doing worse than average” (source). In that case, a discussion of gender inequality ends up more like a discussion of Markov’s inequality.
The exact nature of the grade reparameterisation is not hugely important. Instead of indexing grade classes by the set $\{A^*, A, B, \ldots, G\}$, the set is to be $\{1, 2, \ldots, 9\}$, with 9 the top grade. In the sequel, I’m going to talk about A-levels, because I know very slightly more about that, but everything relevant applies equally well to GCSEs. I want to discuss the request by the general secretary of the Association of School and College Leaders that “students must not be disadvantaged by the change in grading.” My claim is that this is not possible under any change in grading.
We need to consider what the value of a grade is. If different grades make no difference to a student, then by definition that student can’t be disadvantaged by a change in grading procedure. The grade gives an approximate measure of the student’s ability in the subject in question. There are several reasons why it is approximate.
Firstly, marks in exams are not a perfect measure of ability, if there even is an absolute underlying idea of ‘ability’. Some people will under-perform in a given exam, and some people will over-perform. Note that if someone feels they always under-perform, a law of large numbers suggests that in fact it is their prediction skills that are below average. This is a fundamentally different matter from students who don’t prosper in exam conditions (or prosper excessively!). That is a problem, but not one that anyone expects to be solvable by grade reparameterisation.
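The law-of-large-numbers remark can be illustrated with a small simulation. This is only a sketch under assumptions that are mine rather than anything in the BBC report: each mark is modelled as a fixed true ability plus zero-mean Gaussian noise, and the function name `mean_shortfall`, the noise level and the example numbers are all hypothetical.

```python
import random

def mean_shortfall(predicted, ability, n_exams, noise=0.1, seed=0):
    # Hypothetical model: each mark is true ability plus zero-mean Gaussian noise.
    rng = random.Random(seed)
    marks = [ability + rng.gauss(0, noise) for _ in range(n_exams)]
    # Average amount by which the marks fall short of the student's own prediction.
    return predicted - sum(marks) / n_exams

# A student of ability 0.6 who keeps predicting 0.7 for themselves.
print(mean_shortfall(predicted=0.7, ability=0.6, n_exams=5))     # noisy over a few exams
print(mean_shortfall(predicted=0.7, ability=0.6, n_exams=5000))  # settles near the prediction bias
```

Over many exams the noise averages out, so a persistent shortfall reflects the gap between prediction and ability rather than bad luck on the day.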
Secondly, a grade is an approximate measure because it represents a range of possible marks. If you believe you are the best student at a particular subject in the country, you are probably misguided, but in any case, you cannot demonstrate this quality in your GCSE grade. The best you can hope for is to get the top grade, and thus be considered first equal with tens of thousands of others. In conclusion the importance of your grade is entirely a function of how it is interpreted.
Anyway, suppose we are organising a shift in a grade boundary. To make life easy, we only adjust the boundary between the top two grades, which are called Red and Blue in accordance with available board markers. We are describing exam score as a number between 0 and 1, rather than an integer between 0 and 72 or whatever. We focus on students achieving the top Red grade. Who has been disadvantaged in the change of grading? Well naturally the portion of students who are now scoring Blue, whereas previously they were Red. Admittedly they were on the border before, and they are still on the border, just on the side with everyone who scored lower this time. So it looks like no-one has gained from this.
But this is patently false. We’ve said that the value of the Red grade is a function of how it is interpreted. To put this into a more realistic context, imagine an employer or, whisper it softly, a university admissions officer, looking at a CV announcing a Red grade in maths. The employer has to make a judgement on the student’s true maths ability (whatever that means) based on this Red grade. What factors are relevant?
- How hard is the exam? Scoring top marks on GCSE maths and scoring top marks on the International Maths Olympiad are both excellent achievements, but in some contexts, the latter would stand out more. The value of the grade is an increasing function of the difficulty of the exam. (I’m assuming there is no rescaling in this toy model, but the logic is preserved under rescaling.)
- How wide is the grade interval? If any mark gave you a Red grade, it wouldn’t signify anything. If Red indicates 70%-100%, you’d have to assume that the candidate will on average be somewhere in the middle. This is not uncharitable, merely realistic given an absence of more precise information. The result of this is that the value of a Red grade is a decreasing function of the width of the grade interval.
Thus, so long as the difficulty of the exam remains constant (which is up for discussion, but not here), all the students who get a Red grade under the new regime have gained an advantage from the change. In conclusion, this is a zero-sum game. Some students will benefit, others will be at a disadvantage, and so my original claim is true, at least within the framework of this reasonable model.
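To make the zero-sum claim concrete, here is a minimal sketch of the toy model. The assumptions are mine: scores are uniform on [0, 1], and a reader values a grade at the average score of the candidates holding it. The old boundary of 0.7 echoes the 70%-100% example above; the new boundary of 0.8 and the function names are hypothetical.

```python
from fractions import Fraction

def perceived_value(lo, hi):
    # Assumption: scores are uniform on [0, 1], so a candidate known only to
    # lie in [lo, hi) is valued at the midpoint (the conditional mean).
    return (lo + hi) / 2

def average_value(boundary):
    # Population average of perceived value with Blue = [0, boundary)
    # and Red = [boundary, 1]; each grade is weighted by its width.
    blue = boundary * perceived_value(Fraction(0), boundary)
    red = (1 - boundary) * perceived_value(boundary, Fraction(1))
    return blue + red

old, new = Fraction(7, 10), Fraction(8, 10)  # the new boundary is hypothetical

# Students who are Red both before and after the change.
still_red_gain = perceived_value(new, Fraction(1)) - perceived_value(old, Fraction(1))
# Students who were Red and are now pooled with everyone in Blue.
demoted_loss = perceived_value(Fraction(0), new) - perceived_value(old, Fraction(1))

print(still_red_gain)                             # positive: a narrower Red band is worth more
print(demoted_loss)                               # negative: the demoted students lose out
print(average_value(old) == average_value(new))   # the population average does not move
```

In this sketch the average perceived value equals the average score wherever the boundary sits, so any gain for one group has to be paid for by another.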
I have interpreted literally a statement that was probably not intended to be interpreted literally. But this is not about point-scoring pedantry. It’s more that this sort of vagueness on matters that could be precise distracts from more useful statements. The person I quoted finishes by saying:
“What is important is that Ofqual sets out very clearly to teachers and students what is needed to achieve a specific grade. This is not the same as simply describing what statistical proportion of pupils will achieve a grade. Employers need a clear message that if a student has achieved a particular grade, it means that they have a certain skill or knowledge level.”
Let’s see how this works in practice at the moment by looking up the OCR Mathematics A-level specification. Examine pages 31 and 32. This describes verbally the standard expected of students to achieve grades A, C and E. I feel I am not being uncharitable if I say that the algorithm to get from grade A to the others is to replace the words “almost all” and “high” with “most” and “reasonable” and drop in the adverbs “usually” or “sometimes” fairly freely.
This illustrates the underlying problem. This phenomenon is probably more apparent in maths than in subjects where quality of writing is a factor. Ultimately the difference between an A and an E is that the students were taught the same material, but one found at least some parts of it more challenging, or at least was slower and less accurate at answering questions about it in the exams. The notion that a grade might give a clear qualitative indication of ability at a more sophisticated level than this is both very challenging to realise and almost completely independent of grade boundaries. If this is genuinely what is desired, it makes sense to focus on this rather than fiddling with boundaries, or complaining about other people fiddling with boundaries.