Comparative Judgement and Quality

The school at which I work has around fifteen regular “feeder schools” – local primary schools whose children attend our secondary school. As part of the entry process, new students and their parents have an interview with the Head of School. The interview is a chance to discuss the student's experiences of schooling and what they hope to achieve at their new school, and to answer questions or ease any anxieties that students and families may have. The Head of School will usually refer to the student's latest primary school report to discuss how things are going and what the student's goals are for the coming year.

What can confound the discussion are differences between feeder schools in their notions of quality of achievement. As part of the funding arrangements in Australia, schools are required to rate student achievement relative to the standard expected of students at that year level. Students are assessed as being “At standard” (within the coloured band for their year level) or as above or below that standard. In these reports, student achievement is typically represented by a dot and the expected level by a vertical coloured band.


For most schools these reports are reasonably accurate – the judgements agree with results from the NAPLAN national assessment and with our teachers’ assessments when those students enter our school. However, some feeder schools are systematically biased in their judgement of student progress. This bias (most often an overestimate of student achievement) has to be factored in when interpreting the reports from those schools.

Although schools have some incentive to report students as more proficient relative to their peers than they actually are, I believe that in these cases the bias stems more from a difference in how they assess the relative quality of student work. I use “quality” to describe what students can do in relation to what typical students from around the state can do at the same stage of schooling. A piece of work of good local quality would be commensurate with work produced by typical students at a particular school, while work of good global quality would be commensurate with that of average students from around the state.

In essence, the question that the reports of biased schools are answering is “How is this student achieving relative to other Year 6 students who have attended this school in the past?” rather than “How is this student achieving relative to other Year 6 students in the state?”

All teachers use a local sense of quality to judge student work – what matters is how closely that local sense of quality matches what quality looks like across the national sample (a national sense of quality). When your idea of relative quality is too strongly tied to the pool of students who attend (and have attended) your school, it is easy to lose sight of what quality looks like across the entire population, particularly if your local students are stronger or weaker than average. Having a good sense of quality is important: as a teacher, it is difficult to know what to teach and what feedback to give students if you don’t have an accurate sense of what a quality piece of work looks like. To combat this, you need a means of recalibrating your sense of quality of achievement against a larger sample of students.

Similar difficulties in judging quality arise in the final examinations for our secondary students, particularly in subjects where the exams require more complex performances (such as essays and text responses), and they are exacerbated when there is only one teacher for the subject. In these cases, with fewer teachers and a smaller sample against which to calibrate the sense of quality, it is easy to lose track of what global quality looks like. One way to combat this is to have teachers work as examiners for the exam board. Taking part in conversations with other examiners about what quality looks like, and then seeing hundreds of scripts from students all over the state, can help refine that sense of quality. However, examining is available only for particular courses and has a limited number of places.

A possible alternative would be to use comparative judgement. I believe that comparative judgement has a strong role to play in helping schools refine their concepts of quality. Chris Wheadon and his team at NoMoreMarking have recently embarked on a trial involving the marking of pieces from 220 schools across the UK, with teachers from each school making judgements about student scripts from other schools.
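The trial’s own scaling method isn’t described here, but comparative judgement is commonly scored by fitting a Bradley-Terry (or closely related Rasch) model to the judges’ pairwise decisions. Each script gets a latent quality parameter, and the probability that script A is judged better than script B is modelled as p_A / (p_A + p_B). The Python below is a minimal sketch under that assumption, not NoMoreMarking’s implementation. It takes a list of (winner, loser) judgements and estimates a score for each script with the standard minorisation-maximisation update. It also gives every script one win and one loss against a dummy `__anchor__` script so that a script that always wins (or always loses) still receives a finite score. The function name `bradley_terry`, the anchor trick and the example data are all hypothetical illustrations.

```python
import math
from collections import defaultdict

# Hypothetical dummy script used only to keep every estimate finite.
ANCHOR = "__anchor__"


def bradley_terry(judgements, iterations=500):
    """Estimate a quality score for each script from pairwise judgements.

    judgements: list of (winner, loser) pairs, each naming a script.
    Returns a dict mapping script -> log-strength (higher = judged better).
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    scripts = set()
    for winner, loser in judgements:
        if winner == loser:
            continue  # a script cannot be compared with itself
        scripts.update((winner, loser))
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
    if not scripts:
        return {}

    # Regularise: each script wins once and loses once against the anchor.
    for s in scripts:
        wins[s] += 1
        wins[ANCHOR] += 1
        pair_counts[frozenset((s, ANCHOR))] += 2

    everyone = scripts | {ANCHOR}
    strength = {s: 1.0 for s in everyone}
    for _ in range(iterations):
        new = {}
        for i in everyone:
            # MM update: p_i <- wins_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom
        # Only relative quality matters, so fix the anchor's strength at 1.
        scale = new[ANCHOR]
        strength = {s: v / scale for s, v in new.items()}

    return {s: math.log(strength[s]) for s in scripts}


if __name__ == "__main__":
    judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]
    scores = bradley_terry(judgements)
    for script in sorted(scores, key=scores.get, reverse=True):
        print(script, round(scores[script], 2))
```

In a cross-school exercise the scripts would come from many schools, so a teacher could see where their own students’ work sits on a scale built from everyone’s judgements rather than from local impressions alone.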

One of the important outcomes of such a process is the opportunity to see and judge hundreds of samples of work from students at other schools. Making judgements about quality, and seeing the range of quality in work from such a large sample, would help a teacher develop a better sense of what quality looks like nationally and so help calibrate their local sense of quality. The advantage of a process like comparative judgement is that large samples can be stored electronically, and training can be quite short because it doesn’t depend on knowing the ins and outs of a rubric. I am looking forward to the reports from the trial and to seeing how judging across larger cohorts might be expanded in the future.


