Three questions to ask of your KS2 SATs data


Like many of you, I have spent a lot of time over the past few weeks poring over this summer’s exam results for the schools in my trust. The grades themselves are always everyone’s first consideration, but this year has, of course, been particularly challenging because of the return to formal examinations. The steps taken by the exams regulator Ofqual to return the overall standard to broadly midway between 2021 and 2019 levels seem to have worked as expected; students on the whole have the grades we might have anticipated.

As the DfE holds secondary schools to account via a progress measure, Progress 8, the ‘input’ measure becomes as significant as the outputs. The Key Stage 2 test scores which the students obtained at the end of Year 6 are used to map out expected average attainment at a national level; pupil performance above or below this average contributes to a school’s progress score.

In 2016 the DfE changed how Key Stage 2 results were reported. They moved from levels, based on the National Curriculum, to scaled scores ranging from 80 to 120, with a score of 100 indicating that a pupil was at the “expected standard” and should therefore be able to access secondary school study appropriately. This standard is not the same as the average score, which is higher than 100.

Many of us waited patiently for the first cohort with these scores to reach Year 11 to understand any differences from what went before. However, because of Covid, it has taken until 2022 for the first set of scaled scores to be mapped to exam results. This is the first chance to take a careful look at these scores and get some sense as to how they relate to outcomes in GCSE exams – and here are three questions you can ask of your data.

1. Are your ‘average’ scores disguising details? 

The DfE uses pupils’ average scores from reading and mathematics to define their starting point, and this approach continues with scaled scores. Many schools will have stored these averages in their management systems, either to reduce the amount of data teachers have to handle or simply because they are interested in the Progress 8 calculation. But the process of averaging can hide important differences between pupils. Take these two pupils:

Emma – average scaled score 100

Zaid – average scaled score 100

The Progress 8 calculation regards these pupils as identical. But suppose their separate scores were as follows:

Emma: reading scaled score 95, mathematics scaled score 105

Zaid: reading scaled score 102, mathematics scaled score 98

Here the difference between their reading scores is quite significant: in 2017 around 20% of pupils had a reading score of 90 or below, while nearly 50% had a score of 102 or below. Research by FFT some years ago showed that the distribution of reading and mathematics scores can lead to a wide range of attainment later on. It would be worth schools looking more closely at their KS2 scores to identify students like this.
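For schools that hold both component scores, the check above is easy to automate. This is a minimal sketch in Python, using the invented Emma and Zaid records from the example; the five-point gap threshold is an arbitrary choice for illustration, not a DfE definition.

```python
# Flag pupils whose identical average scaled scores hide a meaningful
# gap between their reading and mathematics scores.
# Pupil names, scores and the threshold are illustrative only.

pupils = [
    {"name": "Emma", "reading": 95, "maths": 105},
    {"name": "Zaid", "reading": 102, "maths": 98},
]

GAP_THRESHOLD = 5  # arbitrary cut-off chosen for this example

for p in pupils:
    average = (p["reading"] + p["maths"]) / 2
    gap = abs(p["reading"] - p["maths"])
    flag = " <- worth a closer look" if gap >= GAP_THRESHOLD else ""
    print(f'{p["name"]}: average {average:.1f}, reading-maths gap {gap}{flag}')
```

Run against a full cohort export, a loop like this would surface every pupil whose average is masking a lopsided profile, even though Progress 8 treats them all identically.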

2. How can you view low, middle and high scores?

With the introduction of scaled scores, the old definitions of prior attainment, which roughly corresponded to levels 3, 4 and 5, needed to change. The DfE decided to define scores below 100 as ‘low’ and chose 110 as the threshold for ‘high’. These choices produce a much lower proportion designated as having high prior attainment (and a higher proportion as ‘low’), meaning that direct comparisons with earlier years are not valid.
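The banding described above can be expressed as a small helper function. This is a sketch following the thresholds as stated in this article (below 100 is ‘low’, 110 and above is ‘high’, everything between is ‘middle’); the function name is my own, not a DfE term.

```python
def prior_attainment_band(scaled_score: float) -> str:
    """Map an average KS2 scaled score to a prior-attainment band,
    using the thresholds described in the text above."""
    if scaled_score < 100:
        return "low"
    if scaled_score < 110:
        return "middle"
    return "high"

# Boundary cases are where recording errors tend to show up:
for score in (95, 100, 109.5, 110):
    print(score, prior_attainment_band(score))
```

Note how much hangs on the boundaries: a pupil on 109.5 sits in ‘middle’ while one on 110 is ‘high’, which is exactly why the next paragraphs urge caution about treating these cut-offs as meaningful in themselves.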

The design of Key Stage 2 tests differs in principle from GCSE. The focus is on ensuring that the score of 100 is consistent over time; a panel of teachers and assessment experts scrutinises results every year to ensure, as closely as possible, that this standard is fixed. Unlike GCSE, where the method of comparable outcomes is used to keep grade distributions consistent from one year to the next, there is no theoretical cap on the number of students attaining any given scaled score. In reality, however, a new testing regime followed by growing familiarity over time leads to what Ofqual refers to as the “sawtooth” effect: scores in 2017 for similar children were higher than in 2016, and there will be further change in subsequent years.

It is important to understand that the ‘higher’ score of 110 chosen by the DfE is arbitrary; it has more to do with convenient accountability reporting than with genuinely identifying higher ability. A student obtaining such a score may well be a higher attainer, but they may simply be more consistently able to perform at the expected standard. GCSE papers contain questions designed to test the whole ability range, so you can be more confident that a higher mark indicates genuinely higher attainment.

I’d suggest that schools treat scores of 110 or higher with some caution when viewed in isolation from any other evidence; don’t automatically assume that these pupils will perform better during their secondary years.

3. What can you do about unusual or missing results?

Every year, school data managers have to handle situations where data is missing or results are unusual; for example, when a pupil has missed one of the papers or hasn’t gained enough marks to achieve the lowest scaled score. It can then be unclear what they should record or how to process the results.

I suspect that schools’ systems may therefore have a wide range of results being stored in these circumstances. The DfE has a set of complex business rules that it uses to calculate Progress 8 in these situations. If you have students with KS2 results which are out of the ordinary, it would be worth checking that their results were recorded in line with the DfE rules. 

It's a lot more than accountability

All of the points above relate to school-level accountability. With performance tables being published again, school leaders are obviously concerned about how their schools will be represented. As understandable as this is, it’s important to make sure that the Key Stage 2 results pupils bring with them are treated as more than mere average data points. Even then, let’s remember that the tests were taken during one week and may well not be the best or most reliable assessments.

It would be a sad state of affairs indeed if one scaled score were all that remained of the immensely valuable learning each child did in their primary school. Primary school teachers have much richer information to share during transition, and this additional information, such as that from other standardised tests like CAT4, gives secondary schools a much more holistic picture. Comparing a pupil’s score in quantitative reasoning with their KS2 mathematics score, for example, will give more clarity around their attainment and potential.

If you’re looking for ideas about using KS2 SATs results alongside CAT4, read our case study from St Peter’s Collegiate School in Wolverhampton – Using CAT4 from transition to GCSEs

Duncan Baldwin