Of That

Brandt Redd on Education, Technology, Energy, and Trust

05 October 2018

Quality Assessment Part 6: Achievement Levels and Standard Setting

This is part 6 of a 10-part series on building high-quality assessments.

Two mountains, one with a flag on top.

If you have a child in U.S. public school, chances are that they took a state achievement test this past spring and sometime this summer you received a report on how they performed on that test. That report probably looks something like this sample of a California Student Score Report. It shows that "Matthew" achieved a score of 2503 in English Language Arts/Literacy and 2530 in Mathematics. Both scores are described as "Standard Met (Level 3)". Notably, in prior years Matthew was in the "Standard Nearly Met" category so his performance has improved.

The California School Dashboard offers reports of school performance according to multiple factors. For example, the Detailed Report for Castle View Elementary includes a graph of "Assessment Performance Results: Distance from Level 3".

Line graph showing performance of Lake Matthews Elementary on the English and Math tests for 2015, 2016, and 2017. In all three years, they score between 14 and 21 points above proficiency in math and between 22 and 40 points above proficiency in English.

To prepare this graph, they take the average difference between students' scale scores and the Level 3 standard for proficiency in the grade in which they were tested. For each grade and subject, California and Smarter Balanced use four achievement levels, each assigned to a range of scores. Here are the achievement levels for 5th grade Math (see this page for all ranges).

Level 1Less than 2455Standard Not Met
Level 22455 to 2527Standard Nearly Met
Level 32528 to 2578Standard Met
Level 4Greater than 2578Standard Exceeded

So, for Matthew and his fellow 5th graders, the Math standard for proficiency, or "Level 3" score, is 2528. Students at Lake Matthews Elementary, on average, exceeded the Math standard by 14.4 points on the 2017 tests.

Clearly, there are serious consequences associated with the assignment of scores to achievement levels. A difference of 10-20 points can make the difference between a school, or student, meeting or failing to meet the standard. Changes in proficiency rates can affect allocation of federal Title 1 funds, the careers of school staff, and even the value of homes in local neighborhoods.

More importantly to me, achievement levels must be carefully set if they are to provide reliable guidance to students, parents, and educators.

Standard Setting

Standard Setting is the process of assigning test score ranges to achievement levels. A score value that separates one achievement level from another is called a cut score. The most important cut score is the one that distinguishes between proficient (meeting the standard) and not proficient (not meeting the standard). For the California Math test, and for Smarter Balanced, that's the "Level 3" score but different tests may have different achievement levels.

When Smarter Balanced performed its standard setting exercise in October of 2014, it used the Bookmark Method. Smarter Balanced had conducted a field test that previous spring (described in Part 4 of this series). From those field test results, they calculated a difficulty level for each test item and converted that into a scale score. For each grade, a selection of approximately 70 items were sorted from easiest to most difficult. This sorted list of items is called an Ordered Item Booklet (OIB) though, in the Smarter Balanced case, the items were presented online. A panel of experts, composed mostly of teachers, went through the OIB starting at the beginning (easiest item), and set a bookmark at the item they believed represented proficiency for that grade. A proficient student should be able to answer all preceding items correctly but might have trouble with the items that follow the bookmark.

There were multiple iterations of this process on each grade, and then the correlation from grade-to-grade was also reviewed. Panelists were given statistics on how many students in the field tests would be considered proficient at each proposed skill level. Following multiple review passes the group settled on the recommended cut scores for each grade. The Smarter Balanced Standard Setting Report describes the process in great detail.

Data Form

For each subject and grade, the standard setting process results in cut scores representing the division between achievement levels. The cut scores for Grade 5 math, from table above, are 2455, 2528, and 2579. Psychometricians also calculate the Highest Obtainable Scale Score (HOSS) and Lowest Obtainable Scale Score (LOSS) for the test.

I am not aware of any existing data format standard for achievement levels. Smarter Balanced publishes its achievement levels and cut scores on its web site. The Smarter Balanced test administration package format includes cut scores, and HOSS and LOSS; but not achievement level descriptors.

A data dictionary for publishing achievement levels would include the following elements:

Cut ScoreThe lowest *scale score* included in a particular achievement level.
LOSSThe lowest obtainable *scale score* that a student can achieve on the test.
HOSSThe highest obtainable *scale score* that a student can achieve on the test.
Achievement Level DescriptorA description of what an achievement level means. For example, "Met Standard" or "Exceeded Standard".

Quality Factors

The stakes are high for standard setting. Reliable cut scores for achievement levels ensure that students, parents, teachers, administrators, and policy makers receive appropriate guidance for high-stakes decisions. If the cut scores are wrong - many decisions may be ill informed. Quality is achieved by following a good process:

  • Begin with a foundation of high quality achievement standards, test items that accurately measure the standards, and a reliable field test.
  • Form a standard-setting panel composed of experts and grade-level teachers.
  • Ensure that the panelists are familiar with the achievement standards that the assessment targets.
  • Inform the panel with statistics regarding actual student performance on the test items.
  • Follow a proven standard-setting process.
  • Publish the achievement levels and cut scores in convenient human-readable and machine-readable forms.


Student achievement rates affect policies at state and national levels, direct budgets, impact staffing decisions, influence real estate values, and much more. Setting achievement level cut scores too high may set unreasonable expectations for students. Setting them too low may offer an inappropriate sense of complacency. Regardless, achievement levels are set on a scale calibrated to achievement standards. If the standards for the skills to be learned are not well-designed, or if the tests don't really measure the standards, then no amount of work on the achievement level cut scores can compensate.

No comments:

Post a Comment