Of That

Brandt Redd on Education, Technology, Energy, and Trust

23 January 2013

Bloom's Two Sigma Problem Revisited

Benjamin Bloom's Two Sigma Problem has been both a guiding framework and a challenge to educators for more than a quarter century. A bit more than a year ago I wrote about the problem and some of the ways people are approaching it.

Here's the concise version: Bloom and some of his grad students compared classroom teaching with 1:1 tutoring. In both cases they used a mastery-based curriculum. The tutored students performed two standard deviations (two sigmas) better than their conventionally taught peers. While it would be nice to have a 1:1 student:teacher ratio, Bloom acknowledged that it's not practical and he proceeded to research ways to achieve similar results using more scalable means. He published the study in 1984. Since then, the Two Sigma Problem has served as a benchmark of how well students can learn if given the right supports.

A recent meta-study by Kurt VanLehn of Arizona State University compares no tutoring (conventional classroom), computer-based Intelligent Tutoring Systems (ITS), and human tutoring. VanLehn notes that a number of well-known ITS efforts have shown one-sigma improvements over conventional instruction. So, the conventional hypothesis is that computer tutors achieve one-sigma gains while human tutors achieve two-sigma gains as compared to conventional instruction.

VanLehn set out to test that hypothesis. He selected numerous studies that collectively yielded more than 100 comparisons between conventional instruction, three forms of ITS, and human tutoring. The result is surprising: answer-based ITS achieved an improvement of 0.31 sigma over conventional instruction. Step-based ITS achieved 0.75 sigma and human tutors achieved 0.79 sigma.
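To put those effect sizes in more familiar terms, a sigma value can be converted into the percentile that the average tutored student would reach within the conventionally taught group. The sketch below is illustrative only. It assumes normally distributed scores with the same spread in both groups, and the function name is made up for this example rather than taken from either study.

```python
from statistics import NormalDist


def percentile_of_average_treated_student(effect_size_sigma: float) -> float:
    """Percentile rank, within the comparison group, of the average treated student.

    Assumption: scores in both groups are normally distributed with equal
    standard deviation, so the effect size is simply a z-score.
    """
    return 100 * NormalDist().cdf(effect_size_sigma)


# Effect sizes quoted in the post, plus Bloom's nominal two sigma for reference.
effect_sizes = [
    ("Answer-based ITS", 0.31),
    ("Step-based ITS", 0.75),
    ("Human tutoring", 0.79),
    ("Two sigma", 2.00),
]

for label, sigma in effect_sizes:
    percentile = percentile_of_average_treated_student(sigma)
    print(f"{label}: {sigma:.2f} sigma is roughly percentile {percentile:.0f}")
```

Under those assumptions, two sigma places the average tutored student at about the 98th percentile of the conventional class, while 0.75 to 0.79 sigma corresponds to the high 70s.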

This is mixed news. On the one hand, the best computer tutors are almost as good as human tutors. That suggests that we can scale up much more effective learning than is achieved in conventional classrooms. On the other hand, VanLehn found no replication of Bloom's 2 sigma results. Is Bloom's goal out of reach or is there another factor involved?

To find out, VanLehn retrieved the dissertations from Bloom's grad students that contributed to the more famous paper. One key experiment yielded an effect size of 1.95 sigma – the probable source of Bloom's Two Sigma challenge. In that experiment both the conventional classroom and the tutors used a mastery learning technique. Whether in class or being tutored, students took a quiz after studying each unit. If their score achieved the mastery threshold, they advanced to the next unit. If not, they studied the unit more and were assessed again. This process was repeated until the mastery threshold was achieved.
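The mastery loop described above can be sketched in a few lines. This is a minimal illustration, not the procedure from the dissertations: the function names, the integer percentage scores, and the `max_attempts` safety limit are all assumptions added for the example.

```python
from typing import Callable, Dict, Sequence


def run_mastery_course(
    units: Sequence[str],
    take_quiz: Callable[[str, int], int],
    threshold: int = 90,
    max_attempts: int = 10,
) -> Dict[str, int]:
    """Advance through units, re-studying each until the quiz score meets the threshold.

    take_quiz(unit, attempt) returns a percentage score (0-100) for that attempt.
    Returns the number of attempts each unit required.
    max_attempts is a hypothetical guard so the sketch cannot loop forever.
    """
    attempts_used: Dict[str, int] = {}
    for unit in units:
        for attempt in range(1, max_attempts + 1):
            score = take_quiz(unit, attempt)
            if score >= threshold:
                # Mastery reached: record the attempt count and move to the next unit.
                attempts_used[unit] = attempt
                break
            # Otherwise the student studies the unit again and is re-assessed.
        else:
            raise RuntimeError(f"Mastery not reached for {unit!r}")
    return attempts_used


def toy_quiz(unit: str, attempt: int) -> int:
    """Made-up learner: scores 70 on the first try and gains 10 points per retry."""
    return min(100, 70 + 10 * (attempt - 1))


units = ["unit 1", "unit 2"]
print(run_mastery_course(units, toy_quiz, threshold=80))
print(run_mastery_course(units, toy_quiz, threshold=90))
```

With this toy learner, raising the threshold from 80 to 90 costs one extra study-and-quiz cycle per unit. The only thing that changes between the two runs is the `threshold` argument.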

The missing piece is that classroom students were required to achieve a mastery threshold of 80% before advancing, while tutored students were required to achieve a threshold of 90%. Could adjusting the mastery threshold account for a full standard deviation of improvement in achievement? If so, numerous online learning systems should be tuned accordingly.

Oleg Bespalov and Karen Baldeschwieler, with their colleagues at New Charter University, have evidence that supports this hypothesis. In their ITS, students receive periodic formative assessments in the form of multiple-choice quizzes and self-graded short answer questions. From these assessments they calculate a "readiness score" to help students know when they're ready to advance. Students aren't constrained by the score – merely informed.

This creates a natural experiment in which they can compare student performance on the final exam against individual readiness scores. They discovered that students with a readiness above 90 achieved a 98% pass rate. But for those with a readiness score in the 81-90 range the pass rate dropped to 69%.
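A comparison like this amounts to grouping students into readiness bands and computing the pass rate in each band. The sketch below shows one way to do that. The records are invented purely to exercise the code and are not New Charter University data, and the band edges and function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def readiness_band(readiness: float) -> str:
    """Assign a readiness score (0-100) to a band; the band edges are assumed."""
    if readiness > 90:
        return "above 90"
    if readiness > 80:
        return "81-90"
    return "80 or below"


def pass_rate_by_band(records: Iterable[Tuple[float, bool]]) -> Dict[str, float]:
    """Compute the final-exam pass rate per readiness band.

    Each record is (readiness score, whether the student passed the final exam).
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [passed, total]
    for readiness, passed in records:
        tally = counts[readiness_band(readiness)]
        tally[0] += int(passed)
        tally[1] += 1
    return {band: passed / total for band, (passed, total) in counts.items()}


# Invented records for illustration only.
toy_records = [(95, True), (92, True), (88, True), (85, False), (70, False)]
print(pass_rate_by_band(toy_records))
```

Because students choose for themselves when to advance, the bands are not randomly assigned, so a gap between bands is suggestive rather than conclusive.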

Both of these projects indicate that there's a critical threshold somewhere between 80% and 90%. Clearly this is an area deserving of more experimentation and research. But we can already tell that tuning the mastery threshold is a critical factor for improving student achievement.


  1. So it would seem that everyone's been aiming for the wrong goal (replicating one-to-one tutoring) when perhaps a higher mastery threshold is the magic pill? I like this answer so much better, honestly. It's just a requirement to actually learn a subject before moving on.

    Were Bloom's experiments in many subjects or just one?

  2. It took both 1:1 tutoring AND a 90% threshold to achieve two-sigma improvements. Most of us in the Ed Tech field use Bloom's tutoring as a proxy for personalized instruction - adapting teaching to meet individual needs. Notably, mastery learning also involves some degree of personalization.

  3. Hi Brandt,

    I've done a lot of study of historical child prodigies (Mozart and Beethoven, Picasso, the Bronte sisters to name a few) and regularly practice reproducing the methods and processes behind their educations. I stumbled across the 2-sigma problem a few months ago and have had it come up as a recurring theme ever since.

    The theme, from my perspective and from all the research and practice I do, is that a 2-sigma improvement is actually a low threshold compared to the students I study. Unknown to me until recently, the methods used by the teachers and students in my case studies are quite reproducible and, as Bloom's results indicated, show that student performance has a lot more to do with the conditions of the learning environment than with the student. To me, that seems to be true even in the case of the highest performers, or "prodigies."

    I'd love to chat about it sometime. :)

  4. Thanks Zach,

    I've seen other evidence confirming your assertion that we can do considerably better than 2-sigma. One researcher said we really don't know the limits of a child's capacity to learn. The challenge in each case is figuring out how to scale their methods to address a majority of students.

    It sounds like you're on to something. I would enjoy chatting. You can contact me directly on Google+ or LinkedIn.

  5. Is there a paper about the experiment by Oleg Bespalov and Karen Baldeschwieler?

  6. No, unfortunately the New Charter University results aren't published anywhere. I obtained the information in a conversation with Bespalov and Baldeschwieler shortly before I wrote this post.