Of That

Brandt Redd on Education, Technology, Energy, and Trust

23 January 2013

Bloom's Two Sigma Problem Revisited

Benjamin Bloom's Two Sigma Problem has been both a guiding framework and a challenge to educators for more than a quarter century. A bit more than a year ago I wrote about the problem and some of the ways people are approaching it.

Here's the concise version: Bloom and some of his grad students compared classroom teaching with 1:1 tutoring. In both cases they used a mastery-based curriculum. The tutored students performed two standard deviations (two sigmas) better than their conventionally taught peers. While it would be nice to have a 1:1 student:teacher ratio, Bloom acknowledged that it's not practical and he proceeded to research ways to achieve similar results using more scalable means. He published the study in 1984. Since then, the Two Sigma Problem has served as a benchmark of how well students can learn if given the right supports.

A recent meta-study by Kurt VanLehn of Arizona State University compares no tutoring (conventional classroom), computer-based Intelligent Tutoring Systems (ITS), and human tutoring. VanLehn notes that a number of well-known ITS efforts have shown one-sigma improvements over conventional instruction. So, the conventional hypothesis is that computer tutors achieve one-sigma gains while human tutors achieve two-sigma gains as compared to conventional instruction.

VanLehn set out to test that hypothesis. He selected numerous studies that collectively yielded more than 100 comparisons between conventional instruction, three forms of ITS, and human tutoring. The result is surprising: answer-based ITS achieved an improvement of 0.31 sigma over conventional instruction. Step-based ITS achieved 0.75 sigma and human tutors achieved 0.79 sigma.

This is mixed news. On the one hand, the best computer tutors are almost as good as human tutors. That suggests that we can scale up much more effective learning than is achieved in conventional classrooms. On the other hand, VanLehn found no replication of Bloom's 2 sigma results. Is Bloom's goal out of reach or is there another factor involved?

To find out, VanLehn retrieved the dissertations from Bloom's grad students that contributed to the more famous paper. One key experiment yielded an effect size of 1.95 sigma – the probable source of Bloom's Two Sigma challenge. In that experiment both the conventional classroom and the tutors used a mastery learning technique. Whether in class or being tutored, students took a quiz after studying each unit. If their score achieved the mastery threshold, they advanced to the next unit. If not, they studied the unit more and were assessed again. This process was repeated until the mastery threshold was achieved.

The missing piece is that classroom students were required to achieve a mastery threshold of 80% before advancing, while tutored students were required to achieve a threshold of 90%. Could it be that adjusting the mastery threshold alone accounts for a full standard deviation of improvement in achievement? If so, numerous online learning systems should be tuned accordingly.
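The mastery loop described above can be sketched in a few lines of Python. Everything here is hypothetical scaffolding – `study_unit` and `take_quiz` stand in for whatever instruction and formative assessment a real system provides – but it shows where the tunable threshold sits:

```python
def run_unit(study_unit, take_quiz, mastery_threshold=0.90, max_attempts=10):
    """Repeat study/quiz cycles until the student reaches the mastery threshold.

    study_unit and take_quiz are placeholder callbacks for real instruction
    and assessment. Returns (attempts_used, final_score).
    """
    for attempt in range(1, max_attempts + 1):
        study_unit()
        score = take_quiz()
        if score >= mastery_threshold:
            return attempt, score  # mastered: advance to the next unit
    return attempt, score  # never mastered: flag for intervention instead

# A simulated student whose score improves with each attempt shows the
# threshold effect: at 80% she advances on attempt 2, at 90% on attempt 3.
scores = iter([0.70, 0.85, 0.93])
attempts, final = run_unit(lambda: None, lambda: next(scores),
                           mastery_threshold=0.90)
```

The entire difference between the two experimental conditions reduces to that one `mastery_threshold` parameter – which is what makes it so attractive a tuning knob for online systems.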

Oleg Bespalov and Karen Baldeschwieler, with their colleagues at New Charter University, have evidence to confirm this hypothesis. In their ITS system, students receive periodic formative assessments in the form of multiple-choice quizzes and self-graded short answer questions. From these assessments they calculate a "readiness score" to help students know when they're ready to advance. Students aren't constrained by the score – merely informed.

This creates a natural experiment in which they can compare student performance on the final exam against individual readiness scores. They discovered that students with a readiness above 90 achieved a 98% pass rate. But for those with a readiness score in the 81-90 range the pass rate dropped to 69%.
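Computing pass rates by readiness band is straightforward. This sketch uses made-up records, not New Charter's actual data or pipeline:

```python
def pass_rates_by_band(records, bands=((81, 90), (91, 100))):
    """Group (readiness_score, passed) records into score bands and
    compute the pass rate within each band (None for empty bands)."""
    rates = {}
    for lo, hi in bands:
        outcomes = [passed for readiness, passed in records
                    if lo <= readiness <= hi]
        rates[(lo, hi)] = sum(outcomes) / len(outcomes) if outcomes else None
    return rates

# Illustrative records: (readiness score, passed final exam?)
records = [(85, True), (85, False), (95, True), (95, True)]
rates = pass_rates_by_band(records)
```

A natural experiment like New Charter's is just this comparison run over real final-exam outcomes.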

Both of these projects indicate that there's a critical threshold somewhere between 80% and 90%. Clearly this is an area deserving of more experimentation and research. But we can already tell that tuning the mastery threshold is a critical factor for improving student achievement.

18 January 2013

Measures of Effective Teaching

The Measures of Effective Teaching Project (MET) released its final reports last week. It received considerable press coverage, as the study strives to inform teacher evaluation programs – a subject of ongoing controversy.

Most of the stories, like this one from Reuters, focus on the study's finding that teacher performance can indeed be predicted by performance measures. The best evaluations involve a weighted average of student test scores, teacher observations and student evaluations. Any one of these by itself is a much less accurate predictor.

There are nuances to this that can be gleaned from the project's Policy and Practitioner Brief:

  • The different measures (student testing, teacher observation and student evaluation) have some overlap but mostly they measure different aspects of the teacher's skills.
  • Different weightings are better predictors of different outcomes. Unsurprisingly, placing greater weight on test results is a better predictor of future student test results. However, equal weighting models or those that emphasize teacher observations are more reliable year over year.
  • Effective teacher observations are more than a periodic visit from the principal. Evaluations require a consistent framework and procedure. The MET project used the Danielson Framework for Teaching as a rubric. The reliability of teacher observations is greatly improved by having at least two evaluators.
  • When done properly, student evaluations are very reliable and an important component of teacher evaluation. As with observations, the key is to ask the right questions. The MET project used the Tripod Student Survey.
  • The "value added" theory is supported. When student scores are compared with the previous year's performance (a value added score) the result is a more consistent predictor of future teacher performance than just the most recent year's scores.
One problem with exclusively using standardized tests to evaluate teachers or schools is that they're a blunt instrument. These tests offer a measure of performance, but they offer limited guidance on how a teacher or school can improve. Sure, we can fire ineffective teachers and close ineffective schools. But using natural selection to improve schools is slow and costly, not to mention cruel: you're simply hoping that some teachers and schools randomly find the right formula for success.

One advantage of teacher observations and student evaluations is that they supply rich feedback teachers can use to improve their practice. Another MET project report, Feedback for Better Teaching, offers guidelines for using feedback. They placed cameras in classrooms and observers codified the techniques used by the teachers. The same video recordings were used by the teachers themselves to observe their own performance – usually with an instructional coach. Processes like these can continuously improve teacher skills and effectiveness.

I've written before about how immediate feedback can help the student learn more effectively. In that context, it's no surprise that feedback to teachers helps them to be more effective. Moreover, it supports the passion that got them into teaching in the first place.

09 January 2013

Enterprise-Scale is not Web-Scale

In the 90s, the IT world was talking about Enterprise-Scale. It's not that enterprise-scale was anything new, but until then the enterprise had been the domain of mainframes and minicomputers. Upstart microcomputers – those with the whole processor on a chip – had not previously been capable of enterprise-scale operations.

It took a lot to achieve enterprise-scale with microcomputers. The leaders included Sun, Oracle and Cisco with Microsoft, Intel, Compaq and others playing fast-follower. They invented RAID arrays, symmetric multiprocessing, storage area networks, network load balancing and much more in the pursuit of five nines of reliability.

As microprocessor-based computers achieved enterprise-scale, pioneers like Google, Amazon, Yahoo!, SalesForce and others pushed right past enterprise into web-scale. User counts were measured in hundreds of millions, storage capacity was measured in petabytes and server arrays numbered in the thousands. Unlike enterprise-scale, where key technologies had already been invented by the mainframe world, there wasn't any precedent for web-scale and the pioneers had to invent their own methods. I happened to work at Ancestry.com in the late 1990s/early 2000s and got to participate in some of that invention. But it wasn't until later in the decade that the pioneers started to share what they had learned and build products like Amazon Web Services, Google App Engine and Windows Azure to support the web-scale developer.

This is an important issue for education technology. The education industry is a bit behind the curve in moving to web-scale. For example, most learning management systems are built for enterprise-scale. They are intended to be installed on dedicated servers at a college or university's data center and they're architected to handle tens of thousands of students and teachers.

What happens when you move to the K-12 market or to community colleges? These organizations don't have the data centers and skilled staff needed to deploy, maintain and backup enterprise servers like these. In the past, their data systems only had to handle a few hundred or maybe a few thousand administrators. Teachers and students didn't directly access the district's data systems.

But districts are rapidly bringing all of their students and teachers online. And that means two orders of magnitude more users. Many districts have student counts numbering in the hundreds of thousands. Some get into the millions. And since their technology staffs are already overburdened, they seek hosted solutions, not enterprise-scale servers they have to manage themselves. Hosting a single district might not reach web-scale but a cost effective provider would serve hundreds of districts. And web-scale technologies can reduce the cost to something that districts can afford.

Here are some of the principles of web-scale architecture. For purchasers of products and services, these are the things you need to look for. For developers of those services, these are the principles you need to incorporate into your design.

Always Available
Web scale services use redundant servers to ensure that the service is always running – even during software upgrades and system maintenance. The term "24/7" was invented to describe services that have no weekly scheduled downtime. (And please don't write 24/7/365.)

Billions of Database Records
Today a district might keep a few dozen data points per student per year – perhaps a handful of database rows, each holding several values. In a district with 50,000 students that amounts to around 150,000 database rows per year. Eventually the database might grow to a few million rows total.

But personalized learning applications can collect thousands of data points per student per year. And an online service might serve a few million students. Thus, a web-scale learning service should be designed to store billions of data elements with provisions for orders of magnitude growth beyond that as clickstream data become more important.
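A back-of-envelope calculation shows the gap. The per-row and per-student figures below are assumptions chosen to match the numbers in the text:

```python
# Today's district SIS: a few rows per student per year (assumption:
# each row holds several data points).
students_per_district = 50_000
rows_per_student_year = 3
district_rows_per_year = students_per_district * rows_per_student_year  # ~150,000

# A hosted personalized-learning service: thousands of data points per
# student, millions of students (both figures illustrative).
points_per_student_year = 2_000
students_hosted = 3_000_000
service_rows_per_year = students_hosted * points_per_student_year

print(f"district: {district_rows_per_year:,} rows/year")
print(f"service:  {service_rows_per_year:,} rows/year")
```

Four orders of magnitude separate the two figures before clickstream data even enters the picture – which is why the storage architecture has to be web-scale from the outset.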

Single Sign-On and Identity Management
Today's schools typically run a Student Information System, a Learning Management System and a few custom learning systems for specific subjects. Most of these applications have their own user database, mandating separate logins and requiring a lot of data entry to provision the systems.

The low-hanging fruit is a single sign-on system that lets students and teachers use the same login account across all systems. But single sign-on is of limited value without automatic provisioning. So it's more important to have an identity management system that automatically shares demographic and enrollment information between the Student Information System and the various learning systems. The long-term need is to integrate the data among all of the systems so that all student performance data is accumulated in a common database.
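A minimal sketch of the automatic-provisioning idea, assuming a hypothetical learning-system interface with `ensure_account` and `enroll` operations (neither the interface nor the record shape comes from any real SIS product):

```python
def provision_from_sis(sis_enrollments, learning_system):
    """Push SIS enrollment records into a learning system so that accounts
    and class rosters stay in sync without manual data entry.

    sis_enrollments: iterable of (student_id, name, class_id) tuples.
    learning_system: any object with ensure_account() and enroll().
    """
    for student_id, name, class_id in sis_enrollments:
        learning_system.ensure_account(student_id, name)  # idempotent create
        learning_system.enroll(student_id, class_id)

class InMemoryLMS:
    """Toy learning system used to demonstrate the sync."""
    def __init__(self):
        self.accounts, self.rosters = {}, {}
    def ensure_account(self, student_id, name):
        self.accounts.setdefault(student_id, name)
    def enroll(self, student_id, class_id):
        self.rosters.setdefault(class_id, set()).add(student_id)
```

Running the sync on each nightly SIS export (or, better, on change events) is what turns single sign-on from a login convenience into genuine identity management.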

Services-Integrated Security
Consider security in a Student Information System. The student, her teacher and her parents should have access to her school records, but no one else (except maybe the principal or a counselor). Enterprise-scale security manages things like this through access control lists (ACLs). Each record or element has an ACL granting certain levels of access to specific individuals. For example, the teacher has permission to view and change grades while the student and parent only have permission to view them.

The ACL approach becomes fragile at web scale. With millions of students and parents and thousands of changes to class enrollment it becomes difficult to maintain correct ACLs even when the process is automated. Roles and groups offer some relief but inevitably permission errors creep in and they become a technical support nightmare. Even worse, with regulations like FERPA in place, permission errors can result in significant liability.

Instead, web-scale applications use policy-based permissions. When a student is enrolled in a class, the policy says the teacher should be able to access that student's records. There's no ACL to be updated and permission naturally disappears if the student changes enrollment. The databases of these systems describe the relationships between elements (students, classes, teachers, parents, etc.) and the policies describe how permissions should be granted according to those relationships.

Services-integrated security also means that permissions are enforced at all levels of the system. The UI will control permissions that are offered to the user and the API enforces policy when read or write attempts are made. Thus a rogue or buggy application is still prevented from violating security policy.

Developer Note: Policy-based security can be processor and database intensive. The query to determine whether to permit a particular operation can easily be more expensive than the operation itself. This isn't a reason to reject the approach. Rather, use multiple levels of permissions caching to reduce the database burden.
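Here's a minimal sketch of policy-based permissions with a cache along the lines of that note. The relationship tables, IDs and the policy itself are hypothetical and hard-coded for illustration:

```python
from functools import lru_cache

# Relationship facts as the system's database might expose them
# (hard-coded here; in practice these are database queries).
ENROLLMENTS = {("s1", "algebra"), ("s2", "algebra")}   # (student, class)
TEACHES = {("t1", "algebra")}                          # (teacher, class)
PARENTS = {("p1", "s1")}                               # (parent, student)

@lru_cache(maxsize=100_000)  # the policy query can cost more than the read it guards
def can_view_record(actor, student):
    """Policy: a student, her parents and her current teachers may view
    her records. No ACL exists; permission is derived from relationships."""
    if actor == student or (actor, student) in PARENTS:
        return True
    taught = {c for t, c in TEACHES if t == actor}
    enrolled = {c for s, c in ENROLLMENTS if s == student}
    return bool(taught & enrolled)
```

Note the trade-off the caching introduces: when an enrollment changes, cached answers must be invalidated (here, `can_view_record.cache_clear()`), otherwise the "permission naturally disappears" property is lost until the cache expires.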

Linear or Sub-Linear Cost Curves
If you graph number of users on the horizontal axis vs. total cost on the vertical axis, enterprise-scale systems have costs that grow exponentially. This is because they achieve scale by using progressively bigger and more complex servers, and one 32-processor server costs many times more than 32 single-processor servers.

In contrast, web-scale architectures have a linear or sub-linear cost curve. They achieve this feat by using software, database and hardware architectures that spread the load across many commodity-priced servers. As demand increases you simply add servers so variable costs are linear and fixed costs get diluted over a large number of users.

Developing scale-out software like this is complicated and expensive. Because of this, enterprise-scale architectures can cost less in enterprise-scale deployments. But when user counts get into the hundreds of thousands or millions, web-scale becomes more cost effective.
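A toy cost model makes the crossover concrete. All prices, capacities and the growth exponent are invented for illustration:

```python
def scale_up_cost(users, users_per_cpu=5_000, base_price=4_000, growth=1.6):
    """Scale-up (enterprise): one big server whose price grows
    super-linearly with processor count."""
    cpus = max(1, -(-users // users_per_cpu))  # ceiling division
    return base_price * cpus ** growth

def scale_out_cost(users, users_per_server=5_000, server_price=4_000,
                   fixed=50_000):
    """Scale-out (web): commodity servers with linear variable cost,
    plus a large fixed cost for the more complex software."""
    servers = max(1, -(-users // users_per_server))
    return fixed + server_price * servers
```

With these made-up numbers, scale-up wins at 10,000 users (no big fixed software cost to amortize) while scale-out wins decisively at 1,000,000 – the same crossover the paragraph above describes.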

Web-scale applications rarely stand alone. In most cases, they are combined with other applications to create a complete solution. This is certainly true of our vision for personalized education. A complete solution includes at least the following:

  • Student Information System
  • Student/Parent Portal
  • Teacher Portal
  • Adaptive Learning System (probably multiple subject-specific ones)
  • Content Library
  • Assessment Bank
  • Analytics (Teacher, Department, Schools)
  • Interactive Professional Development
Plus, other innovative applications are likely to emerge. In a realistic system, these components will originate from a variety of sources. For time-constrained teachers and students to use them effectively, they will have to be seamlessly mashed up together.

Tools and Techniques for Web-Scale
At Ancestry.com we had to invent many of the web-scale tools we used. But the toolkit has matured in the last few years. The easiest approach is to build on one of the cloud platforms like Windows Azure, Google App Engine or Amazon Web Services. The downside is that doing so locks you into that vendor's hosting service.

Many other tools and techniques have emerged to help with web-scale development.

Ultimately, however, you can use all of these tools and still not have a web-scale application. You have to architect for web-scale from the very start.