Of That

Brandt Redd on Education, Technology, Energy, and Trust

30 November 2012

Learning from Data - An Automotive Example

Monday saw me driving 800 miles home from a family Thanksgiving celebration. Due to my wife's change in plans, my only companion for the drive was our small dog (who had a narrow escape the day before). I needed something to keep my attention. So I decided to perform an experiment in data collection. I learned a lot even from a small data sample.

The vehicle I was driving was a 2010 Subaru Forester. Some friends have the same vehicle and have been pleased with getting around 27 MPG on the highway. We typically get only 23-24 MPG on the highway and I had been wondering why.

Among the features of this car is an average gas mileage display that's tied to the trip odometer. So, sampling the gas mileage is as simple as setting the cruise control, resetting the trip odometer, driving a set distance and reading out the result. As I was crossing the relatively flat plains of Idaho (speed limit 75) this seemed to be a good opportunity to gather some data.

Over a period of several hours, I took a bunch of samples following the above method and using my GPS to track altitude changes. I abandoned samples where the altitude change was more than a few hundred feet. The result is 27 good samples. I've posted the raw data here in case you want to play with them. Most of the samples are for 20 mile segments but some are as long as 40 and some are as short as 5 miles.

As you can imagine, the lower-speed samples got a bit tedious. But I was curious enough that I even took a side trip on a remote road (off the freeway) to get samples below 45 MPH. There's considerable variability in those results as you can see in the plot below. Halfway through the trip I refilled with fuel. I switched from regular (87 octane) to premium (92 octane) to see how that might affect mileage.

[Figure: 2010 Subaru Forester Fuel Economy vs. Speed]

The results certainly aren't what I expected. EPA city and highway ratings have always led me to expect relatively flat miles per gallon with city being much poorer due to stop-and-go driving. Instead, I got a nearly linear downward slope. I used Excel's curve-fitting feature and found that a polynomial curve worked better than a line. The formula is embedded in the graph above.

More data points would be required to really validate this curve but it certainly fits within the margin of error of my samples. Therefore we can make some cost estimates using this formula. Notable is that peak economy is between 40 and 45 MPH – much slower than I had expected. From this I was able to calculate the cost of each hour saved on this long drive.
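For readers who want to try the curve-fitting step outside Excel, here is a minimal Python sketch. It rests on two assumptions that are not in the post: that the "polynomial curve" is a second-order (quadratic) fit, and that the speed/MPG pairs from the cost table later in this post are an acceptable stand-in for the 27 raw samples, which are not reproduced here. The function names are illustrative only.

```python
# Speed (MPH) and MPG pairs copied from the cost table in this post. They are
# rounded values read off the author's fitted curve, not the raw samples.
POINTS = [(55, 33.1), (60, 32.1), (65, 30.7), (70, 29.0),
          (75, 27.0), (80, 24.6), (84, 22.4)]
CENTER = 70.0  # centering the speeds keeps the normal equations well conditioned


def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, 3):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, 4):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        known = sum(rows[r][c] * solution[c] for c in range(r + 1, 3))
        solution[r] = (rows[r][3] - known) / rows[r][r]
    return solution


def fit_quadratic(points):
    """Least-squares fit of mpg = a*x**2 + b*x + c, where x = speed - CENTER."""
    xs = [speed - CENTER for speed, _ in points]
    ys = [mpg for _, mpg in points]
    power_sum = [sum(x ** k for x in xs) for k in range(5)]
    matrix = [[power_sum[4], power_sum[3], power_sum[2]],
              [power_sum[3], power_sum[2], power_sum[1]],
              [power_sum[2], power_sum[1], power_sum[0]]]
    rhs = [sum(y * x ** k for x, y in zip(xs, ys)) for k in (2, 1, 0)]
    return tuple(solve_3x3(matrix, rhs))


def predict(coeffs, speed):
    """Evaluate the fitted curve at a given speed in MPH."""
    a, b, c = coeffs
    x = speed - CENTER
    return a * x * x + b * x + c


def peak_speed(coeffs):
    """Speed at the vertex of the parabola (the most economical speed)."""
    a, b, _ = coeffs
    return CENTER - b / (2 * a)


if __name__ == "__main__":
    coeffs = fit_quadratic(POINTS)
    print("coefficients (a, b, c):", coeffs)
    print("estimated peak-economy speed (MPH):", round(peak_speed(coeffs), 1))
```

The peak it reports is an extrapolation below the slowest speed in the table, so it is best read as a sanity check against the 40 to 45 MPH range mentioned above rather than as an independent measurement.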

Here's a table of results for an 800-mile drive in a 2010 Subaru Forester with an average fuel cost of $3.759 per gallon. Time saved and additional cost are from a baseline speed of 55 MPH. Note that the distance cancels out in the Cost Per Hour Saved so those numbers are accurate regardless of the length of the trip.

Speed (MPH)   MPG    Fuel Cost   Time (Hrs)   Hrs Saved   Addl Cost   Cost Per Hr Saved
55            33.1   $90.83      14.55
60            32.1   $93.71      13.33        1.21        $2.89       $2.38
65            30.7   $97.88      12.31        2.24        $7.05       $3.15
70            29.0   $103.65     11.43        3.12        $12.82      $4.11
75            27.0   $111.55     10.67        3.88        $20.72      $5.34
80            24.6   $122.45     10.00        4.55        $31.62      $6.96
84            22.4   $134.31     9.52         5.02        $43.48      $8.66
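
The arithmetic behind the table can be sketched in a few lines of Python. This is an illustration, not the spreadsheet used for the post: the function names are made up, and the MPG values are the rounded figures from the table, so the results land within a few cents of the printed ones rather than matching them exactly.

```python
BASELINE_MPH = 55
PRICE_PER_GALLON = 3.759  # average fuel cost quoted in the post

# Speed (MPH) -> MPG, copied from the table (rounded values from the fitted curve).
MPG_AT_SPEED = {55: 33.1, 60: 32.1, 65: 30.7, 70: 29.0,
                75: 27.0, 80: 24.6, 84: 22.4}


def fuel_cost(speed, miles):
    """Dollars of fuel burned covering `miles` at a constant `speed`."""
    return miles / MPG_AT_SPEED[speed] * PRICE_PER_GALLON


def hours(speed, miles):
    """Driving time in hours at a constant `speed`."""
    return miles / speed


def cost_per_hour_saved(speed, miles=800.0):
    """Extra fuel cost divided by time saved, both relative to the 55 MPH baseline."""
    extra_cost = fuel_cost(speed, miles) - fuel_cost(BASELINE_MPH, miles)
    hours_saved = hours(BASELINE_MPH, miles) - hours(speed, miles)
    return extra_cost / hours_saved


if __name__ == "__main__":
    for speed in sorted(MPG_AT_SPEED):
        if speed != BASELINE_MPH:
            print(f"{speed} MPH: ${cost_per_hour_saved(speed):.2f} per hour saved")
```

Both the extra cost and the hours saved are proportional to the distance, which is why `cost_per_hour_saved(70, 800)` and `cost_per_hour_saved(70, 100)` give the same answer.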

Here are a few things I've learned from this:

  • My friend with the other Forester drives slower on the highway than I do.
  • I had not known how sensitive vehicle gas mileage is to speed.
  • Everything I have read led me to expect no benefit from higher octane fuel once the vehicle's requirements have been met. In the case of the Forester, higher octane actually reduced fuel economy. This observation is confirmed by the official EPA ratings.
  • I would love to see tables like the one above before purchasing my next car.
I learned a lot from this tiny data sample and my future driving habits will be changed accordingly. Now imagine what we could learn if there were a large public database of fuel economy data. Car manufacturers could optimize for specific driving patterns. Consumers would be better informed about fuel economies to expect. There are fleet tracking devices like this one that are already reporting that data but it's locked up in private databases. If anonymous fuel economy data (speed, distance, altitude and MPG) were released there's a lot we could learn about fuel economy under a variety of conditions.

I can't wrap this post up without relating it to education. A relatively small data sample taught me a lot and will impact my future driving behavior. In the same way, it doesn't take a lot of data fed back to students and teachers before they see opportunities to improve. And when we grow from little data to big data, revolutionary changes are on the horizon.

19 November 2012

A Post-LMS Framework for Personalized Learning

In the last few weeks I presented at iNACOL VSS and attended Educause. I’ve also met with the Shared Learning Collaborative team and the CEDS Stakeholders group. Educause included meetings with the Next Generation Learning Challenges organizers and grantees. All in all it’s been a concentrated opportunity to meet with vendors, standards developers and visionaries in the personalized, blended and online learning spaces.

There’s a new pattern emerging on how the technical components of a personalized learning system fit together and it’s a departure from the past. This model seems to apply both in K-12 and postsecondary education.

This new framework is being driven by three trends:
  • Innovative creators of courseware and learning systems need greater control over the learning environment than can be achieved in a Learning Management System (LMS).
  • Student Information Systems (SIS) and Portals are taking over responsibility for student/teacher communications and gradebooks, and are consolidating analytics into student and teacher dashboards.
  • Students and Teachers are seeking a coherent and seamless experience without separate credentials and logins for each of the systems they use.

A New Framework

The figure below shows the interaction of three systems. Each system may be hosted by a different provider but they’re integrated in such a way that the student should browse between them seamlessly.
The Student Information System (SIS) is generally integrated into the school’s portal. This is the site a student browses for consolidated access to all school information. It’s provided and managed by the school. The portal links to courses in which the student is enrolled.

In this new model, courses are an integrated experience delivered by learning systems custom-adapted to the subject matter. At a basic level, a course is a sequence of learning and assessment activities such as exposition (video, audio, text), virtual labs, exercises, quizzes, exams and so forth. Key to personalization is that the selection and order of activities are adapted according to individual student needs.

Traditionally, the same learning system that hosts the course also hosts the activities. This is reasonably simple with conventional media types such as text and video. It gets more complicated with interactive media and assessments. The most innovative learning activities may be separately hosted because they are supported by custom services. These could include interactive labs, intelligent tutoring systems, virtual worlds and games.

Conspicuously absent in this new model is the Learning Management System (LMS). For the last decade or so, the framework has been that schools select and deploy an LMS – ideally with single sign-on and data integration with their SIS – but all too often as an independent system. The idea was that courseware publishers and instructional designers would install the course materials into the LMS using content packaging formats like SCORM and Common Cartridge. But this hasn't happened very much – especially with the most innovative courses. Cutting edge learning systems like DreamBox, Aleks or Read 180 can’t be packaged up and installed into an LMS. The environment is too constraining.

While LMSs are capable of much more, most actual LMS use is in support of teacher-student and student-student communications, not for delivery of instruction. And that communication function is being taken over by the SIS and portal. Contemporary SIS systems have expanded beyond enrollment and course-level data to include full gradebook functionality. Meanwhile, portals are including teacher and student dashboards, online forums, chatrooms and other communication features.

So the new model is composed of Portal/SIS, Learning Systems and Activities often supplied by different organizations. And it’s not just three systems that need to be integrated. A single school will likely have many learning systems. A single student is likely to use different learning systems for different subjects. And a single course may integrate activities from a variety of sources.

Student ID

In order for the student and teacher experiences to be coherent there needs to be a clean handoff between these systems. In the diagram I've shown this as Student ID flowing to the right and Student Data flowing both ways. Student ID may include authentication, authorization and/or provisioning.
  • Authentication, often provided by Single Sign-On (SSO) is the real-time indication of who the student is.
  • Authorization is a real-time assertion that the student should be granted access to a system or resource.
  • Provisioning is the transfer of teacher and student enrollment data so that a learning system or activity can grant access and coordinate a cohort of teachers and students. This may be on-demand (coordinated with authentication or authorization) or it may be a periodic batch update.
Depending on features of each component, these work together in different ways. For example, an SIS may transfer provisioning data to a learning system. Then, at runtime the SIS uses an SSO protocol to authenticate the student to the learning system. At this point the learning system knows the identity of the student and the provisioning of the classes, therefore it can internally decide whether to authorize access.
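
A toy Python sketch of that first flow, with every name and ID invented for illustration: the roster stands in for provisioning data received from the SIS, and the student ID passed to `authorize` is assumed to have already been established by an SSO protocol.

```python
# Provisioning: enrollment data sent by the SIS ahead of time (batch or on demand).
# Maps a course ID to the set of enrolled student IDs. All values are made up.
provisioned_roster = {
    "algebra-1": {"s-1001", "s-1002"},
    "biology": {"s-1002"},
}


def authorize(authenticated_student_id, course_id, roster):
    """Decide access locally once SSO has established who the student is."""
    return authenticated_student_id in roster.get(course_id, set())


if __name__ == "__main__":
    print(authorize("s-1001", "algebra-1", provisioned_roster))  # enrolled
    print(authorize("s-1001", "biology", provisioned_roster))    # not enrolled
```

The point of the sketch is the division of labor: the SIS supplies identity and enrollment, and the learning system makes the access decision itself.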

On the other hand, the learning system may use an authorization protocol to grant student access to a learning activity without authentication or provisioning. In this case, the activity provider doesn't know the identity of the student; it only knows that a trusted agent (the school) has indicated that the student should be granted access.

Student ID protocols can transfer three levels of information depending on the needs of the systems:
  • Personally Identifiable Information (PII): This might include the student's name, grade, enrollment information and so forth. It's sensitive information governed by FERPA regulations.
  • Persistent Identifier: This is just enough information that a learning system or activity can identify repeat visitors. The system doesn't have any personal information about the student but knows this is the same student as in a previous visit.
  • Authorization Ticket: This is just a trusted indication that a student should be able to access content. The learning system or activity is not assisted in coordinating repeat visits.
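
To make the last two levels concrete, here is a hedged Python sketch. It is not how SAML, OAuth or LTI encode these things; it simply assumes the school and the activity provider share a secret key (the hypothetical `SCHOOL_SECRET`) and uses an HMAC to build an opaque persistent identifier and a signed, expiring authorization ticket. The function names and the ticket format are invented for the example.

```python
import hashlib
import hmac
import time

SCHOOL_SECRET = b"demo-secret-shared-with-the-activity"  # hypothetical shared key


def persistent_identifier(student_id, system_name):
    """Opaque ID: stable per student and system, carries no personal data."""
    message = f"{system_name}:{student_id}".encode()
    return hmac.new(SCHOOL_SECRET, message, hashlib.sha256).hexdigest()


def _sign(payload):
    return hmac.new(SCHOOL_SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_ticket(resource, expires_at):
    """Authorization ticket: names a resource and an expiry time, but no student."""
    payload = f"{resource}|{int(expires_at)}"
    return f"{payload}|{_sign(payload)}"


def ticket_is_valid(ticket, resource, now=None):
    """Check the signature, the resource name and the expiry time."""
    now = time.time() if now is None else now
    try:
        ticket_resource, expires_at, signature = ticket.rsplit("|", 2)
        expiry = int(expires_at)
    except ValueError:
        return False
    expected = _sign(f"{ticket_resource}|{expires_at}")
    return (hmac.compare_digest(signature.encode(), expected.encode())
            and ticket_resource == resource
            and now < expiry)
```

The persistent identifier lets an activity recognize a returning student without learning who that student is, while the ticket says nothing about the student at all. PII, the first level, would be transferred as ordinary profile fields and is deliberately left out of the sketch.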

Student Data

Most of the student data flow is upstream as student activities and performance are reported to the Learning System and the SIS/Portal. Traditionally that data has been simple scores and grades. But systems are beginning to collect richer information like frequency of access, time on task and clickstream data. These are used in analytics such as student and teacher dashboards. This same data can also be reported downstream for the use of adaptive learning systems and custom activities.
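
As an illustration of turning clickstream data into a dashboard metric, the following Python sketch computes time on task from start and stop events. The `Event` record and its fields are assumptions made for the example; real systems report this data in their own formats.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    student: str      # persistent identifier rather than a name
    activity: str
    timestamp: float  # seconds since an arbitrary epoch
    kind: str         # "start" or "stop"


def time_on_task(events):
    """Sum the seconds between each start and the next stop, per (student, activity)."""
    totals = defaultdict(float)
    open_starts = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.student, event.activity)
        if event.kind == "start":
            open_starts[key] = event.timestamp
        elif event.kind == "stop" and key in open_starts:
            totals[key] += event.timestamp - open_starts.pop(key)
    return dict(totals)


if __name__ == "__main__":
    sample = [
        Event("p1", "quiz", 0.0, "start"),
        Event("p1", "quiz", 90.0, "stop"),
        Event("p1", "quiz", 200.0, "start"),
        Event("p1", "quiz", 230.0, "stop"),
    ]
    print(time_on_task(sample))
```

A start with no matching stop is ignored here; a production system would need a policy for abandoned sessions, such as an idle timeout.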

Protocols

The difficulty is that there isn't much consistency in the protocols used for Student ID or Student Data. To their credit, the builders of SISs, Learning Systems and Tools all have APIs for integration with other systems. But in most cases APIs are custom to the application. And upstream systems aren't necessarily prepared to collect the rich data that downstream systems are prepared to share.

Here's a survey of what is available or under development:

SAML and OAuth are two commonly-used protocols for authentication and authorization. The SSO subset of SAML has become common due to its use by Google Apps. OAuth is an authorization protocol that can optionally carry personal information or a persistent ID according to needs. Shibboleth is an open source reference implementation of SAML.

IMS Learning Tools Interoperability (LTI) supports the interaction of Learning Systems and Activities. It incorporates OAuth for the authorization step. LTI v1.0 (also known as Basic LTI or BLTI) coordinates the authorization of the activity (called a Learning Tool), seamlessly embedding it in the Learning System. Later versions of LTI support reporting of simple student performance data. Most mainstream LMSs support LTI 1.0 or better.
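
For a feel of what the OAuth step in a Basic LTI launch involves, here is a simplified Python sketch of OAuth 1.0 HMAC-SHA1 signing over the launch form parameters. It is a teaching aid under stated assumptions, not a conforming implementation: the URL, key, secret and parameter values are invented, URL normalization is skipped, and nonce and timestamp checking are omitted. A real integration should rely on a vetted OAuth or LTI library.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

TOOL_URL = "https://tool.example.org/launch"  # hypothetical Learning Tool endpoint
CONSUMER_SECRET = "demo-secret"               # hypothetical shared secret

launch = {
    "lti_message_type": "basic-lti-launch-request",
    "lti_version": "LTI-1p0",
    "resource_link_id": "demo-link-1",
    "user_id": "opaque-student-42",
    "oauth_consumer_key": "demo-school",
    "oauth_nonce": "demo-nonce",
    "oauth_timestamp": "1353283200",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_version": "1.0",
}


def percent_encode(value):
    """OAuth 1.0 percent-encoding: only unreserved characters stay unescaped."""
    return quote(str(value), safe="")


def sign_launch(url, params, consumer_secret, method="POST"):
    """Return the base64 HMAC-SHA1 signature over the launch parameters."""
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(normalized)])
    key = percent_encode(consumer_secret) + "&"  # a launch carries no token secret
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def verify_launch(url, params, consumer_secret):
    """Recompute the signature over everything except oauth_signature itself."""
    received = params.get("oauth_signature", "")
    unsigned = {k: v for k, v in params.items() if k != "oauth_signature"}
    expected = sign_launch(url, unsigned, consumer_secret)
    return hmac.compare_digest(received.encode(), expected.encode())


if __name__ == "__main__":
    signed = dict(launch, oauth_signature=sign_launch(TOOL_URL, launch, CONSUMER_SECRET))
    print("signature accepted:", verify_launch(TOOL_URL, signed, CONSUMER_SECRET))
```

Because the signature covers every launch parameter, the tool can trust that fields such as `user_id` and `resource_link_id` came from the Learning System that holds the shared secret.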

LearnSprout and Clever are two companies supporting data integration with SISs. This allows builders of Learning Systems to write to one API (either LearnSprout's or Clever's) and gain integration into a number of prominent SISs. However, they are limited to the data types supported by the SIS.

The Shared Learning Collaborative (SLC) is building a web-scale common student data layer that can be used by the SIS, Portal, Analytics, Dashboard, Learning System and Activities. A rich set of data types is pre-supported and applications can store custom data for persistence and sharing. It also supplies a common student identity framework including authentication services. So, in the SLC instead of handing off student data between systems, they all rely on the same underlying service.

[Figure: The SLC Approach to the New Model]

The new model divides the functions once concentrated in the LMS. Today, custom systems integration must be done to achieve a seamless experience. But protocols and services are under development that should simplify that in the future.

17 November 2012

Video: Feedback Loops for More Effective and Personalized Learning

Last month I presented at the iNACOL VSS conference. I posted my slides and resources here.
I experimented with using a Bluetooth microphone and PowerPoint's recording feature to generate a voiceover video of the presentation.


I apologize in advance. The audio quality is fairly poor. It's especially bad at the beginning but improves later. I think I was near the range limit of the microphone. And PowerPoint's recording/video feature is still buggy. In a couple of slides, the sound drops out entirely.
Flaws aside, I'm pleased with how the subject came together. Quality feedback loops are a key component in personalized learning solutions. In researching this topic I found a lot of relevant research that can guide the development and selection of products.

06 November 2012

Election Technology Update

It's election day in the U.S. and most of us are fatigued by the campaigns and will be relieved to have them over. Barring an electoral college anomaly there will be more voters who are pleased with the result than dismayed by it (it's a tautology).

I wrote about my misgivings with touchscreen direct entry voting systems in 2009. Things haven't improved since then. The big risk is undetectable vote manipulation. Of course all voting systems, whether electronic or paper, are subject to some form of manipulation. The key is to set up protocols so that manipulations leave evidence. For example, paper balloting systems often count the number of ballots cast and compare that with the number of ballots counted. The number of ballots cast is transmitted to the counting location through a different means from the transmission of the ballots themselves.

In 2010 I wrote about King County, Washington's vote-by-mail system. In addition to mailing ballots, voters can deliver them to dropboxes conveniently located around the county. Not only do they save postage, dropboxes appear to be a more secure delivery method as observers from both major parties watch the sealing and collection of ballot boxes. Other observers watch the opening and counting processes.

As a paper and optical scan method, King County's is among the more secure – once the ballot is delivered to a dropbox. The glaring weakness is privacy. Vote-by-mail opens the door to voter coercion because there's no inspector and booth to ensure privacy when the vote is actually cast. It's entirely possible for others to pre-fill the ballot and simply ask the voter to sign – with intimidation if necessary.

Of course, manipulation of this sort doesn't scale well. Sure it can happen in pockets but widespread, coordinated vote manipulation would be hard to achieve as the more voters are intimidated, the greater the likelihood that someone complains. Therefore it's reasonable to assume that deliberate manipulation will be a small fraction of total votes cast.

This leads to an interesting conclusion: Though we aspire to make every vote count, there's some degree of error regardless of the way votes are cast and counted. Sometimes it's deliberate fraud, manipulation or intimidation. Sometimes it's poorly designed ballots, miscalibrated voting machines or natural disasters. There are two ways to deal with this. Our current system presumes that if the difference in votes is within the margin of error, democracy is preserved regardless of which candidate takes office. That presumption was tested in the 2000 U.S. election.

The alternative is to require another election if the vote is within the margin of error. That approach carries a tremendous cost in terms of time, money and extended uncertainty. Despite misgivings, I have to agree with those who wrote the Constitution. I may not like the outcome when the vote is close. I may even believe that the count is inaccurate. But I do believe that democracy is preserved.