Doing Data & Education Well

Done correctly, big data can make education better. Pritom Das just posted about three significant changes in education policy and practice driven by data innovation: personalization (tracking student behaviors and preferences to develop customization technology), increased participation, and better measurement of performance. I think the most promising of these is participation. Like personalization, the idea is to develop platforms that customize with (not just for) the student. 

The most concerning and complicated of these is surely performance assessment. It has the most potential for unreflective, mostly quantitative evaluation of student performance. In certain contexts, the collection of data is simply cold and impersonal, and it doesn't matter whether that data is being accumulated slowly or quickly. 

The motive of many education policymakers is undoubtedly to use performance data to improve learning systems. But each factor they measure, including temporal ones like how long students take to answer test questions, how many times they read a text or watch a video, or how often they go over learning materials, obscures why different students would choose to do these things. I may go back over material multiple times for reasons other than a deficit in immediate understanding; I may gain a deeper, more synthesized understanding precisely because I intentionally go over something more than once.  

It’s really a matter of what questions are asked and who is studying the answers, of course. But too much emphasis on how long it takes a student to learn something (honestly, do we care about temporal efficiency that much? are we that Fordist?) risks drawing attention away from questions like social class, about which mountains of research (enhanceable by good metadata, of course) demonstrate that “children’s social class is one of the most significant predictors—if not the single most significant predictor—of their educational success.” The research also shows that these class differences affect kids at the earliest stages of education and then “take root” and “fail to narrow” over time, creating large gaps in cognitive and noncognitive development that translate into hundreds of thousands of dollars and countless quality-of-life factors. It’s really a terrifying indictment of the American education system. 

The concern is that because of the emphasis on speed of learning or the desire to complete steps, lots of otherwise good educational advocates stop asking tough questions about data-driven education. Seemingly good outcomes may also be suppressing individuality, papering over class, race, or gender inequality, and failing to meet kids’ and families’ needs. 

These concerns, and a few more, were expressed in a post last summer by Rachel Buchanan and Amy McPherson at the Australian site EduResearch Matters, provocatively titled “Education shaped by big data and Silicon Valley. Is this what we want for Australia?” The tying of potentially beneficial technology to a particular business interest (whatever that interest may be) evokes a frustration that we have lost a compass for tech serving the common good. The authors point out that the “products” being introduced measure “not just students’ learning choices but keystrokes, progression, and motivation. Additionally, students are increasingly being surveilled and tracked.” The post quotes a poem by education blogger Michael Rosen:

“First they said they needed data

about the children

to find out what they’re learning.

Then they said they needed data

about the children

to make sure they are learning.

Then the children only learnt

what could be turned into data.

Then the children became data.”

I think many folks I know who think about big data a lot would like to see a world where we use it to improve education in the right ways, rather than thinking about students en masse in these kinds of cause-and-effect relationships. One solution, given that the data genie isn’t going back into the bottle (and has the potential to help fight inequality while also building individuality), is to teach students precisely what’s happening to them: to pull back the curtain and show them the gears and scaffolding of education policy itself, as well as its quantitative assessments. I mean things like teaching middle school students how AI works ideologically, not just technically. This is the focus of a suggested curriculum outlined by a professor and two graduate students at MIT, “Empowering Children through Algorithmic Justice Education.” 

The proposal calls AI education “an issue of social justice – it is not enough to teach AI or machine learning as a technical topic.” It cites findings that supposedly “neutral” data can actually be biased, requiring that ethics be taught “in conjunction with AI as a technical topic.” The question to be asked: who are we building this technology for? Ongoing efforts to examine and develop industry ethics around quantitative data are also important and encouraging.

Whether you feel data should be idealized as objective and neutral, or seen as reflecting human biases rather than overcoming them, watching videos like this will make you think about what kind of world we want to build with the leaps we’re making in the field. 

And as you might imagine, such ethical and social questions also haunt the use of data in political and issue campaigns, including the ways we append that data with additional information using vendors like our client Accurate Append, an email and phone contact data quality vendor. We should always be self-reflective (and other-reflective) in the way we ask even seemingly neutral demographic or profile questions.