Here's How Not to Improve Public Schools
The Gates Foundation’s big-data experiment wasn’t just a failure. It did real harm.
June 27, 2018, 9:00 AM EDT
The Gates Foundation deserves credit for hiring an independent firm to assess its $575 million program to make public-school teachers more effective. Now that the results are in, it needs to be no less open in recognizing just how wasteful — and damaging — the program has been.
The initiative, known as Intensive Partnerships for Effective Teaching, sought to improve education for low-income minority students, in large part by gathering data and using an algorithm to assess teacher performance. It focused on measures such as test scores, the observations of school principals and evaluations from students and parents to determine whether teachers were adding value. The goal: Reward good teachers, get rid of bad ones and narrow the achievement gap.
Laudable as the intention may have been, it didn’t work. As the independent assessment, produced by the Rand Corporation, put it: “The initiative did not achieve its goals for student achievement or graduation,” particularly for low-income minority students. The report, however, stops short of drawing what I see as the more important conclusion: The approach that the Gates program epitomizes has actually done damage. It has unfairly ruined careers, driving teachers out of the profession amid a nationwide shortage. And its flawed use of metrics has undermined science.
The program’s underlying assumption, common in the world of “big data,” is that data is good and more data is better. To that end, genuine efforts were made to gather as much potentially relevant information as possible. As such programs go, this was the best-case scenario.
Still, to a statistician, the problems are apparent. Principals tend to give almost all teachers great scores — a flaw that the Rand report found to be increasingly true in the latest observational frameworks, even though some teachers found them useful. The value-added models used to rate teachers — typically black boxes whose inner workings are kept secret — are known to be little better than random number generators, and the ones used in the Gates program were no exception. The models’ best defense was that the addition of other measures could mitigate their flaws — a terrible recommendation for a supposedly scientific instrument. Those other measures, such as parent and student surveys, are also biased: As every pollster knows, the answer depends on how you frame the question.
Considering the program’s failures — and all the time and money wasted, and the suffering visited upon hard-working educators — the report’s recommendations are surprisingly weak. It even allows for the possibility that trying again or for longer might produce a better result, as if there were no cost to subjecting real, live people to years of experimentation with potentially adverse consequences. So I’ll compensate for the omission by offering some recommendations of my own.
Value-added models (and the related “student growth percentile” models) are statistically weak and should not be used for high-stakes decisions such as the promotion or firing of teachers.
Keeping assessment formulas secret is an awful idea, because it prevents experts from seeing their flaws before they do damage.
Parent surveys are biased and should not be used for high-stakes decisions.
Principal observations can help teachers get better, but can’t identify bad ones. They shouldn’t be used for high-stakes decisions.
Big data simply isn’t capable yet of providing a “scientific audit” of the teaching profession. It might never be.
Let me emphasize that unleashing such experiments on people is the most wasteful possible way to do science. As we introduce artificial intelligence in myriad areas — insurance, credit, human resources, college administration — will we require the people affected to trust the algorithm until, decades later, it proves to be horribly wrong? How many times must we make this mistake before we demand more scientific testing beforehand?
I’m not an entirely disinterested observer. I have a company that offers algorithm testing services. But I got into the business precisely because I wanted to avert disasters like this. It’s not enough to glean some lessons, make adjustments and move on. For the sake of data science, and for the sake of disadvantaged students, it’s crucial that the Gates Foundation recognize publicly how badly it went wrong.
Comments

The Gates idiocy has destroyed a whole generation of gifted master teachers in NYC. Thousands of highly experienced educators have simply marched out the door prematurely, tired of the daily drive-by shootings generated in evaluation systems spawned by Gates and his ilk. They have also created a generation of totally out-of-control administrators who rip off the taxpayer with jobs for life. I hope there is a place prepared in hell for him, his wife, and anyone who ever promoted his foundation(s) in education and other areas, including all of our recent chancellors and mayors...
One of the central lessons I taught, or tried to teach, prospective teachers over my 40-year career in teacher ed was to always beware any sentence that begins “Research says....”
After reading the anthem for UFTers, I read your piece about Bill Gates’s big-data initiative. I tried two or three times to post a comment and then gave up. I wanted to add that the American Statistical Association says that the use of student scores to evaluate teachers with Value-Added Modeling IS NOT VALID. The American Educational Research Association (concerned with research methodology) says the same thing.
Am I the only one who takes this seriously? Research methodology is crucial. Sorry I could not figure out how to post a comment.