By the time it actually gets around to judging and ranking and sorting people, software tends to seem pretty impersonal. Objective. Bloodless. But it doesn’t always start that way.
A dating site that wants to make new matches, or an HR department trying to hire sales reps, is not measuring some simple quality, like height. They can’t just grab a yardstick. They have to piece together a story about what makes a great match, or a great sales rep, and then connect that story to concrete and measurable outcomes. Maybe it’s, “Jack and Jill exchanged phone numbers over our app, and then neither one was active on the app for six months, so we’re going to assume that they dated for a while and that the match was a good one.” Or, “Andrew stayed on the job for a year, so we think he was a good hire.”
Armed with a story like that, you can build a quantitative model to measure dating site users or job applicants, and start to predict which ones are going to hit it off, or stay on the job for a long time. You can label people and maybe even share the labels, giving out match percentages or job-readiness scores that will make the tiny muscles in the subject’s face twitch into a smile or a wince upon discovery. But without some kind of a story, the numbers aren’t much help. They can’t speak for themselves. There has to be a bridge to meaning.
Building that bridge — a story with measurable numbers on one side, and meaningful conclusions about human beings on the other — is creative work. And I’m fascinated by the creativity of it, the artistry of it.
There’s often a good story.
Here’s one.
Measuring junior historians
Driving west out of New York City, you can make it to Delbarton School in just under an hour. The school inhabits a converted Gilded Age mansion in Morristown, New Jersey, on a hill overlooking a nearby park. The main building, now known to students and teachers as “Old Main,” was constructed from hand-carved marble that Italian masons dug out of nearby hills in the 1880s. Delbarton is a Catholic school for boys, with annual tuition in 2020 just shy of $40,000. The teachers are a mix of laypeople and Benedictine monks from the adjoining St. Mary’s Abbey. The monks follow the Rule of St. Benedict, one of the oldest surviving rules for Christian monastic life. They wear black robes and keep to a daily regimen of shared prayer, avoiding unnecessary conversation.
Like many highly rated schools, Delbarton offers Advanced Placement classes, which mirror the content of a typical undergraduate college course. In history, the top-achieving boys at Delbarton are doing college-level work by the time they start tenth grade, when they spend the year on AP European History, and they continue on that track with AP United States History in eleventh grade and other APs in twelfth.
Each Advanced Placement course builds up to a single, nationally standardized exam, administered in May, that covers the whole year’s material. That means students in Delbarton’s US History course, for instance, will spend three hours and fifteen minutes in the same boat as about half a million students around the world. They’ll answer multiple-choice questions. They’ll write essays. And they’ll each respond to a special kind of question created by the College Board: the Document-Based Question, or DBQ.
The DBQ is an exercise that tries to simulate the work of actually being a historian.
Students face a subtle question of historical interpretation and are presented with a curated set of eight or nine primary source documents (letters, political cartoons, charts and graphs) that bear on the question at hand. Then, combining their prior knowledge with evidence drawn from those documents, students make a historical argument that answers the question and demonstrates their ability to synthesize and analyze the sources. When I took the test in 1998, the DBQ asked whether the Jeffersonian Republicans, a political party of the early 1800s, were really strict constructionists in their view of the U.S. Constitution, as the dominant view among historians has it.
Every single one of those Delbarton students — and me, when I was back in tenth grade, and even you, dear reader, if you’ve taken an AP history exam since the 1970s — owes something to Delbarton School, and particularly to one of its former headmasters, a monk and priest named Father Giles Hayes.
Father Hayes was just about as deeply connected with Delbarton as it’s possible for anyone to be. He first set foot on the campus in 1951 as a twelve-year-old prospective student. He was admitted, studied there, left for college and graduate school, and eventually returned as a monk. By the end of his life, he had spent nearly seventy years at the school as a student, teacher, and headmaster. His black robes suggested conservatism, and his manner was soft-spoken.
But in the fall of 1970, at the age of 31, Father Hayes was a history teacher and a new appointee to the US History Test Development Committee of the Educational Testing Service. The drive from his modest room at the Abbey to the ETS campus in Princeton would have taken about an hour — ample time for him to reflect on the mark he wished to make.
The traditional approach to US history focuses on what’s known as political history, the actions and lives of prominent political figures. That was the approach of the AP exam, which had been basically unchanged since its introduction in the 1950s. But an alternative school of thought known as “new history” promised in the late sixties and early seventies to shift the focus, giving more attention to social issues and traditionally underrepresented voices. Alongside this shift, there was an effort to send scholars, and students, back to primary source materials, where they could develop their own interpretations that might differ from the traditional story.
Father Hayes was enamored of this approach, and had urged history teachers “to get on board with the discovery method.” In that first advisory committee meeting, he immediately challenged his colleagues by asking “if there could be a documents exercise in the AP US history exam.” As the committee considered this idea, Father Hayes proved to be a forceful and effective advocate — one of his fellow committee members, interviewed years later, called Father Hayes the “spark plug” in the introduction of the DBQ.
In developing a new type of test question, Father Hayes and his colleagues, together with ETS staff, had to answer a question of their own: What, exactly, were they hoping to measure?
The original AP exam had combined multiple choice with essay questions. Multiple choice was a fine way to test recall of specific historical information, and the essays were supposed to capture something else, namely students’ ability to develop and analyze historical arguments. But the essay responses that students actually gave on the exam tended simply to parrot memorized historical facts. The DBQ would change this. By focusing on a pre-selected group of primary sources, this new type of question would “test the ability of students to evaluate and synthesize historical evidence.”
The new question type was inseparable from a deeper set of philosophical commitments about what history students ought to learn, and thus what history tests ought to measure.
Father Hayes argued that students should “play the role of historian” during that hour of the exam.
For him, history was about learning how to piece together evidence from the past and make meaning out of it. (These views might have been connected to his religious training — he was a big advocate of the Benedictine spiritual practice of Lectio Divina, the direct reading of scripture as a path to knowing God.)
An alternative view, which had arguably animated earlier versions of the test, was that interpreting primary sources was a serious task requiring years of training, and that a first-year college course in American history should teach students what scholars had learned from these documents. For traditionalists, the idea that anyone could be a historian seemed a tacit rebuke to the status of the profession.
Father Hayes, by contrast, described the AP US exam as the “cutting edge” that could transform and modernize the nationwide approach to U.S. history. He wanted the new question type to drive high school history teachers to spend more time with primary source materials, and subsequent evidence from teacher workshops that the College Board held in the mid-1970s “made clear that the number of teachers using primary sources regularly in their classrooms had jumped exponentially as a result of the DBQ.”
From the beginning, there was a tricky balance at the heart of the DBQ.
If a good answer could rely solely on the documents presented during the exam, then the test might simply measure general aptitude for reading and reasoning, and a student might in principle ace it without walking into the exam room with any prior historical knowledge. On the other hand, if a good answer to the DBQ could rely mostly on knowledge that the student possessed before the exam began, perhaps with only brief references to the documents provided, then the test wouldn’t really be measuring a capacity to analyze primary sources. The ETS staffer who led the creation of the DBQ put it this way:
To what extent these skills [of documentary analysis] are ‘historical’ (i.e., largely the product of exposure to historical thinking) is open to debate. Those who see substantial historical content in these skills tend to be happy with the DBQ as a testing exercise; those who do not see such content tend to be unhappy with the DBQ.
In May of 1973, students across the country took the first Document-Based Question, which asked them to analyze “the factors that probably led Congress to pass the Immigration Act of 1924,” in light of eleven historical documents. In subsequent years, the number of documents averaged eighteen. With that many sources in hand, students could, as it turned out, write their DBQ essays with little if any reference to outside material.
In time, the number of documents was reduced, and the scoring rubric was adapted to reward students for introducing evidence from outside the documents themselves. In addition, the questions were crafted to create, in the words of one of the DBQ’s architects, “some sort of creative tension between the student and the documents”: each question required interpreting the documents beyond what was on their face, so that serially quoting them could not produce a good answer.
In short, the changes to the DBQ over the years have aimed to make students use prior knowledge as a lens for interpreting the documents provided on the test, a task requiring both preparation and in-the-moment skill.
I think many other algorithms for judging people may have their own Father Giles, their own “spark plugs” who build a bridge between numbers and meanings.
Some of these folks may be called data scientists, and may be best known for their technical skills. And perhaps others, like Father Giles himself, are accidental architects of algorithms. In any event, the work they do is deeply creative, and deeply human.