Studying the social world requires more than deference to data. In some cases, it may even require that we reject findings—no matter the prestige or sophistication of the technical apparatus on which they are built.
In 2016 economist Roland G. Fryer, Jr., the youngest African American ever to be awarded tenure at Harvard, came upon what he would call the “most surprising result of my career.” In a study of racial differences in the use of force by police officers, Fryer found that Black and Hispanic civilians were no more likely than white civilians to be shot to death by police. “You know, protesting is not my thing,” Fryer told the New York Times. “But data is my thing. So I decided that I was going to collect a bunch of data and try to understand what really is going on when it comes to racial differences in police use of force.”
Three thousand hours later, after meticulous records collection and analysis, the data had spoken. Although Black people are significantly more likely to experience non-lethal force at the hands of police than white people in similar situations, Fryer concluded, there is no racial bias in fatal police shootings. The findings appeared to have direct implications for the growing protest movement that had swept across the United States following the police killings of Michael Brown, Eric Garner, Tamir Rice, Freddie Gray, Philando Castile, among so many others. “It is plausible that racial differences in lower-level uses of force,” Fryer wrote at the end of the paper, “are simply a distraction and movements such as Black Lives Matter should seek solutions within their own communities rather than changing the behaviors of police and other external forces.”
The study quickly came under fire. For the most part, critics took one of two tacks. One was to argue that the research failed on its own technical terms: the data were erroneous or misleading; there was a mathematical error in the analysis; the statistical protocol was inappropriate. The other tack was to undermine the legitimacy of the effort on auxiliary grounds, pointing out that economists are not experts in the study of police shootings and that the profession of economics suffers from a conservative bias. These two types of reply exemplify a common pattern of response to the results of quantitative social science, and they are also once again on wide display in national conversations about policing. But they illustrate a deep problem with the way we think about the nature of social scientific inquiry—and, consequently, its capacity to inform our thinking about politics and policy.
On these two views, scientific method is either so airtight that only errors from within can undermine it or so porous that its validity turns entirely on outside interests. There are certainly cases of each kind of blunder: recall the Reinhart-Rogoff Excel spreadsheet snafu on the one hand, tens of millions of dollars funneled by ExxonMobil to fund climate change denialism studies on the other. We misunderstand run-of-the-mill scientific practice, however, if we view it as either everywhere or nowhere settled by data. As historians and philosophers of science have long emphasized, “the data” can never take us all the way from observation to conclusion; we can interpret them only against some background theory that settles what the data are evidence of. Far from playing no role in quantitative social science, a shared set of theoretical and normative commitments is what allows data-first methods to work at all.
This entwining of data and theory runs through any application of quantitative methods, but it is especially fraught today in the study of race. Since the 1970s, the development of causal inference methodology and the rise of large-scale data collection efforts have generated a vast quantitative literature on the effects of race in society. But for all its ever-growing technical sophistication, scholars have yet to come to consensus on basic matters regarding the proper conceptualization and measurement of these effects. What exactly does it mean for race to act as a cause? When do inferences about race make the leap from mere correlation to causation? Where do we draw the line between assumptions about the social world that are needed to get the statistical machinery up and running and assumptions that massively distort how the social world in fact is and works? And what is it that makes quantitative analysis a reliable resource for law and policy making?
In both academic and policy discourse, these questions tend to be crowded out by increasingly esoteric technical work. But they raise deep concerns that no amount of sophisticated statistical practice can resolve, and that will indeed only grow more significant as “evidence-based” debates about race and policing reach new levels of controversy in the United States. We need a more refined appreciation of what social science can offer as a well of inquiry, evidence, and knowledge, and what it can’t. In the tides of latest findings, what we should believe—and what we should give up believing—can never be decided simply by brute appeals to data, cordoned off from judgments of reliability and significance. A commitment to getting the social world right does not require deference to results simply because the approved statistical machinery has been cranked. Indeed in some cases, it may even require that we reject findings, no matter the prestige or sophistication of the social scientific apparatus on which they are built.
• • •
An instructive object lesson in these issues can be found in a controversy ignited last summer, when a paper published in the American Political Science Review (APSR) in late May questioned the validity of many recent approaches to studying racial bias in police behavior, including Fryer’s. A very public skirmish among social scientists ensued, all against the backdrop of worldwide protests over the murder of George Floyd by Minneapolis police officer Derek Chauvin.
The APSR paper focused, in particular, on the difficulties of “studying racial discrimination using records that are themselves the product of racial discrimination.” The authors—Dean Knox, Will Lowe, and Jonathan Mummolo—argued that
when there is any racial discrimination in the decision to detain civilians—a decision that determines which encounters appear in police administrative data at all—then estimates of the effect of civilian race on subsequent police behavior are biased absent additional data and/or strong and untestable assumptions.
The trouble, in short, is that “police records do not contain a representative sample” of people observed by the police. If there is racial bias reflected in who gets stopped and why—and we have independent reason to believe that there is—then police data for white and non-white arrestees are not straightforwardly comparable without making additional implausible or untestable assumptions. Such “post-treatment bias” in the data would thus severely compromise any effort to estimate the “true” causal effects of race on law enforcement behavior, even if we are only interested in what happens after the stop takes place. “Existing empirical work in this area is producing a misleading portrait of evidence as to the severity of racial bias in police behavior,” the authors conclude. Such techniques “dramatically underestimate or conceal entirely the differential police violence faced by civilians of color.” The authors therefore call for “future research to be designed with this issue in mind,” and they outline an alternative approach.
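A toy calculation can make the mechanism concrete. The sketch below is not the APSR authors' model, and every number in it is invented for illustration. It assumes a hypothetical, unrecorded "suspicion" level that is equally distributed across both groups, assumes that white civilians are stopped only when suspicion is high while Black civilians are stopped at either level, and assumes that force is ten percentage points more likely against Black civilians at every suspicion level. Under those assumptions, the force rates found in stop records come out identical for the two groups.

```python
from fractions import Fraction as F

# All numbers are hypothetical, chosen only to illustrate the mechanism.
# "Suspicion" stands in for anything officers see but records omit.
SHARE_HIGH = F(1, 2)  # assumed share of observed civilians at high suspicion, same for both groups

# Assumed probability that an observed civilian is stopped (biased against Black civilians).
P_STOP = {
    ("white", "low"): F(0), ("white", "high"): F(1),
    ("black", "low"): F(1), ("black", "high"): F(1),
}

# Assumed probability of force once stopped: +1/10 for Black civilians at each level.
P_FORCE = {
    ("white", "low"): F(1, 10), ("white", "high"): F(3, 10),
    ("black", "low"): F(2, 10), ("black", "high"): F(4, 10),
}


def force_rate_among_stopped(race):
    """Force rate an analyst would compute from stop records alone."""
    weights = {"low": 1 - SHARE_HIGH, "high": SHARE_HIGH}
    stopped = sum(weights[s] * P_STOP[(race, s)] for s in weights)
    forced = sum(weights[s] * P_STOP[(race, s)] * P_FORCE[(race, s)] for s in weights)
    return forced / stopped


# Comparison available from the records: Black minus white force rate among the stopped.
naive_gap = force_rate_among_stopped("black") - force_rate_among_stopped("white")

# Comparison at a fixed suspicion level, which the records cannot support.
true_gap = {s: P_FORCE[("black", s)] - P_FORCE[("white", s)] for s in ("low", "high")}

print("gap in stop records:", naive_gap)
print("gap at fixed suspicion:", true_gap)
```

In this invented scenario the records show no disparity at all, even though force is by construction more likely against Black civilians in every comparable encounter. The biased stops fill the Black sample with low-suspicion encounters that have no white counterpart in the data.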
A critical response by several other scholars—Johann Gaebler, William Cai, Guillaume Basse, Ravi Shroff, Sharad Goel, and Jennifer Hill—appeared a month later, in June; for simplicity, call this group of scholars the second camp. Disputing the APSR authors’ pessimistic assessment of research on racial bias in policing, they countered that the APSR paper rested on a “mathematical error.” The usual methods could still recover reliable estimates of the causal effect of race on law enforcement behavior after a stop has been made, even if police stops are themselves racially biased. The error, they asserted, lay in supposing that certain conditions had to hold in order to make reliable estimates using data like Fryer’s. In fact, these scholars wrote, a weaker statistical condition—what they term “subset ignorability”—would also suffice, and it was more likely to hold, “exactly or approximately,” in practice. They then attempted to show how the standard causal estimation techniques can be saved by putting forth their own analysis of racial bias in prosecutors’ decisions to pursue charges (again relying on the sort of data from police records that the APSR authors find problematic).
In the days following this exchange, what ensued can only be described as a high-profile statistical showdown on Twitter, punctuated by takes from interested onlookers. The second camp mounted a defense of the mathematics, arguing that progress in statistical methods should not be foreclosed for fear of unobservable bias. In a policy environment that increasingly looks to quantitative analyses for guidance, Goel wrote, “categorically barring a methodology . . . can have serious consequences on the road to reform.” The APSR authors, by contrast, emphasized what they took to be the purpose of applied social scientific research: to provide analysis at the service of real-world policy and practical political projects. Knox, for example, wrote that their critics’ argument “treats racial bias like a game with numbers.” Instead, he went on, he and his co-authors “use statistics to seek the best answers to serious questions—not to construct silly logic puzzles about knife-edge scenarios.” This is no time, the APSR authors argued, to fetishize mathematical assumptions for the sake of cranking the statistical machinery.
• • •
What are we to make of this debate? Despite the references to mathematics and the sparring of proof-counterexample-disproof, which suggest a resolution is to be found only in the realm of pure logic, the dispute ultimately comes down to a banal, congenital feature of statistical practice: the plausibility of the assumptions one must make at the start of every such exercise. For the APSR authors, even the second camp’s weaker assumption of subset ignorability fails the test of empirical scrutiny: to them it is clearly implausible as a matter of how the social world in fact is and works. Ironically though, given their forceful criticism of the APSR paper, the second camp comes to the same conclusion in their own analysis of prosecutors’ charging decisions, conceding that “subset ignorability is likely violated”—thus rendering their own results empirically suspect.
This curious episode demonstrates how the social scientist is so often trapped in a double bind in her quest to cleave to her empirical commitments, especially when it comes to the observational studies—as opposed to randomized experiments—that are the bread and butter of almost all quantitative social science today. Either she buys herself the ability to work with troves of data, at the cost of implausibility in her models and assumptions, or she starts with assumptions that are empirically plausible but is left with little data to do inference on. By and large, quantitative social science in the last two decades has taken the former route, thanks in significant part to pressure from funding incentives. If implausible assumptions are the price of entry, the Big Data revolution promises the payment is worth it—be it in profit or professional prestige. As the mathematical statistician David A. Freedman wrote, “In the social and behavioral sciences, far-reaching claims are often made for the superiority of advanced quantitative methods—by those who manage to ignore the far-reaching assumptions behind the models.”
But if the social scientist is genuinely committed to being empirical, this choice she must make between plausible assumptions and readily available data must itself be justified on the basis of empirical evidence. The move she winds up making thus tacitly reveals the credence she has toward the theories of the social world presently available to her, or at least the kind of commitments she is willing to be wrong about. Precisely to the extent that social science is something more than mathematics—in the business of figuring out how the world is, or approximately is—statistical assumptions can never shake off their substantive implications. The requirement that social science be truly “evidence-based” is thus extremely demanding: it means that we cannot justify our use of implausible assumptions solely on the basis of mathematical convenience, or out of sheer desire to crank the statistical machinery. It is only in the belief that our assumptions are true, or true enough, of the actually existing world that social science can meet this exacting demand.
Notice the role that normativity plays in this analysis. If, as the first step to embarking on any statistical analysis, the quantitative social scientist must adopt a set of assumptions about how the social world works, she introduces substantive theoretical commitments as inputs into her inquiry. This initial dose of normativity thus runs through the entire analysis: there is simply no escaping it. Whether any subsequent statistical move is apt will depend, in however complex ways, on one’s initial substantive views about the social world.
What do these reflections mean in the specific case of research on race and policing? Whether one has in fact distilled the causal effect of race on police behavior in any particular study will depend on what one believes to be true about the racial features of policing more broadly. And since the positions one takes on these matters depend on one’s background views regarding the prevalence and severity of racial injustice as an empirical phenomenon, whether a finding ends up passing statistical muster and therefore counts as an instance of racially discriminatory police action will depend on one’s broader orientation to the social world.
The upshot of these considerations is that statistical analysis is inescapably norm-laden; “following the data” is never a mechanical or purely mathematical exercise. But this fact should not lead us to discard any commitment to empirical validity as such. On the contrary, it should simply serve to remind us that standards of empirical scrutiny apply throughout the whole of any methodology. As Freedman put it, “The goal of empirical research is—or should be—to increase our understanding of the phenomena, rather than displaying our mastery of technique.”
• • •
One important consequence of this orientation, I think, is that we ought to subject not just assumptions but also conclusions to empirical scrutiny. To some observers of our social world, the conclusion that there is no causal effect of race in police shootings is not only implausible: it is simply and patently false. For even a cursory glance at descriptive summary statistics reveals wide gulfs in the risk of being killed by police for Black people compared to white people. According to one study, Black men are about 2.5 times more likely to be killed by police than white men, and data from the 100 largest city police departments show that police officers killed unarmed Black persons at four times the rate of unarmed white persons—statistical facts that speak nothing of the immense historical record of overtly racist policing, which does not lend itself so easily to quantification. If certain methods erase these stark (and undisputed) disparities, painting a picture of a social landscape in which race does not causally influence police shooting behaviors, then so much the worse for those methods. From this vantage, failing to take account of the many different forms of evidence of decades of racialized policing and policymaking is not only normatively wrong. It is also empirically absurd, especially as a self-styled “evidence-based” program that seeks to illuminate the truths of our social world.
This suggestion—that we sometimes ought to reject a finding on the grounds that it does not accord with our prior beliefs—might seem downright heretical to the project of empirical science. And indeed, there is some danger here; at the extreme, indiscriminate refusal to change our minds in the light of evidence reeks of a sham commitment to empirical study of the world. But the truth is that scientists reject findings on these sorts of grounds all the time in the course of utterly routine scientific practice. (For just one recent newsworthy example, consider a 2011 study that found evidence for extrasensory perception.) The move need not signal a failure of rationality; indeed it can often be a demand of it. Determining which it is, in any particular case, cannot be settled by asking whether one has been faithful to “facts and logic,” as so many like to say, or to the pure rigors of mathematical deduction.
Instead, when a scientific finding conflicts with one of our convictions, each of us must comb over what philosopher W. V. O. Quine so charmingly called our “web of belief,” considering what must be sacrificed so that other beliefs might be saved. And since our webs are not all identical, what rational belief revision demands of us will also vary. One man’s happily drawn conclusion (p, therefore q!) is another’s proof by contradiction (surely not q, therefore not p!). Or as the saying goes, one man’s modus ponens is another man’s modus tollens. Rejecting a study’s methods or its starting assumptions on the basis of disagreement with its results is a completely legitimate inferential move. We tend to overlook this feature of science only because for most of us, so much of the nitty-gritty of scientific inquiry has little direct bearing on our daily lives. Our webs of belief usually are not touched by the latest developments in science. But once scientific findings make contact with—and perhaps even run up against—our convictions, we become much more sensitive to the way the chain of reasoning is made to run.
The fact that good faith efforts at rationality might lead different people to different or even opposite conclusions is a basic, if unsettling, limitation of science. We cannot hope for pure knowledge of the world, deductively chased from data to conclusion without mediating theory. In the end, the Fryer study controversy has been one long object lesson in how our empirical commitments are invariably entangled with normative ones, including commitments more typically thought of as ethical or political. The choice to sacrifice empirical plausibility in one’s assumptions, in particular, is not just a “scientific” matter, in the oversimplified sense of “just the facts”: it is inevitably interwoven with our ethical and political commitments. In bringing one’s web of belief to bear on the debate over what constitutes proper study of the effects of race in policing, one puts forth not just prior empirical beliefs about, say, the prevalence of racial targeting or the fidelity of police reporting practices, but also one’s orientation toward matters of racial justice and self-conceptualization as a researcher of race and the broader system of policing and criminal justice.
For the APSR authors, bias in policing presents both enough of a normative concern and an empirical possibility to license, as a matter of good scientific practice, the sacrifice of certain business-as-usual approaches. The second camp, by contrast, is loath to make the leap to discard approaches held in such high esteem. Their commitment to the usefulness of the standard approaches runs so deep that they do not yet see sufficient cause for retreat. In a revision of their paper released in October, the authors remove the explicit assertion of a “mathematical error” but find “reason to be optimistic” that many cases of potential discrimination do meet the empirical conditions prescribed by the statistical assumptions proposed to salvage the usual approaches.
What exactly these reasons for optimism are remains unclear. By the second camp’s own admission, because “one cannot know the exact nature and impact of unmeasured confounding . . . we must rely in large part on domain expertise and intuition to form reasonable conclusions.” And yet without reference to any such further evidence or support, they nevertheless conclude: “In this case, we interpret our results as providing moderately robust evidence that perceived gender and race have limited effects on prosecutorial charging decisions in the jurisdiction we consider.” Such a claim ultimately says much more about their web of belief than about the actually existing social world.
• • •
For those whose beliefs, empirical and ethical, are forged in participation in radical sociopolitical movements from below, to be ill-inclined to accept certain findings about race and policing is to remain steadfast in a commitment to a certain thick set of empirical and ethical propositions in their webs of belief: that systems of policing and prisons are instruments of racial terror and that any theory of causation, theory of race, and statistical methods worth their salt will see race to be a significant causal factor affecting disparate policing and prison outcomes. This just is the first test of “fitting the data.” It is not a flight from rationality but an exercise of it.
Does this view of social science transform an epistemic enterprise into a crudely political one? Does a readiness to sacrifice some scientific findings to save ethical or political commitments endanger the status of science as a distinctive project that seeks to produce new knowledge about the world? I think it doesn’t have to. Even the hardest-nosed empiricist starts from somewhere. She must interpret her data against some background theory that she takes to be the most natural, most plausible, and most fruitful. Deviations from this position that are self-consciously animated by politics need not be less genuinely truth-seeking than self-styled neutral deference to the status quo.
This fact tends to get lost in debates about where science sits along a continuum that runs from “objective” (protected from bias and outside interference) to “political” (a no-holds-barred struggle for power, the label of “science” slapped onto whatever the winner wishes). What that picture elides is how science unfolds in the trenches of knowledge production: in the methodological minutiae that determine which assumptions must be sacrificed and which can be saved, when abstraction leads to silly logic puzzles and when it is a necessary evil, which conclusions trigger double-takes and which signal paradigm shifts, and so on. To acknowledge that these struggles cannot take us beyond the never-ending tides of the “latest findings” is not to give up on quantitative social science as a venture for better understanding the world. It is simply to embrace a conception of social inquiry that is always, as philosopher Richard J. Bernstein put it, at once “empirical, interpretative, and critical.”