Why big-data analysis of police activity is inherently biased

Source: William Isaac, Andi Dixon, The Conversation, May 9, 2017

... At its core, any predictive model or algorithm is a combination of data and a statistical process that seeks to identify patterns in the numbers. This can include looking at police data in hopes of learning about crime trends or recidivism. But a useful outcome depends not only on good mathematical analysis: It also needs good data. That’s where predictive policing often falls short.

Machine-learning algorithms learn to make predictions by analyzing patterns in an initial training data set and then look for similar patterns in new data as they come in. If they learn the wrong signals from the data, the subsequent analysis will be lacking. ...
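A toy sketch can make "learning the wrong signals" concrete. It is not the authors' model or any vendor's algorithm. It assumes two hypothetical areas with identical true incident rates, a small historical imbalance in recorded incidents, and a naive "predictor" that flags whichever area has more records. The function name, parameters, and all numbers are invented for illustration.

```python
def simulate_feedback(rounds=5, true_rate=(10, 10), initial_records=(12, 10)):
    """Toy feedback loop: patrols follow the records, and records follow the patrols.

    All values are hypothetical. Both areas have the same true incident rate;
    only the starting records differ.
    """
    records = list(initial_records)
    history = []
    for _ in range(rounds):
        # "Prediction": the area with more recorded incidents is flagged as the hotspot.
        hotspot = 0 if records[0] >= records[1] else 1
        # Patrols go to the hotspot, so only its incidents are newly recorded.
        records[hotspot] += true_rate[hotspot]
        history.append(tuple(records))
    return history


history = simulate_feedback()
for round_number, (area_a, area_b) in enumerate(history, start=1):
    print(f"round {round_number}: area A = {area_a}, area B = {area_b}")
```

Under these assumptions the recorded gap between the two areas widens every round even though their true rates never differ, because the predictor is trained on records that its own output helped produce.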

Source: Human Rights Data Analysis Group, 2017

The growing debate about policing in America arises from concern about horrific but extraordinary acts of police violence. These incidents and the clear racial disparities in criminal justice contact raise important questions about the ordinary practice of policing. Should police stop suspicious individuals and frisk them for weapons? Should departments use statistical techniques to predict crime and make decisions about where to deploy officers? Evaluating police practices requires measuring their benefits and their costs. Do police practices reduce crime? How do they affect communities? How do those effects vary within and among communities? However, community groups and municipal leaders outside law enforcement currently lack data and tools to measure the impact of policing strategies. Community stakeholders – including city governments, community groups, and non-governmental organizations – need rigorous tools to independently evaluate the costs and benefits of various policing strategies.

To assess the benefits and costs of policing, we need to know how police actions affect patterns of crime. Both components – police actions and crime – are hard to measure. Most crime is hidden, and police practices influence how much of it is recorded. When departments hire more officers, or when they deploy more officers to certain neighborhoods, recorded crime may increase even if actual crime does not change. Additionally, police knowledge about crime depends on reporting by civilians who trust the police. Many victims are reluctant to report crime to police because they think the police will be unable to help them, because they worry that police may suspect them of being criminals themselves, or because they fear retaliation from perpetrators or neighbors.
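The gap between recorded and actual crime can be illustrated with a minimal sketch. It assumes, purely for illustration, that recorded crime is the sum of civilian reports and incidents discovered by officers, and that each additional officer discovers a fixed share of unreported incidents. The function, its parameters, and the rates are hypothetical and are not estimates from any data.

```python
def recorded_crime(true_crimes, officers, report_rate=0.3, detect_per_officer=0.02):
    """Expected recorded incidents under a toy model (all rates are made up).

    report_rate: share of incidents that civilians report to police.
    detect_per_officer: share of unreported incidents each officer discovers.
    """
    # Incidents that reach the police through civilian reports.
    reported = true_crimes * report_rate
    # Officers discover a share of the remainder, capped at 100 percent.
    discovered_share = min(1.0, officers * detect_per_officer)
    discovered = (true_crimes - reported) * discovered_share
    return reported + discovered


TRUE_CRIMES = 1000  # held constant: actual crime does not change

low_presence = recorded_crime(TRUE_CRIMES, officers=5)
high_presence = recorded_crime(TRUE_CRIMES, officers=20)
low_trust = recorded_crime(TRUE_CRIMES, officers=5, report_rate=0.1)

print(f"5 officers:  {low_presence:.0f} recorded")
print(f"20 officers: {high_presence:.0f} recorded")
print(f"5 officers, lower civilian reporting: {low_trust:.0f} recorded")
```

In this sketch the recorded count rises with officer deployment and falls when fewer civilians report, while the true count stays fixed at 1,000. Recorded totals therefore reflect police activity and community trust as well as crime itself.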

Our team specializes in collecting and analyzing data on events that are hard to measure. In the last year, the Human Rights Data Analysis Group has begun studying issues in U.S. police practice, focusing on three topics: homicides by police, predictive policing, and cost-benefit analysis of policing. We have already created multiple new analyses of available data on crime and policing, assessing the accuracy of the number of killings by police and the effects of predictive policing. We propose to build on our work to create a scalable, sustainable, community-driven, technically rigorous assessment of the costs and benefits of various policing strategies. ...