Justice in the Age of Big Data

Ishrak
3 min read · Mar 26, 2021

In chapter 5 of the book Weapons of Math Destruction, the author discusses the widespread bias in some of the big data analytics systems used by law enforcement. The spotlight falls on a crime analytics system called PredPol, built by a California-based big data startup. PredPol was brought in when the Reading police department faced a rising crime rate alongside severe understaffing. Optimizing patrol times to accommodate the staff shortage was an urgent need for public safety, and PredPol promised to help the shrinking department predict which shifts were needed at which city locations, using the city's historical crime data at a scale beyond human capability. However, the data used to train PredPol's predictive algorithm ended up reflecting social stereotypes: it was biased against the racial minorities of impoverished neighborhoods.

PredPol was originally intended to prevent serious crimes at the top of the crime pyramid. However, data on serious crimes was scarce, which reflected poorly on the system's performance. So more data on petty and nuisance crimes was fed into the system to train it to flag more potential criminal incidents. Now, as the saying goes, a model is only as good as its data. Since PredPol was being trained on a loop of stereotypical data from impoverished neighborhoods, young men of color were continuously targeted by the algorithm and incarcerated by law enforcement. Data from such so-called successful expeditions then fuels the model, further reinforcing the social status quo: the belief that only individuals fitting the young minority profile are likely to commit crimes. As a result, the PredPol WMD creates a feedback loop. A similar loop exists in the judicial system, where sentencing is shaped by predictive analytics of recidivism. Instead of analyzing whether a felon's experience in prison raises the odds of that individual committing more crimes after release, the system reasons in the opposite direction: the predicted odds of committing further crimes shape the felon's time in incarceration.
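The feedback loop described above can be illustrated with a toy simulation. This is not PredPol's actual algorithm; the neighborhoods, numbers, and patrol rule are all hypothetical. Two neighborhoods have the same true crime rate, but the historical record starts slightly biased toward one. Because patrols follow the record, and crime is only recorded where patrols go, the initial bias never corrects itself:

```python
import random

random.seed(42)

TRUE_CRIME_RATE = 0.1          # identical in both neighborhoods
recorded = {"A": 10, "B": 5}   # hypothetical biased historical record
PATROLS_PER_DAY = 10

for day in range(100):
    total = recorded["A"] + recorded["B"]
    for _ in range(PATROLS_PER_DAY):
        # The model sends patrols where past records show more crime.
        hood = "A" if random.random() < recorded["A"] / total else "B"
        # Crime is only *recorded* where police are present.
        if random.random() < TRUE_CRIME_RATE:
            recorded[hood] += 1

share_A = recorded["A"] / (recorded["A"] + recorded["B"])
print(f"Share of recorded crime in A after 100 days: {share_A:.2f}")
```

Even though both neighborhoods are equally "criminal" by construction, neighborhood A keeps accumulating the larger share of recorded crime, and the model's own output keeps justifying the extra patrols that produced it.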

Whether or not such predictive analytics is fair is debatable. Like any other abstract concept exclusive to human psychology, fairness is difficult to model mathematically. As an alternative, the designers of these models use proxy data to stand in for it. Just as the designers of PredPol argue that the algorithm is unbiased because it uses only geographical data, one could argue these algorithms are fair on the merits of proxy data, even though that data comes from the very feedback loop the algorithms created. But a feasible solution to this feedback loop comes from rethinking the data used to train these models. A model will learn only what its data teaches it; if the data is biased, the model's predictions will be biased. If WMDs like PredPol were instead trained on an equiprobable distribution of data on serious crimes, nuisance crimes, and white-collar crimes, then all communities, from the impoverished to the wealthy, would have equal footing in a WMD-backed judicial system. The apparent goal of the present WMD models is only to confirm what people already assume is ground truth. But if such WMDs are designed and trained beyond the confirmation bias of human civilization, that is when they will achieve true intelligence and break free from their feedback loops.
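The rebalancing idea above can be sketched in a few lines. The incident records and their proportions here are invented for illustration; the point is only the technique of downsampling each crime category to equal size before training, so abundant nuisance-crime data cannot dominate the model:

```python
import random

random.seed(0)

# Hypothetical incident records: (crime_type, neighborhood)
incidents = (
    [("nuisance", "impoverished")] * 80 +
    [("serious", "impoverished")] * 15 +
    [("white_collar", "wealthy")] * 5
)

def balance_by_type(records):
    """Downsample so each crime type contributes equally to training."""
    by_type = {}
    for crime, hood in records:
        by_type.setdefault(crime, []).append((crime, hood))
    n = min(len(group) for group in by_type.values())
    balanced = []
    for group in by_type.values():
        balanced.extend(random.sample(group, n))
    return balanced

balanced = balance_by_type(incidents)
counts = {}
for crime, _ in balanced:
    counts[crime] = counts.get(crime, 0) + 1
print(counts)  # every crime type now contributes equally
```

A real system would weight or oversample rather than throw data away, but the sketch shows the principle: the training distribution is a design choice, not a fact of nature.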
