# Man vs Machine Learning: Criminal Justice in the 21st Century | Jens Ludwig | TEDxPennsylvaniaAvenue
## People

* University of Chicago professor + speaker + studies crime and the criminal justice system.
* Friend of the speaker + professor at an Ivy League medical school.
* Patient + complaining of chest pain + seen in the ER.
* Rest of the ER team + doctors and nurses on duty + have not yet seen the patient in person.
* Judges + decide whether an arrested person is jailed or released while awaiting trial.
* Consumer product reviewer (Thrifty E) + wrote a review of the Hutzler 571 Banana Slicer.
* Consumer product reviewer (Uncle Pookie) + wrote a review of the Hutzler 571 Banana Slicer.
* Consumer product reviewer (Q-Tip) + wrote a review of the Hutzler 571 Banana Slicer.
* Consumer product reviewer (J. Anderson) + wrote a review of the Hutzler 571 Banana Slicer.

## Organizations

* University of Chicago + speaker's affiliation + home of the Crime Lab research center.
* Ivy League medical school + friend's workplace.

## Places

* New York City + starting point of the road trip.
* New England + destination of the road trip.
* ER (emergency room) + setting for the patient encounter.
* Operating room + where the patient eventually goes.
* Large American city + source of the data used for the research team's algorithm.

## Tools, Tech & Products

* Cardiac enzyme test + blood test used to predict a heart attack.
* Netflix + recommendation system cited as an example of sophisticated machine learning.
* Hutzler 571 Banana Slicer + consumer product used in the review examples.
* Free software + available online to download for building the algorithm.

## Concepts & Definitions

* Sentiment analysis + taking a snippet of text and determining the author's affect: whether the author was trying to convey a positive or negative emotion.
* Flight risk + factor judges consider when deciding whether to release someone.
* Public safety risk + factor judges consider when deciding whether to release someone.
* Introspection illusion + the difficulty humans have fully introspecting on how they perform certain tasks.
* Algorithmic rule + proposed rule to inform pre-trial release that prioritizes the people with the highest predicted risk for jailing.

## Numbers & Data

* Two to three months or longer + average time a person sits in jail if the judge jails them.
* 50% + accuracy of random guessing in a binary classification task.
* 60% + accuracy achieved by early programmers hand-coding word lists for sentiment analysis.
* 95% + accuracy achieved with a data-driven approach to movie review analysis.
* 8.5 million people + population of the American city used in the research.
* 25% + potential reduction in the crime rate if the algorithm's recommendations are followed.
* 42% + potential reduction in the jail population if the algorithm's recommendations are followed.
* 89% + share of people in jail in the study city who are minorities.

## Claims & Theses

* The speaker's idea for making the world better starts with a road trip.
* The default protocol for chest pain is a cardiac enzyme test.
* The decision to jail or release someone hinges on the judge's prediction of what the defendant would do if released.
* The decision to jail or release someone is enormously high-stakes.
* Judges go home and use Netflix, which relies on some of the most sophisticated machine learning on the planet, for movie recommendations.
* It is very easy for humans to perform sentiment analysis on text reviews.
* It is much harder for humans to introspect and figure out how they perform these tasks.
* Progress came only once computer scientists set aside how they themselves did these tasks and turned them into brute-force data exercises.
* The words the machine learns are indicative of positive and negative reviews.
* The hardest part of applying this to public policy is testing the algorithm in the real world.
* The missing data problem: when the algorithm wants to release someone the judge jailed, we cannot observe what that person would have done.
* The missing data challenge is one-sided: if the algorithm wants to jail someone the judge released, we do know the effect of being in jail.
* Comparing the algorithm's performance against a judge's on comparable cases is possible because cases are randomly assigned to judges.
* Following the algorithm's recommendations could reduce the crime rate by fully 25% without adding a single person to jail.
* Following the algorithm's recommendations could reduce the jail population by fully 42% without any increase in the crime rate.
* Judges are distracted by irrelevant but very salient information about cases, especially among the highest-risk cases.
* Building a release rule around an algorithm that ignores race can inadvertently make the problem worse.
* Building the algorithm with attention to race can simultaneously reduce crime, reduce jail populations, and reduce racial disparities.
* The right conversation about machine learning for policy is not whether to adopt these technologies but how.

## Mechanisms & Processes

* (ER Protocol) Administer a cardiac enzyme test to predict a heart attack.
* (ER Decision Flow) Test level above threshold -> default action is to take the patient to the ICU -> speaker's friend intervenes -> manual assessment -> ICU transfer delayed -> patient goes into cardiac arrest.
* (Jail Decision) Judge reviews manila files (information on arrest charge and prior record) -> judge decides between release and jail.
* (Sentiment Analysis, Data-Driven Approach) 1. Take a large sample of movie reviews with known outcomes (good/bad star ratings). 2. Let the computer learn which words tend to appear in good reviews and which in bad reviews. 3. Use these learned words as a prediction algorithm for future reviews.
* (Testing Method, Insight 1) If the algorithm wants to jail someone the judge released, we know the effect of jail: it eliminates the risk of not showing up or of re-arrest.
* (Testing Method, Insight 2) Compare the algorithm's performance when it becomes stricter (e.g., moving from a 90% to an 80% release rate) against how a specific judge performs over the same transition.
* (Algorithm Implementation) The algorithm identifies the highest-risk X% of people in the judge's caseload to prioritize for jail.

## Timeline & Events

* (Narrative Order) Road trip from New York City to New England.
* (Narrative Order) Patient presents to the ER complaining of chest pain.
* (Narrative Order) Patient is seen in the waiting room (snacking on watermelon).
* (Narrative Order) After the friend advises against immediate ICU transfer, the patient goes into cardiac arrest (half an hour later).
* (Narrative Order) The speaker discusses the problem of the jail system in the United States.
* (Narrative Order) Early computer scientists observed that humans can easily perform sentiment analysis on text reviews.
* (Narrative Order) Programmers tried to automate sentiment analysis with hand-built word lists, achieving 60% accuracy.
* (Narrative Order) Computer scientists moved to a data-driven approach using known-outcome reviews, achieving 95% accuracy.
* (Narrative Order) Testing the algorithm in the real world (pre-trial release) is complicated, especially without a randomized trial.
* (Narrative Order) The research team builds an algorithmic rule and tests it using data from a large American city.
* (Narrative Order) Policy simulations are run showing potential outcomes if the algorithm is followed.
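The three data-driven sentiment-analysis steps listed under Mechanisms & Processes (learn word frequencies from labeled reviews, then score new text) can be sketched in a few lines. This is a minimal illustration, not the speaker's actual system; the training reviews and the smoothing choice are invented for the example:

```python
import math
from collections import Counter

def train(reviews):
    """Step 2: learn which words tend to appear in good vs. bad reviews.

    reviews: list of (text, label) pairs, where label is "good" or "bad".
    Returns a per-word score; positive scores lean "good", negative lean "bad".
    """
    good, bad = Counter(), Counter()
    for text, label in reviews:
        (good if label == "good" else bad).update(text.lower().split())
    vocab = set(good) | set(bad)
    # Add-one smoothing so a word seen in only one class still gets a finite score.
    total_good = sum(good.values()) + len(vocab)
    total_bad = sum(bad.values()) + len(vocab)
    return {w: math.log((good[w] + 1) / total_good)
             - math.log((bad[w] + 1) / total_bad)
            for w in vocab}

def predict(scores, text):
    """Step 3: classify a new review by summing the learned word scores."""
    total = sum(scores.get(w, 0.0) for w in text.lower().split())
    return "good" if total >= 0 else "bad"

# Step 1: a (toy, invented) sample of reviews with known outcomes.
reviews = [
    ("a wonderful heartfelt film", "good"),
    ("wonderful acting and a great story", "good"),
    ("a terrible boring mess", "bad"),
    ("boring plot and terrible pacing", "bad"),
]
scores = train(reviews)
print(predict(scores, "wonderful story"))      # -> good
print(predict(scores, "boring and terrible"))  # -> bad
```

Real systems use far larger samples and better models, but the shift the talk describes is exactly this: no hand-written rules about language, only word statistics learned from outcomes.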
## Examples & Cases

* Friend's ER experience + patient with chest pain + positive test -> default ICU admission -> friend advocates for assessing the patient in the waiting room -> patient goes into cardiac arrest.
* Judge system problem + decision on whether an arrested person goes home or sits in jail.
* Netflix movie prediction + sophisticated machine learning used to predict movie preferences.
* Review by Thrifty E + "I bought this in order to speed up cutting up a banana for my cereal. Any time I saved in that endeavor was spent cleaning this implement." (negative review, 2/5 stars).
* Review by Uncle Pookie + "Great gift. Once I figured out I had to peel the banana before using it, it works much better." (positive review, 5/5 stars).
* Review by Q-Tip + "Confusing. There's no way to tell if this is a standard or metric banana slicer. Additional markings on it would help greatly."
* Review by J. Anderson + "Angle is wrong. I tried this banana slicer and found it unacceptable. As shown in the picture, the slice is curved from left to right and all of my bananas are bent the other way."
* Research team data set + large, anonymous American city of 8.5 million people.
* Hypothetical comparison + outcomes when a judge moves from a 90% to an 80% release rate versus the algorithm making the same move.

## Trade-offs & Alternatives

* (ER) Standard protocol (cardiac enzyme test) vs. friend's action (meeting the patient in person first).
* (Jail) Releasing a person who commits a new crime (horrible in its own way) vs. jailing a person (an average of two to three months or longer).
* (Policy Application) Taking tools straight from the commercial sector vs. adapting them to public policy problems.
* (Algorithm Choice) A release rule that ignores racial disparities vs. one that pays attention to them (which yields simultaneous gains).
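The hypothetical comparison above (a judge tightening from a 90% to an 80% release rate versus the algorithm doing the same) can be made concrete: given predicted risk scores for a caseload, the algorithm tightens by jailing the highest-risk people first. A minimal sketch, with defendant IDs and risk scores invented for illustration:

```python
def stricter_release(caseload, release_rate):
    """Release the lowest-risk fraction of a caseload; jail the rest.

    caseload: list of (person_id, predicted_risk) pairs.
    release_rate: fraction of the caseload to release (e.g., 0.9 or 0.8).
    """
    ranked = sorted(caseload, key=lambda pair: pair[1])  # lowest risk first
    n_release = round(len(ranked) * release_rate)
    released = [pid for pid, _ in ranked[:n_release]]
    jailed = [pid for pid, _ in ranked[n_release:]]
    return released, jailed

# Ten defendants with invented predicted risks (e.g., probability of re-arrest).
caseload = [("d0", 0.05), ("d1", 0.10), ("d2", 0.12), ("d3", 0.15),
            ("d4", 0.20), ("d5", 0.25), ("d6", 0.30), ("d7", 0.40),
            ("d8", 0.60), ("d9", 0.85)]

# At a 90% release rate, only the single highest-risk person is jailed;
# tightening to 80% adds the next-highest-risk person.
rel90, jail90 = stricter_release(caseload, 0.9)
rel80, jail80 = stricter_release(caseload, 0.8)
print(jail90)  # -> ['d9']
print(jail80)  # -> ['d8', 'd9']
```

The testing idea in the talk is then to compare these marginal, highest-risk jailings against whom a real judge jails when moving between the same two release rates, on randomly assigned and therefore comparable caseloads.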
## Counterarguments & Caveats

* The standard protocol for chest pain is a cardiac enzyme test, but the friend's human intervention changes the course.
* If the judge jails you, you will on average sit there for two to three months or longer, sometimes much, much longer.
* The flip side, releasing someone who commits a new crime, could be horrible in its own way.
* Absent the ability to run a randomized trial, it is difficult to test these tools in real-world public policy settings.
* Many people are tempted to skip the testing stage and take tools straight from the computer scientist's drawing board into the real world.
* It is entirely possible to inadvertently build a tool that makes the world a worse place, not a better one.
* The debate about whether to bring commercial ML tools to public policy is potentially the wrong way to frame the question.

## Methodology

* (Observation) Observing the ER process to highlight susceptibility to salient but irrelevant data (the cardiac enzyme test result).
* (Academic Research) Running a research team on the problem of the jail system.
* (Machine Learning Development) Using a data-driven approach to sentiment analysis on movie reviews (analyzing word patterns).
* (Algorithmic Testing) Comparing the algorithm's predicted outcomes to judges' historical outcomes on comparable cases.
* (Policy Simulation) Running simulations to estimate outcomes under the algorithm's recommendations.

## References Cited

* (None stated)

## Conclusions & Recommendations

* Policymakers should use sophisticated machine learning technology to solve important public policy problems.
* The right conversation about ML for policy applications over the next ten years is "not whether to adopt these new technologies but how."

## Implications & Consequences

* If the judge jails you, you will on average sit there for two to three months or longer, sometimes much, much longer.
* If the judge releases someone who goes on to commit a new crime, that could be horrible in its own way.
* If the algorithm is used without considering fairness, it could worsen racial disparities in the criminal justice system.
* If the algorithm is built considering fairness, it can simultaneously reduce crime, reduce jail populations, and reduce racial disparities.
* If the technology is ignored, the world continues to operate under existing, potentially fallible, human judgment systems.

## Open Questions

* What is the best way to test ML algorithms in real-world public policy settings without randomized trials?

## Verbatim Moments

* "I have an idea for how to make the world a better place, and like all truly good ideas, this one starts with a roadtrip."
* "I'm the one who asked the question. Why don't you go first?"
* "All they've seen are the data in the chart and the test level above the threshold."
* "We've got to go; let's get this guy up to the ICU."
* "This is an enormously high-stakes decision."
* "the judge gets access to Netflix, which uses some of the most sophisticated machine learning technology on the planet, to help predict what movie the judge is going to like."
* "My psychology friends call that the 'introspection illusion.'"
* "Progress in this area really only came once the computer scientists realized that we needed to just completely forget that we knew how to do these things ourselves and turned these tasks into just brute force data exercises."
* "This gives us a way to fairly compare the algorithm's performance against the judge's on a comparable set of cases, focusing on the algorithmic task where we don't have this missing data problem, where the algorithm is just selecting people to jail from among the pool of people that the judges let go."
* "if you follow the recommendations of the algorithm, you'd be able to reduce the crime rate by fully 25% without having to put a single additional person in jail."
* "I think that that actually is the wrong way to frame the debate and frame the question."