# Man vs Machine Learning: Criminal Justice in the 21st Century | Jens Ludwig | TEDxPennsylvaniaAvenue
The speaker, a University of Chicago professor, argues that policymakers should not debate *whether* to adopt machine learning for public policy but rather *how* to do so. The approach uses sophisticated data analysis, analogous to commercial recommendation engines like Netflix's, to improve high-stakes human judgments such as pre-trial release decisions. The strongest evidence presented: an algorithm, when designed with fairness in mind, could reduce the crime rate by 25% without jailing a single additional person, or shrink the jail population by 42% without increasing crime.
## Speakers & Context
- University of Chicago professor, who studies crime and the criminal justice system.
- Friend of the speaker, who is a professor at an Ivy League medical school.
- Initial context is a road trip from New York City to New England, providing a casual setting for the discussion.
## Theses & Positions
- The primary argument is that the conversation regarding machine learning in policy should be about *"how"* to implement the technology, not *"whether"* to adopt it.
- The decision to jail or release someone is an "enormously high-stakes decision" that currently relies on human judgment (the judge's prediction).
- Humans are susceptible to cognitive biases, exemplified by the "introspection illusion": because we cannot fully inspect our own judgment processes, salient but irrelevant details can mislead us without our noticing.
- Simply building an algorithm is insufficient; the most difficult challenge is testing it in the real world, particularly absent randomized trials.
- The goal of ML in policy is to supplement human judgment by flagging predictive patterns in data, which is superior to intuition alone.
- An ML algorithm can simultaneously achieve multiple positive outcomes—reducing crime, reducing jail populations, and reducing racial disparities—if built with fairness in mind.
## Concepts & Definitions
- **Sentiment analysis:** The process of taking a snippet of text and attempting to determine the author's underlying affect—whether they were conveying a positive or negative emotion.
- **Flight risk:** A factor judges consider when determining if a defendant should be released before trial.
- **Public safety risk:** A factor judges consider when determining if a defendant should be released before trial.
- **Introspection illusion:** The difficulty for humans to fully analyze and understand their own cognitive processes while performing certain tasks.
- **Algorithmic rule:** A proposed, data-driven protocol intended to inform pre-trial release by prioritizing individuals with the highest predicted risk for jailing.
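The algorithmic rule can be sketched as a simple ranking step. This is a hypothetical illustration only: the talk does not specify the risk model or detention threshold, so both the scores and the cutoff below are made up.

```python
# Hypothetical sketch: detain the highest-risk X% of a caseload,
# ranked by a model's predicted risk score. The risk model itself
# is assumed; any classifier that outputs probabilities would do.

def select_for_detention(defendants, risk_scores, detain_fraction):
    """Return the IDs of the top `detain_fraction` of defendants by predicted risk."""
    ranked = sorted(zip(defendants, risk_scores), key=lambda pair: pair[1], reverse=True)
    n_detain = int(len(ranked) * detain_fraction)
    return [d for d, _ in ranked[:n_detain]]

caseload = ["A", "B", "C", "D", "E"]
scores = [0.10, 0.85, 0.40, 0.92, 0.05]  # hypothetical predicted risks
print(select_for_detention(caseload, scores, 0.4))  # → ['D', 'B']
```

The key design point is that the rule only *ranks* within the judge's existing caseload; it does not decide how strict the system should be overall.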
## Mechanisms & Processes
- **ER Protocol (Example):** The standard approach to chest pain involves administering a cardiac enzyme test to predict a heart attack; a positive result defaults to ICU admission.
- **ER Decision Flow (Intervention):** The friend intervened by manually assessing the patient in the waiting room, delaying the ICU transfer; the patient then went into cardiac arrest.
- **Jail Decision Process:** A judge reviews manila files containing arrest charge and prior record information to decide release or jail time.
- **Sentiment Analysis Programmatic Approach:** A two-stage process: 1) Using a known sample of movie reviews (good/bad outcome) to let a computer learn which words correlate with positive/negative sentiment. 2) Using those learned word patterns as a prediction algorithm for unseen reviews.
- **Testing Method 1 (Missing Data Insight):** If the algorithm recommends jailing someone the judge released, the counterfactual outcome of jailing is known mechanically: a jailed defendant cannot fail to appear.
- **Testing Method 2 (Comparative Performance):** Comparing the algorithm's performance across different strictness levels (e.g., comparing a judge who releases 90% vs. 80% against the algorithm simulating that same shift).
- **Algorithm Implementation:** The rule is designed to identify the highest risk X% of people in the judge's caseload for mandatory jailing.
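The two-stage sentiment-analysis process above can be sketched in a few lines. This is a toy illustration with made-up reviews, not anything from the talk; real systems learn from far larger corpora with richer features.

```python
from collections import Counter

# Stage 1: learn which words correlate with positive vs. negative
# reviews from a labeled sample (toy data, not from the talk).
labeled = [
    ("great film loved it", 1),
    ("terrible boring film", 0),
    ("loved the acting great story", 1),
    ("boring plot terrible pacing", 0),
]
pos_counts, neg_counts = Counter(), Counter()
for text, label in labeled:
    (pos_counts if label else neg_counts).update(text.split())

# Stage 2: use the learned word patterns to score an unseen review.
def predict(text):
    score = sum(pos_counts[w] - neg_counts[w] for w in text.split())
    return 1 if score > 0 else 0

print(predict("great story"))  # → 1
print(predict("boring film"))  # → 0
```

This captures the talk's point: no programmer lists "good" and "bad" words by introspection; the correlations are learned from labeled outcomes.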
## Timeline & Sequence
- **Road trip:** Starts in New York City, heading toward New England.
- **ER Incident:** Patient arrives with chest pain $\rightarrow$ Enzyme test positive $\rightarrow$ Default to ICU $\rightarrow$ Friend intervenes $\rightarrow$ Cardiac arrest occurs $\sim 30$ minutes later.
- **ML History Progression:** Early programmers used word lists (60% accuracy) $\rightarrow$ Computer scientists realized introspection was flawed $\rightarrow$ Shift to data-driven pattern recognition (95% accuracy).
- **Research Phase:** Applying ML to pre-trial release using data from a large American city (8.5 million people) $\rightarrow$ Developing a testable, non-randomized rule.
## Named Entities
- University of Chicago — Speaker's affiliation.
- Ivy League medical school — Friend's workplace.
- New York City — Starting point of the discussion narrative.
- New England — Destination of the discussion narrative.
- ER (Emergency Room) — Setting for the first illustrative case.
- Operating room — Location where the patient eventually goes.
- American city — Source of the data used for the pre-trial algorithm.
## Tools, Tech & Products
- Cardiac enzyme test — Tool used in standard ER protocol to predict heart attack risk.
- Netflix — Commercial service whose sophisticated ML illustrates the irony that a judge has better tools for predicting movie preferences than for predicting defendant risk.
- Hutzler 571 Banana Slicer — Consumer product used in the sentiment analysis demonstration.
- Software — Free software available online for building the predictive algorithm.
## Numbers & Data
- Average jail stay: **two to three months or longer**.
- Baseline accuracy rate for random guessing in binary classification: **50%**.
- Accuracy rate of early programmatic sentiment analysis: **60%**.
- Accuracy rate of data-driven movie review analysis: **95%**.
- Population size of the modeled American city: **8.5 million people**.
- Potential crime rate reduction following algorithm use: **fully 25%**.
- Potential jail population reduction following algorithm use: **fully 42%**.
- Percentage of people in the studied city jail who are minorities: **89%**.
## Examples & Cases
- **ER Scenario:** Patient presents with chest pain; test is positive $\rightarrow$ Default action is ICU transfer; the friend overrides this and delays the transfer, and the patient suffers cardiac arrest.
- **ML Demonstration (Thrifty E):** Review text indicates negative affect; confirmed by 2/5 stars.
- **ML Demonstration (Uncle Pookie):** Review text indicates positive affect; confirmed by 5-star rating.
- **ML Demonstration (Q-Tip/J. Anderson):** Reviews highlight ambiguities or flaws in the physical product design.
- **Hypothetical Comparison:** Comparing the outcomes of a judge shifting from 90% release rate to 80% release rate against the algorithm simulating the same shift.
## Trade-offs & Alternatives
- **ER Protocols:** Standard clinical testing vs. human observation/manual assessment.
- **Jail Protocol:** Placing someone in jail (removing freedom) vs. releasing them (risk of committing a new crime).
- **ML Framework:** Ignoring racial disparities vs. building the algorithm to pay attention to racial disparities (which yields simultaneous gains).
- **Technological Adoption Debate:** Focusing on the *technology itself* vs. focusing on *how* the technology should be applied to governance.
## Counterarguments & Caveats
- The standard ER protocol of immediate ICU transfer overlooks critical behavioral cues.
- The potential consequence of releasing a person who later commits a crime is severe.
- The ability to build an ML tool must not compromise fairness, particularly regarding racial disparities.
- Directly applying commercial ML tools without rigorous, controlled testing is dangerous and risks exacerbating existing societal problems.
## Methodology
- **ER Observation:** Used to demonstrate human susceptibility to irrelevant, highly salient data points.
- **Computational Linguistics:** Applying data-driven methods to identify underlying patterns in text that signify emotional affect.
- **Algorithmic Development:** Creating a simulation process comparing historical judge outcomes against the algorithm's predicted outcomes on comparable cases.
- **Policy Simulation:** Running projections showing quantifiable improvements in crime and jail metrics based on algorithmic adherence.
## Conclusions & Recommendations
- Policymakers must adopt sophisticated machine learning to address complex public policy problems.
- The right policy debate is centered on *implementation methods* ("how"), not the existence of the technology itself ("whether").
- The core goal is to build decision aids that remove human bias (like the introspection illusion) from high-stakes judgments.
## Implications & Consequences
- Without algorithmic aid, the system remains vulnerable to human distractions and biases (e.g., treating all high-risk cases identically).
- A fair ML system promises a simultaneous reduction in crime, jail populations, and racial disparity.
- The failure to consider bias in ML models can actively worsen systemic inequality.
## Open Questions
- The primary open question remains how to rigorously test ML algorithms for public policy when true randomized controlled trials are ethically or logistically impossible.
## Verbatim Moments
- *"I have an idea for how to make the world a better place, and like all truly good ideas, this one starts with a roadtrip."*
- *"I'm the one who asked the question. Why don't you go first?"*
- *"All they've seen are the data in the chart and the test level above the threshold."*
- *"We've got to go; let's get this guy up to the ICU."*
- *"This is an enormously high-stakes decision."*
- *"the judge gets access to Netflix, which uses some of the most sophisticated machine learning technology on the planet, to help predict what movie the judge is going to like."*
- *"My psychology friends call that the 'introspection illusion.'"*
- *"Progress in this area really only came once the computer scientists realized that we needed to just completely forget that we knew how to do these things ourselves and turned these tasks into just brute force data exercises."*
- *"if you follow the recommendations of the algorithm, you'd be able to reduce the crime rate by fully 25% without having to put a single additional person in jail."*
- *"I think that that actually is the wrong way to frame the debate and frame the question."*