In this edition of Legal Currents and Futures, The Colleges of Law continues sharing a series of thought pieces about artificial intelligence and the law.
Courts' use of risk assessment programs is one of the principal incursions of artificial intelligence (AI) into judicial determinations in criminal law. Other AI programs are used in law enforcement and in the preparation and presentation of evidence at trial, including probabilistic genotyping,[1] predictive policing (profiling),[2] facial recognition,[3] and other machine-based processes.[4] These programs influence judges and jurors, fairly or unfairly, and thereby affect the outcome of cases. When such evidence is proffered, both proponents and opponents can be heard. Criminal risk assessment programs, by contrast, are often applied directly in making judicial determinations without objection or any meaningful opportunity to object.
Although risk assessment programs have their proponents and are now part of the judicial landscape, this article takes a critical look at their use and validity. The present author questioned the AI programs used to make pretrial risk assessments in this magazine five years ago.[5] In this author's opinion, the same criticisms still apply to pretrial assessments and apply equally to the use of risk assessments at sentencing. Over the last five years, studies and meta-studies have been published questioning the predictive validity of these assessments.[6] This article examines the foundational validity, and the validity as applied, of risk assessments used to determine whether people will be confined in jails or prisons, both before trial and after conviction.[7]
AI Criminal Risk Assessment
In a criminal case, risk assessment occurs at two main junctures, whether it is performed with the aid of AI or solely by a judicial officer.[8] The first is the determination of whether a person will remain in jail while awaiting trial. The second is the determination, at sentencing, of whether a person should be remanded to custody. Pretrial, the judicial decision to release or detain a person is based on the likelihood that the person will appear in court and the likelihood that the person will pose a danger to public safety.[9] After trial or plea, the judicial decision to grant probation or to impose a period of custody (including its length and nature) is based in part on public safety.[10]
Pretrial release is now governed by the California Supreme Court decisions in Humphrey and White,[11] which apply Article 1, sections 12 and 28 of the California Constitution and the Eighth Amendment to the United States Constitution. A person must be released on bail, or given the opportunity to be released, except in capital cases,[12] in felony cases involving violence or sexual assault where there is a substantial likelihood that release would result in great bodily harm to others,[13] and in felony cases where there is a substantial likelihood that a threat of great bodily harm would be carried out.[14] However, the declaration of victims' rights provides that the “safety of the victim and the victim’s family [be] considered in fixing the amount of bail and release conditions for the defendant.”[15] Thus, whether a person is released pretrial or confined to a jail cell while awaiting trial depends in part on a judicial determination of whether the person will be a risk to public safety.[16]
After conviction, the sentencing decision is made by the judge. Traditionally, judges formed their own opinions, based on the totality of the circumstances, as to whether a person posed a danger to the public, and the trial court's decision would not be disturbed on appeal “in the absence of a clear showing that its sentencing decision was arbitrary or irrational.”[17] The California Rules of Court contain nonbinding criteria for deciding whether to grant probation or impose a prison sentence in felony cases, including “[t]he likelihood that if not imprisoned the defendant will be a danger to others.”[18] A judge's determination of risk to public safety can therefore mean the difference between going home on probation and being sent to prison.
In both pretrial detention and sentencing, on a good day, the judge will evaluate the information before the court, including evidence of risk to public safety, and make a decision. That decision, with or without the input of AI, may take risk factors into account,[19] but ultimately it will rest on the judge's perceptions and biases. Although weighing risk factors may resemble a Bayesian analysis, human beings cannot compute the interrelationships and effects of all the variables involved. Bayesian analysis is at best a metaphor here. In actual science, approximating such an analysis requires banks of high-speed computers and extensive data input, and even the most sophisticated Bayesian networks depend on feedback loops and constant readjustment.[20]
Thus, unaided by a supercomputer, the human brain necessarily falls back on shortcuts based on prior observations, generalizations, stereotypes, and biases. Judgments about whether a person is a risk to public safety are, at best, educated guesses. The judge will rely on heuristics to make that decision, and even the most informed judge with the best of intentions is engaged in a project of uncertainty.[21]
AI does not eliminate those human factors; it simply conducts the same process through a computer program based on data input (which may also be based on machine learning) and generates its own generalizations, stereotypes, and biases. Even with the most sophisticated program and carefully crafted data input, AI-assisted decisions about risk to public safety remain, at best, a project in uncertainty. An AI program may create the illusion, or perhaps offer the comfort, of handing the decision off to a scientific-sounding “tool” or “instrument,” but its output is still no more than an educated guess, and one that can result in actual people being incarcerated. As the California Supreme Court has said, evidence purporting to predict future dangerousness is not reliable, and “[n]either psychiatrists nor anyone else have reliably demonstrated an ability to predict future violence or ‘dangerousness.’”[22]
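To make the Bayesian metaphor concrete, the following minimal Python sketch (not drawn from any actual risk assessment program) applies Bayes' rule in odds form: a prior probability is converted to odds, multiplied by a likelihood ratio for each risk factor, and converted back to a probability. The 20% base rate and the two likelihood ratios are hypothetical numbers chosen for illustration, and multiplying the ratios together assumes the risk factors are independent of one another, a simplification that real-world risk factors rarely satisfy.

```python
# Minimal sketch of Bayes' rule in odds form.
# All numbers below are hypothetical and for illustration only.


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Update a prior probability with a sequence of likelihood ratios.

    Multiplying the ratios assumes each risk factor is conditionally
    independent of the others, which is a strong simplifying assumption.
    """
    odds = prior / (1 - prior)  # convert probability to odds
    for lr in likelihood_ratios:
        odds *= lr  # each factor scales the odds
    return odds / (1 + odds)  # convert odds back to probability


# Hypothetical: 20% base rate, one factor with LR 2.0, another with LR 1.5
p = posterior_probability(0.2, [2.0, 1.5])
print(f"{p:.3f}")  # -> 0.429
```

Even this two-factor toy depends on numbers that must come from somewhere and on an independence assumption that may not hold; with dozens of interrelated factors, the corresponding calculation is no longer something a person can do in their head.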
The consequence is not academic. The unpleasant truth is that a person confined to a jail or prison will have all amenities that we take for granted removed, will be treated like a caged animal, and will be stripped of human dignity. The decision can change lives forever—not just of the person incarcerated but their family, loved ones and community.[23] The decisions are arbitrary and there is no guarantee that the decision will be “correct” or even rational. The White decision itself is a cautionary tale.[24] The prosecution urged that Mr. White was a risk to public safety and must be detained. The trial judge made that ominous finding. The defendant filed a Petition for Writ of Habeas Corpus. Before it reached the Supreme Court, the prosecution agreed that Mr. White should plead to accessory after the fact and that he was suitable to be released on probation. The trial court agreed. Evidently Mr. White was not such a danger to the public safety after all.[25]
AI-Based Risk Assessment
On one level, it might be argued that AI risk assessment programs are more objective than a human judge. The science, however, does not bear that out. So-called “validated risk assessment tool[s]” and “risk assessment instrument[s]” have been used as terms of art in former legislation on the conditions for release of arrestees.[26] Clinical-sounding terms like “tool” or “instrument” make these algorithms seem scientific. What should concern us, however, is not what they are called but whether they are valid predictors. The one assessment area with some validation is the prediction of aggression within institutions by sexual offenders; there is no convincing evidence of validity for general predictions of risk to public safety.[27]
First, these so-called tools cannot supplant subjective human input. A British study that drew on experience in the United States and elsewhere concluded that “[i]n relation to all uses of algorithmic decision-making technology, the aim must be to ‘augment human legal [and other] intelligence, not to replace it’ and to ensure that artificial intelligence ‘aligns with law and the Rule of Law in a testable and contestable way.’” [Footnotes omitted.][28] However refined the algorithm, a judge will still have to exercise judgment and discretion.
Second, even the most sophisticated risk assessment programs have not been demonstrated to be reliable. A proper forensic instrument must be shown to have both foundational validity and validity as applied.[29] A risk assessment “tool” or “instrument” admitted into evidence before the bail-setting magistrate or the sentencing judge is thereby treated as probative on whether a human being is to be caged in a cell. The standard for its admissibility can be no lower than the standard for admitting any other forensic evidence subject to a motion in limine.[30] Admissibility cannot be based on speculation; it must have a scientific basis.[31]
It is difficult, however, to test the foundational validity or validity as applied of risk assessment programs because, in actual practice, recidivism cannot be measured independently of the decision to place a person in custody: a person who is detained has little opportunity to reoffend in the community. Notwithstanding this variation on the Heisenberg principle, a comprehensive meta-analysis of the scientific literature on validation studies of risk assessment tools concluded, “Overall, the predictive performance of the included risk assessment tools was mixed, and ranged from poor to moderate.”[32] The same article noted that results were more favorable when one or more authors were associated with the company producing the tool, and that many studies had a “high rate of bias.”[33]
Third, human choices in the design of a test significantly affect its outcome. The reference population selected may or may not be relevant to the particular individual being assessed. Design parameters will include age, ethnicity, gender, and other variables, as well as geographic criteria such as country, state, city, or even ZIP code. The questions soliciting information about or from the subject will clearly affect the result as well. Moreover, who enters the answers matters. If an administrator enters the data, the entries may reflect subjective choices, and implicit bias may affect the outcome. If the subject enters the data, bias and manipulation may skew the results.
Fourth, a test's overall validity may rest largely on its success in identifying the more extreme cases. The sort of red flag that any judge would recognize as presenting a risk of offending (for instance, a serial rapist whose violence has been escalating) would be obvious to the judge, and an AI assessment based on the same data will very likely confirm it. That says nothing about the reliability or validity of the program in the close cases. In other words, false positives and false negatives are less likely to occur at the extremes. The extreme cases may help validate arbitrary cutoffs for low-risk and high-risk determinations, but proficiency at the extremes does not validate determinations toward the middle of the curve. Even if the extreme cases can be predicted better than chance, predictions for the cases in the middle may be close to chance.
This is not to say that risk assessment results cannot be considered in extreme cases, together with judicial review of the risk factors themselves. Very high or very low results should correlate with objective criteria that a court can evaluate directly; in those cases, what matters is not so much the score as the factual basis for it. Scores in the middle range, on the other hand, should be considered by the judge, if at all, only for the limited input they may provide. The promotional hype surrounding these so-called “tools” should not persuade a judge to rely on them for more than they can actually contribute to the ultimate, and important, judicial decision.
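The point about extremes can be illustrated with a small, entirely hypothetical example. The Python sketch below assigns invented outcome counts to scores on a 1-to-10 scale, with outcomes closely tracking the score at the ends of the scale and split evenly in the middle, and applies an arbitrary cutoff of 6. It uses simple accuracy rather than the statistics reported in validation studies; the counts, the cutoff, and the score bands are assumptions for illustration, not data from any actual tool.

```python
# Entirely hypothetical data: score -> (number of cases, number who reoffended).
# No actual risk assessment tool or study is represented here.
HYPOTHETICAL_COUNTS = {
    1: (10, 1),
    2: (10, 1),
    4: (10, 5),
    5: (10, 5),
    6: (10, 5),
    7: (10, 5),
    9: (10, 9),
    10: (10, 9),
}

CUTOFF = 6  # arbitrary: scores at or above this are labeled "high risk"


def accuracy(counts: dict[int, tuple[int, int]], scores: list[int], cutoff: int = CUTOFF) -> float:
    """Share of cases at the given scores that the cutoff classifies correctly."""
    correct = 0
    total = 0
    for score in scores:
        n_cases, n_reoffended = counts[score]
        total += n_cases
        if score >= cutoff:
            correct += n_reoffended  # "high risk" is correct if the person reoffended
        else:
            correct += n_cases - n_reoffended  # "low risk" is correct if they did not
    return correct / total


all_scores = sorted(HYPOTHETICAL_COUNTS)
extreme_scores = [s for s in all_scores if s <= 2 or s >= 9]
middle_scores = [s for s in all_scores if 3 <= s <= 8]

for label, scores in [("overall", all_scores), ("extremes", extreme_scores), ("middle", middle_scores)]:
    print(f"{label}: {accuracy(HYPOTHETICAL_COUNTS, scores):.2f}")
# overall: 0.70
# extremes: 0.90
# middle: 0.50
```

Under these invented numbers, the tool appears 70% accurate overall, but that figure is carried by the extreme scores; for the middle scores, where judicial discretion matters most, it does no better than a coin flip.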
Should Artificial Intelligence Be Utilized in Criminal Law?
Risk assessment “tools” or “instruments” were shiny new objects with an aura of modern science. On critical examination, they have been found to offer little more than confirmation of the obvious at the extremes, and to lack validity as applied in the cases where the determination actually matters. Comparison is sometimes made to validated medical tests, but a decision to perform invasive surgery is not based conclusively on a test whose accuracy is close to chance, or even on one that is 90% predictive.[34] Judicial judgment, like medical judgment, should not be swayed by AI programs that have not been shown to have solid foundational validity and validity as applied in the particular case.
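A figure like "90% predictive" can also mislead on its own terms. As a hedged illustration, suppose (hypothetically) that a test correctly flags 90% of people who have a condition (sensitivity), correctly clears 90% of people who do not (specificity), and the condition occurs in 10% of the population tested. The short Python sketch below applies Bayes' rule to compute what fraction of positive results are correct. These inputs are assumptions for illustration, not figures from any study cited here.

```python
# Minimal sketch: positive predictive value (PPV) from sensitivity,
# specificity, and base rate. All inputs are hypothetical.


def positive_predictive_value(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Fraction of positive test results that are true positives."""
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    return true_positives / (true_positives + false_positives)


ppv = positive_predictive_value(sensitivity=0.9, specificity=0.9, base_rate=0.1)
print(f"{ppv:.2f}")  # -> 0.50
```

Under these assumptions, half of all positive results are false positives, even though the test is "90%" accurate in both directions.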
Robert Sanger is a Certified Criminal Law Specialist (Ca. State Bar Bd. of Legal Specialization) and has practiced in Santa Barbara for 50 years as a litigation partner, now senior partner, at Sanger, Hanley, Sanger & Avila. Mr. Sanger is a Fellow of the American Academy of Forensic Sciences (AAFS). He is an Adjunct Professor of Law and Forensic Science at the Santa Barbara College of Law. Mr. Sanger is an Associate Member of the Council of Forensic Science Educators (COFSE). He is Past President of California Attorneys for Criminal Justice (CACJ), the statewide criminal defense lawyers’ organization.
The opinions expressed here are those of the author and do not necessarily reflect those of the organizations with which he is associated. ©Robert M. Sanger.
About The Colleges of Law
The Colleges of Law is a California-based law school that offers a traditional Juris Doctor program as well as an innovative Hybrid J.D. program for working professionals who want to get their law degree with a more flexible schedule. Our law school is regionally accredited by the Western Association of Schools & Colleges Senior College and University Commission (WSCUC), and the J.D. program is accredited by the Committee of Bar Examiners of the State Bar of California, which qualifies students to take the California Bar Exam and practice law in California.
[1] See, e.g., William Thompson, Uncertainty in probabilistic genotyping of low template DNA, 68 Journal of Forensic Sciences 1049 (May 2023).
[2] For an article that encouraged additional thinking in this area, see, Note, Data Mining, Dog Sniffs, and the Fourth Amendment, 128 Harv. L. Rev. 691 (2014).
[3] See, e.g., Denise Almeida, Konstantin Shmarko, and Elizabeth Lomas, The ethics of facial recognition technologies, surveillance, and accountability in an age of artificial intelligence: a comparative analysis of US, EU, and UK regulatory frameworks, 2 AI Ethics 377–387 (2022).
[4] This would include “second generation data evaluation,” such as “location tracking; automatic license plate readers, electronic toll collection systems, speed cameras and car GPS devices recording travel patterns; social media websites tracking communications; e-shops facilitating the creation of consumer profiles; and digital databases storing financial data.” Athina Sachoulidou, Going beyond the “common suspects”: to be presumed innocent in the era of algorithms, big data and artificial intelligence, Artificial Intelligence and Law (published online Feb. 22, 2023).
[5] Robert Sanger, The Need to Revise the New Bail Law – Part III, 556 Santa Barbara Lawyer Magazine 8 (January 2019).
[6] See, e.g., the following articles and the sources cited therein: Seena Fazel, Matthias Burghart, Thomas Fanshawe, Sharon Danielle Gil, John Monahan, and Rongqin Yu, The predictive performance of criminal risk assessment tools used at sentencing: Systematic review of validation studies, 81 Journal of Criminal Justice 101902 (2022); and Fatima Dakalbab, Manar Abu Talib, Omnia Abu Waraga, Ali Bou Nassif, Sohail Abbas, and Qassim Nasir, Artificial intelligence & crime prediction: A systematic literature review, 6 Social Sciences & Humanities Open 100342 (2022).
[7] See also the discussion in last month’s Criminal Justice column, Robert Sanger, Artificial Intelligence, Criminal Liability and the Trolley Problem, 514 Santa Barbara Lawyer Magazine __ (November 2023).
[8] Risk assessments can also be used in civil conservatorship, parole, and custodial settings, which are not the subject of this article.
[9] “In setting, reducing, or denying bail, a judge or magistrate shall take into consideration the protection of the public . . .” Cal. Penal Code § 1275(a)(1).
[10] “The Legislature finds and declares that the purpose of sentencing is public safety . . .” Cal. Penal Code § 1170.
[11] In re Humphrey, 11 Cal.5th 135 (2021); and In re White, 9 Cal.5th 455 (2020).
[12] Cal. Const. Art. 1, § 12 (a); there is no requirement of a likelihood of danger to the public in a capital case, “when the facts are evident or the presumption great.”
[13] Cal. Const. Art. 1, § 12 (b); this section requires that “the facts are evident or the presumption great” and based on clear and convincing evidence that there is “a substantial likelihood the person’s release would result in great bodily harm to others.”
[14] Cal. Const. Art. 1, § 12 (c); this section requires that, “there is a substantial likelihood that the person would carry out the threat if released.”
[15] Cal. Const. Art. 1, § 28 (b)(3).
[16] Although appearance in court and public safety are often conflated, there is no scientific basis for concluding that a single instrument is informative on both issues.
[17] People v. Giminez (1975) 14 Cal.3d 68, 72.
[18] Cal. Rule of Court 4.414 (b)(8).
[19] See, for instance, Faigman, et al., 2 Modern Scientific Evidence: The Law and Science of Expert Testimony, § 9:17, Table 2 (2022).
[20] See, e.g., W.R. Gilks, S. Richardson and D.J. Spiegelhalter, Markov Chain Monte Carlo in Practice (Chapman and Hall, 1996).
[21] See, Dennis V. Lindley, Understanding Uncertainty, (Wiley, 2014).
[22] People v. Burnick (1975) 14 Cal.3d 306, 327.
[23] There are thousands of books on the subject. One unimpeachable source is, The National Research Council, Jeremy Travis, Bruce Western, and Steve Redburn (eds.), The Growth of Incarceration in the United States, (The National Academies Press, 2014).
[24] In re White, 9 Cal.5th 455 (2020).
[25] In re White, 9 Cal.5th at 458, fn. 1.
[26] See former CA Penal Code § 1320.34 (2019) suspended upon approval of referendum measure, November 3, 2020 ballot.
[27] Joel K. Cartwright, Sarah L. Desmarais, Justin Hazel, Travis Griffith, and Allen Azizian, Predictive Value of HCR-20, START, and Static-99R Assessments in Predicting Institutional Aggression Among Sexual Offenders, 42 Law & Human Behavior 13, 14 (2018). See, e.g., Cal. Penal Code § 290.5(a)(3) [court may consider “the person’s risk levels on SARATSO static, dynamic, and violence risk assessment instruments” in deciding whether to order continued sex offender registration].
[28] Marion Oswald, Jamie Grace, Sheena Urwin, and Geoffrey C. Barnes, Algorithmic risk assessment policing models: lessons from the Durham HART model and ‘Experimental’ proportionality, 27 Information & Communications Technology Law 223-250 (2018).
[29] President’s Council of Advisors on Science and Technology (PCAST), Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods (September 2016).
[30] Cal. Evidence Code § 402.
[31] Sargon Enterprises, Inc. v. University of Southern California, 55 Cal. 4th 747 (2012) (barring speculation); Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993) (requiring a scientific foundation for forensic evidence).
[32] Seena Fazel, Matthias Burghart, Thomas Fanshawe, Sharon Danielle Gil, John Monahan, and Rongqin Yu, The predictive performance of criminal risk assessment tools used at sentencing: Systematic review of validation studies, 81 Journal of Criminal Justice 101902 (2022).
[33] Id. Note that this meta-study reviewed literature on COMPAS, HCR-20, LS/CMI, LSI-R, OASys, ORAS, PCL-R, PCRA, and Static-99/R.
[34] S. Hussain, A. Rahman, T. Abbasi, and T. Aziz, Diagnostic accuracy of ultrasonography in acute appendicitis, 18 Australasian Journal of Ultrasound in Medicine 67-69 (2015).