Chances Are


  Justinian’s Digest was recovered during that explosive eleventh century which gave us universities, cathedrals, and cities. It was seen, like the surviving works of Aristotle, as a testament from a lost age, to be treated with the respect and minute attention accorded to all holy texts. The University of Bologna, the first Law School, was founded purely to study the Digest. One of its early professors, Azo, noticed that not all proofs in Justinian’s law were complete: so what should one do with types of evidence that fell short of the high standards necessary to convince the court? What, for instance, if one had the testimony of only one witness, or an unwitnessed but credible document? Azo called these “half-proofs” and suggested that two halves could make a whole. His successors created an entire new legal arithmetic for building cases from probabilistic components: suspicion, various sorts of presumption, indication, argument, support, and conjecture.

  Each element of the new construct was derived from a Roman legacy, but the spirit of it—its subtlety, proliferation of terms, and artificiality—was entirely medieval. Corals of interpretation grew over the rock of law, and their effect was to move questions of likelihood and credibility from rhetoric into textual analysis. It was no longer the audience in the forum that would decide if an interpretation was likely; it was the skilled professional with a degree. Justinian had intended to give the world law; unintentionally, he gave it lawyers.

  Law is eternal truth, but to fit the facts of a changing world into the ancient form of statute meant framing what actually happened in terms of what never happens. This was the doctrine of “legal fiction”: making the actors in modern cases play their parts under the names and in the costumes of long-dead characters. For legal purposes, all ducks were “beasts,” all civil cases in England were about an assault in Middlesex, and all property disputes revolved around the rights of John Doe and Richard Roe, fictional men to whom the owners had notionally leased their land. Some civil cases pretended to be criminal: in order to get his case into the royal court, one fourteenth-century claimant had to assert that the vintner from whom he had bought bad wine had watered it down “against the peace of the king, to wit with bows and arrows”—on the face of it, a method more likely to spike the drink.

  One fiction, though, was the saving of the British and American legal systems: the common law. This asserts an unwritten but supreme tradition—what has always happened—glimpsed only in the mirror of past judgments. There can be nothing new in common law—yet there is always something new to find in it as the world changes and judges inquire more deeply. Code-based law hurtled inevitably toward the road-block of contradiction, but the common law offered a network of country lanes leading in a leisurely circuit around any obstacle.

  Under Justinian, you could decide by the Code—but code and reality were now too far apart. In the Middle Ages, you could look to Authority—but there were now so many opinions on each side that they canceled each other out. By Rabelais’ time, matters had gone beyond confusion to absurdity: his character Judge Bridlegoose claimed that the only perfect, impartial method of deciding a case was . . . to throw dice.

  Rabelais was trained as a lawyer; so was Fermat. Cardano and Pascal were sons of lawyers. Several Bernoullis studied law before mathematics tempted them away. The originators of probability had a clear sense of how maddeningly intractable law is to reason. Now they hoped this new method, so successful in disputes about dice and stakes, could extend to Justice.

  Leibniz was another mathematician who began by studying law. When he came up with the basic notation still used in probability—0 for the impossible, 1 for the certain, and all the fractions in between for the varyingly probable—his intention had been to use this to measure legal validity: a more subtle and continuous version of Azo of Bologna’s arithmetic of half-proofs. Leibniz was sure that it was possible not only to determine numerical values for the probability of statements, but to combine these into a calculus of inference, mechanically “estimating grades of probability and the status of proofs, presumptions, conjectures, and indices.” But he reckoned without Bernoulli, who simply asked Leibniz what legal examples he could think of that reveal their intrinsic probabilities after the fact, as mortality tables reveal the average length of life. Leibniz came up with nothing; the plan for a judicial calculator went the same way as his attempt to reconcile Protestants with Catholics.

  Bernoulli’s point was valid: statistics and probability are not the same. Even perfectly accurate and impartial records of crime will tell you nothing about this accused in this particular case. Nor was his the only valid objection to applying a calculus of probabilities to the law. Can you, for example, repeat a murder to reveal the weight of its evidence, as you would repeat the draw from an urn? Would you be willing to try the same suspect 25,550 times so as to be 99.99 percent certain that your verdict reflected the truth? Say two independent witnesses tell the same story but each is only half-credible; shouldn’t you (since they are independent) multiply their probabilities? But if you do, you turn two half-proofs into one quarter-proof. This doesn’t sound like the way to purge law of its ambiguities.

  The deeper difficulty is that legal probability is a function of opinion: our view of the truth of a conclusion, based on our view of the truth of some evidence. Classical probability is about things; legal probability is about thought. What legal reasoning required was a calculus of personal probability: a method to track the trajectory of opinion as it passed through the gravitational fields of new and unexpected facts.

  The solution appeared in 1763 among the papers of a recently deceased Presbyterian minister in the quiet English spa town of Tunbridge Wells. The Reverend Thomas Bayes was a Fellow of the Royal Society and accounted a good amateur mathematician, but he had made little name for himself during his lifetime. He left his papers to a long-neglected friend, Richard Price, who found in them what he considered excellent ammunition against the skeptical views of David Hume. Hume, you will remember, said that the fact the sun has risen every morning gives us no evidence about the likelihood of its rising tomorrow. An Essay Towards Solving a Problem in the Doctrine of Chances, the piece Price found in Bayes’ papers, offered precisely this: a method to measure confidence in the probability of a single event based on the experience of many.

  Bayes’ Essay occupies in probability circles much the same position as Das Kapital in economics or Finnegans Wake in literature: everyone refers to it and no one reads it. What is now called Bayes’ theorem uses modern notation and is most easily demonstrated with a diagram. Say you want to know the probability of an event A given the occurrence of another event B: what is described in modern notation as P(A|B).

  Looking at the diagram, we can see that the probability of both events happening, P(AB), is the shared area in the middle; moreover, P(AB) is the same as P(BA): it is Saturday and sunny, sunny and Saturday. We can also see that the probability of both events happening given that B has happened (the “conditional” probability) is shown by the proportion of AB to all of B. Rewriting this sentence as an equation gives us:

  $$P(A \mid B) = \frac{P(AB)}{P(B)}$$

  We are now ready for a little manipulation. As is always the case in algebra, almost anything is allowed as long as we do it to both sides of an equation. We’ll start with our two ways of describing the center section of the diagram:

  $$P(AB) = P(BA)$$

  We then multiply both sides of that equation by 1; but, sneakily, we’ll use slightly different ways of expressing 1 for each.

  $$P(AB) \times \frac{P(B)}{P(B)} = P(BA) \times \frac{P(A)}{P(A)}$$

  which is the same as:

  $$\frac{P(AB)}{P(B)} \times P(B) = \frac{P(BA)}{P(A)} \times P(A)$$

  But wait! The first term on each side is also our definition of conditional probability; so, by substitution, we produce:

  $$P(A \mid B) \times P(B) = P(B \mid A) \times P(A)$$

  and, dividing both sides by P(B), we get:

  $$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$$

  Where are we going with this? As is so often the case, we seem deepest into arbitrary juggling of terms when we are actually closest to a surprising truth. There is one last feat to accomplish, though. Look back at B in the diagram. We could, humorously, define it as the sum of its parts: as being everything that is both B and A (that is, P(BA)) plus everything that is B and not A (that is, P(BĀ)). With two passes of our trusty definition of conditional probability, we could then expand this to say:

  $$P(B) = P(B \mid A) \times P(A) + P(B \mid \bar{A}) \times P(\bar{A})$$

  In more straightforward terms, this tells us that the overall chance of B happening is a weighted combination of its probability given that A happens (times A’s own probability), and its probability given that not-A happens (times not-A’s own probability). Casanova’s chance of seducing the countess depends on how swayed she is by charm times the likelihood that he will be charming plus how repelled she is by boorishness times the chance that he will be a boor.

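  Let’s slot this expanded version of P(B) into our equation in progress:

  $$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B \mid A) \times P(A) + P(B \mid \bar{A}) \times P(\bar{A})}$$

  Or:

  $$P(A \mid B) = P(A) \times \frac{P(B \mid A)}{P(B \mid A) \times P(A) + P(B \mid \bar{A}) \times P(\bar{A})}$$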

  What do we have here, other than a thicket of parentheses? Amazingly, a description of the effect of experience on opinion.

  Look: if we call A our hypothesis and B the evidence, this equation says that the truth of our hypothesis given the evidence (the term to the left of the equals sign) can be determined by its previous probability (the first term on the right) times a “learning factor” (the remaining, thickety term). If we can find probabilities to define our original state of mind and estimate the probabilities for the evidence appearing given our hypothesis, we now have a method for tracking our reasons to believe in guilt or innocence as each new fact appears before us.
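  The “prior times learning factor” reading can be sketched in a few lines of Python. This is only an illustration: the function names and the two pieces of evidence (with their probabilities) are hypothetical, chosen so that the first fact raises our belief in the hypothesis and the second brings it back to where it started.

```python
def total_probability(prior, p_e_given_h, p_e_given_not_h):
    """P(B) = P(B|A) * P(A) + P(B|not-A) * P(not-A)."""
    return p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)


def update(prior, p_e_given_h, p_e_given_not_h):
    """Return P(A|B): the prior multiplied by the learning factor P(B|A) / P(B)."""
    evidence = total_probability(prior, p_e_given_h, p_e_given_not_h)
    if evidence == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    learning_factor = p_e_given_h / evidence
    return prior * learning_factor


# Hypothetical numbers: we start undecided, then hear two facts in turn.
# Each pair is (chance of the fact if guilty, chance of the fact if innocent).
belief = 0.5
history = [belief]
for p_if_guilty, p_if_innocent in [(0.8, 0.4), (0.3, 0.6)]:
    belief = update(belief, p_if_guilty, p_if_innocent)
    history.append(belief)

print(history)  # belief rises to about 2/3, then falls back to about 1/2
```

  Each posterior becomes the prior for the next fact, which is all that “tracking” an opinion means here.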

  How could this work in practice? The old woman’s body was found in her apartment, brutally hacked; we know the student, Raskolnikov, had been quarreling with her—something to do with money. Then again, she was a pawnbroker: she could have had many enemies among the poor in the neighborhood—all desperate, all in her debt—many no doubt rough men in hard trades. The boy stands in the dock; he seems more pitiable than frightening, his hands none too clean, but soft. Maybe he did it; maybe he didn’t—our opinion is evenly balanced.

  Then the forensic expert testifies about the ax: the latent fingerprints on it are similar to Raskolnikov’s. But are they his? The expert, a scrupulous scientist, will not say; all that his statistics can justify is a statement that such a match would appear by chance only one time in a thousand.

  We slot our probabilities into Bayes’ formula: P(A|B) is our new hypothesis about the suspect’s guilt, given the fingerprint evidence; P(A) is our previous view (.5), P(B|A) is the chance of a fingerprint match, given he was guilty (1); P(B|Ā) is the chance of a fingerprint match given he was not guilty (.001, the expert’s one time in a thousand); and P(Ā) is our previous view of Raskolnikov’s innocence (.5). Put it all together:

  $$P(A \mid B) = \frac{1 \times .5}{(1 \times .5) + (.001 \times .5)} \approx .999$$

  The iron door swings shut and the haggard figure joins the chain of convicts heading for Siberia.
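  The arithmetic behind that verdict can be checked directly. The sketch below takes the expert’s “one time in a thousand” as a chance-match probability of 0.001 and the evenly balanced opinion as a prior of 0.5; the variable names are ours, and the second calculation, with a far more skeptical prior of 0.01, is a hypothetical added only to show how much the starting point matters.

```python
def posterior(prior, p_match_if_guilty, p_match_if_innocent):
    """Bayes' formula for a single hypothesis (guilt) and one piece of evidence."""
    numerator = p_match_if_guilty * prior
    denominator = numerator + p_match_if_innocent * (1.0 - prior)
    return numerator / denominator


# Values taken from the text: even prior, certain match if guilty,
# one-in-a-thousand chance match if innocent.
raskolnikov = posterior(prior=0.5, p_match_if_guilty=1.0, p_match_if_innocent=0.001)

# Hypothetical variation: the same evidence seen by a juror who began
# with only a 1 percent belief in guilt.
skeptical = posterior(prior=0.01, p_match_if_guilty=1.0, p_match_if_innocent=0.001)

print(round(raskolnikov, 4))  # about 0.999
print(round(skeptical, 4))    # about 0.91
```

  The fingerprint moves both jurors a long way, but they do not end in the same place.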

  And the sunrise? Bayes’ theorem tells you that you can go to bed confident, if not certain, that it will rise tomorrow.

  About Bayes’ time, a judge, so the story goes, warned the voluble man in the witness box: “I must ask you to tell no unnecessary lies; the lies in which you have been instructed by counsel are required to support his fraudulent case—further untruths are a needless distraction.”

  Nicholas Bernoulli felt that the world would be a better place if we could compile statistics on people’s veracity. Certainly, it helps to begin with an estimate. Rogue X and fool Y stand up in succession, not knowing each other and with no reason to be in collusion. Rogue and fool each affirm that statement S is true. X is so shifty it would be hard to consider him truthful any more than about a third of the time; give him a credibility rating of .3. Y is so dense that his credibility is little better—say, .4. Moreover, S—if it is a lie—is only one of the five or so unnecessary lies they each could tell, so we should multiply the probability of their coming up with this particular lie by 1/5. How then, all in all, does their joint testimony affect our impression of the truthfulness of S? Our belief swings back and forth—they are two weak reeds; but they support each other; but . . .

  Bayes can help us evaluate dubious testimony. The Honorable Sir Richard Eggleston, one of Australia’s most prominent jurists, plugged these numbers into Bayes’ theorem to show how the stories of two independent but only partly credible witnesses should affect our confidence in the truth of the statement to which they both have sworn. His equation looks daunting:

  —but it shows that given the combined testimony of X and Y, the probability that the statement is true is more than 7 times greater than it was before. A rogue may have his uses and a fool be a present help in trouble.
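  One way to see where a factor of about 7 could come from is sketched below. This is our reading of the numbers in the text, not a reproduction of Eggleston’s equation. It assumes that a witness affirms a true S with probability equal to his credibility, and affirms a false S only if he lies (one minus his credibility) and happens to pick this lie out of five. Under those assumptions the factor applies to the odds in favor of S.

```python
def posterior_from_odds(prior, likelihood_ratio):
    """Convert a prior probability to odds, scale by the ratio, convert back."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


cred_x, cred_y = 0.3, 0.4   # credibility ratings from the text
p_this_lie = 1.0 / 5.0      # chance a liar picks this lie out of five

# Chance that both independently affirm S if S is true (assumed model).
p_both_if_true = cred_x * cred_y
# Chance that both independently affirm S if S is false (assumed model).
p_both_if_false = ((1.0 - cred_x) * p_this_lie) * ((1.0 - cred_y) * p_this_lie)

likelihood_ratio = p_both_if_true / p_both_if_false

print(round(likelihood_ratio, 2))                            # about 7.14
print(round(posterior_from_odds(0.5, likelihood_ratio), 3))  # about 0.877
```

  With an even prior, the hypothetical starting point used here, the two weak reeds lift our belief in S from .5 to roughly .88.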

  Plugging numbers into a machine and turning the handle seems a high-handed approach to delicate matters of judgment and belief, and some of those numbers seem rather arbitrary. Can we assume even chances of guilt and innocence? Can we assign credibility ratings? How can any of this be justified?

  Ever since it appeared, there have been loud voices raised against the legitimacy of this “inverse probability.” Bayes himself spoke in terms of expectation, as if experience were a game on which we had placed a bet. But what bookie offers the starting price? Where do we get the prior probability that evidence modifies? Laplace offered a grand-sounding justification, the Principle of Insufficient Reason: if there is nothing to determine what the prior probability of two events might be, you can assign them equal probability. This, to many, is heresy; it made Fisher plunge and bound; it made von Mises’ smile ever tighter and more frosty.

  The opposing argument, expressed by the so-called subjectivist school, gained its point by sacrificing rigor. What, it asked, are we really measuring here? Degrees of ignorance. Evidence slowly clears ignorance away, but it does so only through repeated experience and repeated re-assessment of your hypotheses—what some Bayesian commentators call “conditioning your priors.” Without evidence, what better describes your state of ignorance than an inability to decide between hypotheses? Of course you give them equal weight, because you know nothing about them. You are testing not the innate qualities of things, nor the repeatability of experiment, but the logic of your statements and the consistency of your expectations.

  Law can be unclear, inconsistent, and partial—but only the facts are uncertain. We have leading cases (and, of course, legislation) to correct law, to patch or prop up the palace of justice where its builders have economized or worms attacked its timbers. To improve our understanding of legal fact, we ought to have probability—but it hasn’t had much success in the courts. After all, few people choose to study law because they want to deal with arithmetic: shining oratory and flashes of forensic deduction are the glory of the courtroom. Formal probability is left to the expert witness; and all too often his expertise is used to baffle.

  In the 1894 Dreyfus treason trial, experts were brought in to prove that the reason the handwriting in the suspect documents looked nothing like Dreyfus’ was precisely his deliberate effort to make them look like a forgery. The experts showed that the documents and Dreyfus’ correspondence had words of similar lengths; they pointed out four graphological “coincidences” and—assigning a probability of .2 to these coincidences—calculated the likelihood of finding these four by chance at .0016. But here they made a mistake in elementary probability: .0016 is the chance of finding four coincidences in four tries; in fact they had found four in thirteen possible locations, so the probability that this would occur by chance was a very generous .7. Both Dreyfus’ counsel and the Government commissioner admitted they had not understood a single term of the mathematical demonstration—but everyone was impressed by its exquisitely pure incomprehensibility. Dreyfus was condemned to Devil’s Island, and despite widespread agitation for his release, remained there for four wretched years.
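  The experts’ slip, confusing “four coincidences in four tries” with “four coincidences somewhere among thirteen tries,” can be illustrated with a simple binomial calculation. The sketch assumes thirteen independent locations, each with a .2 chance of a coincidence, and counts four or more hits. Under these assumptions the result is near .25 rather than the .7 quoted above, which presumably rests on a different count of locations; either way it is of a different order from .0016.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n independent tries."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


p_coincidence = 0.2

# What the experts computed: four coincidences in exactly four tries.
four_in_four = p_coincidence**4

# What they should have asked: four or more coincidences in thirteen tries
# (independence of the thirteen locations is an assumption of this sketch).
four_or_more_in_thirteen = prob_at_least(4, 13, p_coincidence)

print(round(four_in_four, 4))              # 0.0016
print(round(four_or_more_in_thirteen, 4))  # about 0.2527
```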

  The textbook example of probabilistic clumsiness remains People v. Collins, a seemingly simple case of mugging in Los Angeles in 1964. Juanita Brooks was coming home from shopping, her groceries in a wheeled wicker basket with her purse perched on top. As she came along the alley behind her house, she stooped to pick up an empty carton and was suddenly pushed over by someone she had neither seen nor heard approaching. Although stunned by her fall, she could still say that she saw a young woman running away, who weighed about 145 pounds, had hair “between a dark blond and a light blond,” and was wearing “something dark.” When she got up, Mrs. Brooks found that her purse, containing around $40, was missing.

  John Bass lived at the end of the alley; he was watering his lawn, heard the commotion, and saw a woman with a blond ponytail run out of the alley and jump into a yellow car that, he said, was being driven by a black man with a mustache and beard.

  Later that day, Malcolm Ricardo Collins and his wife, Janet Collins, were arrested for the robbery. She worked as a housemaid in San Pedro and had a blond ponytail; he was black, and had a mustache and beard; they drove a yellow car. They were short of money. Their alibi was far from watertight.

  Nevertheless, the prosecution had a difficult time identifying them as the robbers. Neither Mrs. Brooks nor Mr. Bass had had a good look at the woman who took the purse, nor had Mr. Bass been able to pick out Malcolm Collins in a police lineup. There was also evidence that Janet Collins had worn light clothing that day—not “something dark.”