Chances Are Page 6
Lemberg, alias Lwów, alias Lviv: a city that lies at the intersection of three sets in three dimensions: Polish, Austrian, Ukrainian; Catholic, Orthodox, Jewish; applied, abstract, artistic. It remains a symbol of intellectual promise for the debatable lands between the Vistula and the Dniepr; a Baroque lighthouse in a politicogeographic tempest. Its prominent sons and daughters would by themselves be enough to populate a culture: the writers Martin Buber and Stanislaw Lem; the pianists Moriz Rosenthal and Emanuel Ax; the Ulam brothers, Stanislaw (mathematician) and Adam (historian); Doppler of the eponymous effect; Redl the spy—not to mention Weegee, Paul Muni, Sacher-Masoch, and the Muslim theologian Muhammad Asad (one of the few imams to be the grandson of a rabbi).
Lemberg’s Richard von Mises was a pioneer in aerodynamics, designing and piloting in 1915 Austria-Hungary’s monster bomber, the 600-horsepower Aviatik G. III. The plane was not a success, for many of the subtle local reasons that govern heavier-than-air flight. Perhaps in reaction, von Mises became increasingly interested in turbulence. Turbulence (as we will see later) lacks the pleasant predictability of the solar system; the swirls of fluid vortices may briefly resemble stately galaxies, but their true dynamics remain infuriatingly difficult to grasp. Von Mises was not an easygoing man—he demanded of applied mathematics all the rigor of its pure cousin—and the more he worked in the unstable, fluttering world of flow, the less he liked the fixed but unexamined assumptions behind Laplace’s idea of probability, which, in his view, amounted to defining it as “a number between 0 and 1, about which nothing else is known.”
The problem, von Mises thought, was that in assuming equally probable cases for our dice, coins, and urns, we had created out of nothing a parallel universe, in which things happened a certain way because they were supposed to. Instead of being messengers of the gods, the dice had become the gods. At best, the rules of probability were a tautology: the numbers 1 through 6 come up equally often in theory because we define them that way. At worst, the concept of equally probable cases prevented us from saying anything about what was before our eyes. What if the die has a few molecules missing from one corner, for instance? We have evidence, but no theorem based on equally probable cases applies to it; our probability calculus is irrelevant; we have to fall silent.
Von Mises’ view was that the true reason for our believing that six should come up 1/6 of the time is no different from our reason for believing that the Earth takes 365.25 days to orbit the sun. That reason is our having observed it. “The probability of a six is a physical property of a given die”—a quantity derived from repeated experience, not some innate essence of creation or nature. Heads or tails, red or black, pass or no pass—these are no more phenomena in themselves than are grams, ohms, or pascals. Probability is a measure of certain aspects of consistent groups of events (“collectives,” in von Mises’ terminology) revealed when these events are repeatable indefinitely.
The nature of the “collective” has to be very particular: one must have a practically unlimited sequence of uniform but random observations. We can conclude that we have observed a given probability for a result if the relative frequency of that result approaches a limit, not just for the basic collective, but for randomly selected and mixed subgroups of the collective. The probabilities of combinations of results (like throwing two dice or taking two balls at a time out of an urn) can also be defined by keeping careful track of the order and subgroups of observations. That means that, while rigorously banishing any preconceptions of probability from our mind, we can gradually rebuild many aspects of its calculus—as long as we insist on describing frequencies, and restrict our observations to true collectives.
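Von Mises’ test for a collective can be tried out numerically. The sketch below is only an illustration under assumptions of our own: the die is hypothetical and slightly lopsided (the weights are invented), and the two subgroup rules, “every other roll” and “rolls right after a six,” are simply examples of selections that decide whether to keep a roll without looking at that roll. If the relative frequency of six settles near the same value in the whole sequence and in each subgroup, the sequence behaves like a collective. A finite run can only suggest a limit, never prove one.

```python
import random


def relative_frequency(outcomes, target):
    """Fraction of outcomes equal to target (0.0 for an empty sequence)."""
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o == target) / len(outcomes)


def simulate_die(n, weights, seed=0):
    """Roll a six-sided die n times; weights model a possibly lopsided die."""
    rng = random.Random(seed)
    return rng.choices(range(1, 7), weights=weights, k=n)


def place_selections(rolls):
    """Subgroups chosen by rules that look only at a roll's position or at
    earlier rolls, never at the roll being selected."""
    return {
        "all rolls": rolls,
        "every other roll": rolls[::2],
        "after a six": [rolls[i] for i in range(1, len(rolls)) if rolls[i - 1] == 6],
    }


if __name__ == "__main__":
    # Hypothetical die: six is a little more likely than the other faces.
    weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.2]
    for n in (100, 10_000, 100_000):
        rolls = simulate_die(n, weights)
        freqs = {name: round(relative_frequency(sub, 6), 4)
                 for name, sub in place_selections(rolls).items()}
        print(n, freqs)
```

With short runs the three frequencies wander; as the runs lengthen they crowd together near 1.2/6.2, which is the “physical property” of this particular imaginary die.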
There was no doubt where science had to go to remain science: all facts were to be considered as mere probabilities, and all probabilities, frequencies of observations. As von Mises saw it, anything else was simply a kind of false consciousness.
Given this view of pure science, it is certainly hard to see how probability could be legitimately applied to juries, deliberative assemblies, or voting systems. For von Mises, probability had only three areas of legitimacy: games of chance, certain mass social phenomena like genetics and insurance, and thermodynamics (where, as we will see in Chapter 11, it would take the leading role).
“Dost thou think, because thou art virtuous, there shall be no more cakes and ale?” Like all puritans, within or without the sciences, von Mises was asking his audience to give up much of what they felt made life worthwhile. People go into science because they want to discover and explain all the things around us that seem so richly freighted with meaning. But how many of the interesting phenomena of life truly are “collectives”—what fascinating event can be said to repeat exactly and indefinitely? While Laplace’s system glided over the crack between stating equally possible outcomes and assuming them in experiment, von Mises’ straddled a crevasse when it assumed that the relative frequencies of observed events could indeed approach a limiting value. All might be revealed in the long run—but, as Keynes pointed out, in the long run we are all dead.
Trains of thought can sometimes seem exactly that: bright little worlds rattling through unknown and perhaps uninhabited darkness. It is the cliché tragedy of intellectual life to discover, far too late, that your own train got switched off the main line up a remote spur whose rusty rails and lumpy roadbed reveal all too clearly that it leads only to ghost towns. Sometimes, though, there is the excitement of approaching the metropolis on converging lines: other golden windows glide alongside with other travelers folding their newspapers or reaching for their coats, other children waving in welcome.
Lebesgue’s measure theory and von Mises’ idea of frequency were already within view of one another. Between them, two further trains were advancing: one was the growing conviction among physicists that certain processes in thermodynamics and quantum mechanics had only a probabilistic meaning—that there was no mechanical, deterministic model by which they could be described or even imagined. The other was the great contemporary movement toward axiomatization, which promised that all of mathematics—all the rigor and complexity that, magically, seem to find so many parallels with the richness and beauty of life—could be founded on a few axioms linked by the rules of deductive logic. There was a palpable sense of approaching the terminus, where the many travelers could exchange stories—all in the same language.
This shared language was the notion of the set, introduced by Georg Cantor as a means of keeping the infinite in mind without having to think of infinite things or infinite processes. The definition of a set is intentionally as loose as possible: it is defined by its members. But membership can be generated by the most varied of rules: “numbers divisible by 5” determines a set; but so does “Dog; 17; Red.” It is possible to have an empty set: we can talk about the contents of the box even though there’s nothing in it. We can subdivide a set into subsets, which will also be sets. We can combine sets into a union, which is also a set. We can define where two sets overlap; the overlap or intersection is also a set. We can have infinite sets (such as all the counting numbers). And we can imagine the collection of all subsets of a set—which is also a set.
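Python’s built-in sets happen to follow the same rules, so the operations just listed can be tried directly. This is a small sketch, with one assumption flagged: a computer cannot hold an infinite set, so “numbers divisible by 5” is cut off at 50 here, and the helper `power_set` is our own, not a library function.

```python
from itertools import chain, combinations


def power_set(s):
    """All subsets of a finite set s, each returned as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(c) for c in subsets}


# A set defined by a rule (numbers divisible by 5, cut off at 50) ...
by_rule = {n for n in range(1, 51) if n % 5 == 0}
# ... and a set defined by simply listing its members.
by_list = {"Dog", 17, "Red"}

empty = set()                     # the empty set: a box with nothing in it
union = by_rule | {17, 18}        # a union of sets is also a set
overlap = by_rule & {10, 11, 12}  # an intersection is also a set
subset = {"Dog", 17}              # a subset of by_list

print(subset <= by_list)          # True
print(overlap)                    # {10}
print(len(power_set(by_list)))    # 8: the set of all subsets, empty set included
```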
What, you might feel yourself asking, are we actually talking about here? Nothing in particular—and that’s the point. This is pure form; its aim is to support a logical system that governs every instance of the way we consider some things as distinct from other things. A set is simply a pair of mental brackets, isolating “this” from “not this”; we can put into those brackets whatever interests us. Just as Cantor’s infinity can be put in a set and considered here and now, rather than endlessly and forever, von Mises’ indefinite sequences of observations can constitute a set: the set of events of observing something. We can use the axioms by which we manipulate sets to manipulate collections of events. Most important, Lebesgue’s concept of measure gives us a method for assigning a unique and complete value to a set, its subsets, and its elements—and these values behave the way we want probability values to behave.
Lines of thought, all coming together, all converging—so who better to effect the final union than a man who was born on a train? Andrei Nikolaevich Kolmogorov was a son any parent would be proud of, but neither parent ever saw him. His mother died in bearing him, on April 25, 1903, journeying from the Crimea to her family estate, having left behind his father, an agronomist of clerical origin. The newborn Kolmogorov was swept up from the whistle-stop town of Tambov by his maiden aunt. She created a special school for her talented nephew and his little friends; it had its own magazine in which the young Andrei published his first mathematical discovery at the age of five: that the sum of the first n odd numbers is equal to n², as we too discovered in Chapter 1.
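The five-year-old’s discovery is easy to check by machine. The sketch below only confirms the identity for the first hundred values of n, which is evidence rather than proof; the proof is the picture behind it, since adding the next odd number 2n + 1 to an n-by-n square of dots completes an (n + 1)-by-(n + 1) square: n² + 2n + 1 = (n + 1)².

```python
def sum_of_first_odds(n):
    """1 + 3 + 5 + ... + (2n - 1), added up term by term."""
    return sum(2 * k - 1 for k in range(1, n + 1))


# Check 1 + 3 + ... + (2n - 1) == n**2 for n = 0 .. 100.
for n in range(0, 101):
    assert sum_of_first_odds(n) == n * n

print(sum_of_first_odds(5))  # 1 + 3 + 5 + 7 + 9 = 25
```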
What immediately struck people who met Kolmogorov was his mental liveliness. He remained interested in everything, from metallurgy to Pushkin, from the papacy to nude skiing. His dacha, an old manor house outside Moscow, re-created the estate school of his youth. Presided over by his old aunt and his old nanny, it had a large library and was always full of guests: students, colleagues, visiting scholars.
It has been said that it would be simpler to list the areas of mathematics to which Kolmogorov did not make a significant contribution than to describe the vast range of topics he did explore—in depth—in his more than seventy years of productive work. His genius was to connect: he took mathematical ideas, clarified their expression, and then used them to transform new fields. He worked on mathematical logic, linking the classical and intuitionist traditions; he worked on function-space theory, extending it to the mechanics of turbulence; he invented the field of algorithmic complexity, and, with characteristic verve, he hoicked up the tottery edifice of probability and slipped new foundations underneath.
The basic premise of his system was simple: the probability of an event is the same as the measure of a set. We can use a diagram to make this idea even clearer. Take a rectangle:
Let’s say that everything that can happen in the system we’re interested in—every possible observation—is represented by a point in this rectangle. The probability measure of the rectangle (which, for flat things like rectangles, is its area) is therefore 1, because we can be certain that any observation is represented within it. If we are interested in, say, the flip of a coin, our diagram will look like this:
Two possible states, with equal area, having no points in common. The chance of throwing an even number with one die? Three mutually exclusive events (2, 4, and 6), each taking up one-sixth of the area, together totaling half the area of our rectangle:
We can see that this model nicely represents a key aspect of probability: the probability that any one of two or more mutually exclusive events happens is found by adding the probabilities of each. What about events that are not mutually exclusive, events that can happen together (such as, for instance, the event A, that this explanation is clear, and B, that it is true)? They look like this:
The probability that this explanation is both clear and true is represented by the area shared between A and B. The probability that it is either clear or true is represented by their combined area—although this is not the same as adding their individual areas, since then you would be counting their shared zone twice. The worrying probability that this explanation is neither clear nor true is represented by the bleak, empty remainder of the rectangle.
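The same bookkeeping can be done with a finite sample space instead of a drawn rectangle. In this sketch the “area” of an event is just its share of the six faces of a fair die (uniformity is our assumption), and the events `A` and `B` are hypothetical stand-ins for “clear” and “true,” chosen only so that they overlap.

```python
from fractions import Fraction

# Sample space for one fair die: each face gets measure 1/6 (assumed uniform).
OMEGA = frozenset(range(1, 7))


def P(event):
    """Probability as measure: the share of the sample space the event covers."""
    return Fraction(len(set(event) & OMEGA), len(OMEGA))


even = {2, 4, 6}
# Mutually exclusive pieces: their measures simply add.
assert P(even) == P({2}) + P({4}) + P({6}) == Fraction(1, 2)

# Overlapping events (hypothetical stand-ins for "clear" and "true").
A = {1, 2, 3, 4}
B = {3, 4, 5}
both = A & B               # shared area: clear and true
either = A | B             # combined area: clear or true
neither = OMEGA - either   # the bleak, empty remainder

# Adding P(A) and P(B) counts the shared zone twice, so subtract it once.
assert P(either) == P(A) + P(B) - P(both)
print(P(both), P(either), P(neither))  # 1/3 5/6 1/6
```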
What about conditional probability—such as the probability this explanation is true if it is clear? We simply disregard everything outside the area of A (since we presume the explanation is clear) and compare its area with the area that it shares with B—which, as we know, represents clear and true.
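Conditioning, in the measure picture, means shrinking the world to A and measuring B’s share of what is left. A minimal sketch on the same assumed fair-die space, with the same hypothetical events standing in for “clear” and “true”:

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # one fair die, uniform measure (an assumption)


def P(event):
    """Measure of an event as a fraction of the whole sample space."""
    return Fraction(len(set(event) & OMEGA), len(OMEGA))


def P_given(b, a):
    """P(B | A): keep only the area of A, then ask what share of it lies in B."""
    if P(a) == 0:
        raise ValueError("cannot condition on an event of measure zero")
    return P(set(a) & set(b)) / P(a)


A = {1, 2, 3, 4}  # hypothetical stand-in for "the explanation is clear"
B = {3, 4, 5}     # hypothetical stand-in for "the explanation is true"
print(P_given(B, A))  # 1/2: half of A's area is shared with B
```

The guard against conditioning on an event of measure zero is a design choice of the sketch: the ratio is simply undefined there.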
You may find this all a bit simplistic, especially as someone who has come to it through the complex reasonings of Cardano, Pascal, de Moivre, Laplace, and von Mises. The point, though, is that this basic model can be scaled up to match the complexity of any situation, just as Euclid’s axioms can generate all the forms needed to build Chartres cathedral. We need not think only of two circles; we can imagine hundreds, thousands, indeed an infinity of measurable subsets of our rectangular sample space, overlapping and interpenetrating like swirls of oil on water. Nor need our space be a two-dimensional rectangle; the same axioms would apply if our chosen measure were the volume of three-dimensional objects or the unvisualizable but mathematically conventional reality of n-dimensional space. And since this idea of probability borrows its structure from set theory, we can do logical, Boolean, calculations with it—well, we can’t, but computers can, since they thrive on exactly those tweezer-and-mountain techniques of relentlessly iterated steps that fill human souls with despair.
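Here is a small example of a computer doing those relentless Boolean steps. The sketch scatters random points over a unit square (area 1) and classifies each one with true/false tests; the share of points landing in a region estimates that region’s measure. Two things are our assumptions: the circles’ positions and radii are invented for illustration, and random sampling (a Monte Carlo method) only approximates the exact measures Kolmogorov’s axioms talk about.

```python
import math
import random


def in_circle(x, y, cx, cy, r):
    """True if the point (x, y) lies inside the circle of radius r at (cx, cy)."""
    return (x - cx) ** 2 + (y - cy) ** 2 <= r * r


def estimate(n=200_000, seed=0):
    """Estimate areas in the unit square by counting random points in each region."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0, "A and B": 0, "A or B": 0, "neither": 0}
    for _ in range(n):
        x, y = rng.random(), rng.random()
        a = in_circle(x, y, 0.4, 0.5, 0.25)  # hypothetical circle A
        b = in_circle(x, y, 0.6, 0.5, 0.25)  # hypothetical circle B
        counts["A"] += a                     # True counts as 1, False as 0
        counts["B"] += b
        counts["A and B"] += a and b
        counts["A or B"] += a or b
        counts["neither"] += not (a or b)
    return {region: c / n for region, c in counts.items()}


if __name__ == "__main__":
    for region, area in estimate().items():
        print(f"{region:8s} {area:.4f}")
    print("exact area of each circle:", round(math.pi * 0.25 ** 2, 4))
```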
There need be no special cases, cobbled-together rules or jury-rigged curves to cover this or that unusual situation. The point of Kolmogorov’s work is that mathematical probability is not separate from the remainder of mathematics—it is simply an interesting aspect of measure theory with some quaint terminology handed down from its origins in real life.
Thus embedded, probability—understood as the mathematics of randomness—found again the rigor of deductive logic. True, it appeared at first to be a somewhat chilly rigor—one that its practitioners were keen to distinguish from the questionable world of applications. William Feller, who wrote the definitive mid-twentieth-century textbook on probability, began it by pointing out: “We shall no more attempt to explain ‘the meaning’ of probability than the modern physicist dwells on the ‘real meaning’ of mass and energy.” Joseph Doob, one of the most prominent, if stern, proponents of probability theory, said that it was as useless to debate whether an actual sequence of coin tossing was governed by the laws of probability as “to debate the table manners of children with six arms.” As always, the counselors of perfection demand a retreat from the world.
That, though, was to reckon without humanity. Our desire to come to some conclusion—even if it’s not certainty—means we are bound to take this specialist tool of probability and risk its edge on unknown materials. Kolmogorov’s legacy is applied every day in science, medicine, systems engineering, decision theory, and computer simulations of the behavior of financial markets. Its power comes from its purity—its conceptual simplicity. We are no longer talking specifically about the behavior of physical objects, observations, frequencies, or opinions—just measures. Purged of unnecessary ritual and worldly considerations, uniting its various sects, mathematical probability appears as the one true faith.
But having a true faith is not always the end of difficulties. Consider a simple problem posed by the French mathematician Joseph Bertrand: you have a circle with a triangle drawn inside it—a triangle whose sides are equal in length.
Now you draw a line at random so that it touches the circle in two places: this is called a chord. What is the probability that this chord is longer than a side of the triangle?
One good path to an answer would be this: line up a corner of our triangle with one end of the chord; we can now see that any chord that falls within the triangle will be longer than a side.
Since the starting angle of lines that cross the triangle is one-third the angle of all possible chords you could draw from that point, this suggests that the probability that a random chord will be longer than a side of the triangle is 1/3.
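This first reading of “at random” can be simulated. The sketch assumes that “random” means the chord’s direction, measured as the angle φ between the chord and the tangent at the fixed corner, is uniform between 0 and π. Such a chord has length 2r·sin φ, and it beats the triangle’s side r√3 exactly when φ lies between π/3 and 2π/3, one-third of the range.

```python
import math
import random


def chord_longer_by_angle(trials=200_000, r=1.0, seed=0):
    """Fraction of random chords from a fixed point that beat the triangle's side.

    Assumption: the angle phi between chord and tangent is uniform on (0, pi);
    a chord making angle phi with the tangent has length 2 * r * sin(phi).
    """
    rng = random.Random(seed)
    side = r * math.sqrt(3)  # side of the equilateral triangle inscribed in radius r
    hits = 0
    for _ in range(trials):
        phi = rng.uniform(0, math.pi)
        if 2 * r * math.sin(phi) > side:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(round(chord_longer_by_angle(), 3))  # close to 1/3
```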
But here is a different approach. Let’s say you take your chord and roll it across the circle like a pencil across the floor.
We can see that any chord falling within the rectangle built on a side of the triangle will be longer than that side. The height of that rectangle is exactly half the diameter of the circle; so the probability that a random chord will be longer than a side of the triangle is 1/2. Paradox.
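The rolling-pencil reading gives a different number because it puts the randomness somewhere else. This sketch assumes that “rolling” means the chord’s distance d from the center is uniform between 0 and r. A chord at distance d has length 2√(r² − d²), which exceeds the side r√3 exactly when d < r/2: half of the allowed range.

```python
import math
import random


def chord_longer_by_rolling(trials=200_000, r=1.0, seed=0):
    """Fraction of 'rolled' chords that beat the triangle's side.

    Assumption: the chord's distance d from the center is uniform on (0, r);
    a chord at distance d has length 2 * sqrt(r**2 - d**2).
    """
    rng = random.Random(seed)
    side = r * math.sqrt(3)  # side of the equilateral triangle inscribed in radius r
    hits = 0
    for _ in range(trials):
        d = rng.uniform(0, r)
        if 2 * math.sqrt(r * r - d * d) > side:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(round(chord_longer_by_rolling(), 3))  # close to 1/2
```

Both simulations are correct for their own assumption; the disagreement lies entirely in what “drawn at random” was taken to mean.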
Probability remains, as Charles Peirce pointed out, the only branch of mathematics “in which good writers frequently get results entirely erroneous.” Defining probability in terms of measure is clear, consistent, and intuitively satisfying; but, as Bertrand’s paradox reveals, we must be very careful in how we set up our problems—in this case, how we define that unassuming phrase “drawn at random.” Just as in the Delphic temple, the validity of the answer will depend on the nature of the question. You must be careful what you pray for.
4
Gambling
In play there are two pleasures for your choosing;
The one is winning and the other—losing.
—Byron, Don Juan, Canto XIV
A summer’s evening in Monte Carlo. The warm breeze is scented with salt, aftershave, cigar smoke, and marine diesel. As you climb the carpeted steps, your tread silent but determined, the marble façade glows pinkly above you while golden effigies smile down. You feel lucky: chosen, fortunate, blessed—though you might wonder idly as you pass through the glowing Salon de l’Europe why it is that these people have so many more and bigger chandeliers than you do. The answer is simple: some things happen more often than other things . . . in the long run.
“Mesdames et messieurs, faites vos jeux.” Roulette is a teaching machine for basic probability. Look at the wheel and the cloth, and you will see that here is a device for counting events and gathering them into separate groups. There are 36 numbered cups on the wheel, alternating red and black. Given the compounded, confusing effects of the speed of the wheel, the baffles and deflectors, the bounce of the ball, and the casual flick of the croupier’s wrist, you have no intrinsic reason to believe the ball will land in one of those cups sooner than in any other: the 36 chances appear equal. Given a long enough time, then, you can expect each cup to receive the ball in 1/36 of the total number of spins.
You may bet on a single number, two numbers, three, four, five (in the United States), six, twelve, or eighteen, grouped in various ways: odd or even, red or black, low or high, quadrants of the wheel or orphelins—leftovers after some such quadrature. The house, obligingly, adjusts the odds it offers you to the proportion of the 36 numbers represented by your bet, from 36 times (including your original stake) for a single number down to doubling your money on winning bets that split the numbers in half. Indeed, one reason the roulette wheel has 36 numbers is that 36 divides so neatly into different proportions, accommodating everyone’s secret sense of fate’s symmetry and neatness.
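The arithmetic of “adjusting the odds to the proportion” can be spelled out. This sketch takes the 36-cup wheel exactly as described here, with every cup equally likely and a payout, stake included, of 36/k times the stake for a bet covering k numbers; the function name `expected_return` is our own. Under those assumptions the chance of winning and the payout cancel, so on average every bet hands back exactly what was staked. (The five-number bet is left out because 36 does not divide evenly by 5.)

```python
from fractions import Fraction

CUPS = 36  # the 36-cup wheel as described in the text


def expected_return(numbers_covered, stake=1):
    """Average amount handed back per bet, stake included, when every cup is
    equally likely and a win pays CUPS / numbers_covered times the stake."""
    p_win = Fraction(numbers_covered, CUPS)
    payout = Fraction(CUPS, numbers_covered) * stake
    return p_win * payout


for k in (1, 2, 3, 4, 6, 12, 18):
    # numbers covered, payout multiple, expected return on a stake of 1
    print(k, CUPS // k, expected_return(k))
```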