
Infinitesimal chances and the laws of nature∗

Adam Elga

Penultimate draft, June 2003

Revised version to appear in

Australasian Journal of Philosophy

Abstract

The ‘best-system’ analysis of lawhood [Lewis 1994] faces the ‘zero-fit problem’: many systems of laws say that the chance of history going just as it actually goes—the degree to which the theory ‘fits’ the actual course of history—is zero. Neither an appeal to infinitesimal probabilities nor a patch using standard measure theory avoids the difficulty. But there is a way to avoid it: to replace the notion of ‘fit’ with the notion of a world being typical with respect to a theory.

1 The zero-fit problem

Take a god’s eye view. Before you is the spacetime manifold: the distribution of local qualities to point-sized things and the spatiotemporal relations among those things. According to the thesis of Humean supervenience, everything that is true of the actual world is made true somehow by this arrangement [Lewis 1986b, ix].[1] In order to countenance laws of nature, a defender of the thesis is obliged to say how it is that this arrangement determines what laws there are.

∗ The author gratefully acknowledges help from Mark Colyvan, James Walmsley, Carl Hoefer, Nick Bostrom, Øystein Linnebo, Agustín Rayo, Daniel Nolan, Greg Restall, and an audience at the 1997 meeting of the Australasian Philosophical Association. Special thanks go to Alan Hájek, who provided detailed feedback at several stages.

One proposal, the best-system analysis of lawhood [Lewis 1994], draws our attention to various deductive systems. Each system makes only true assertions about what happens, and (perhaps) also makes assertions about the chances of various things happening in various circumstances.

On this proposal, the laws are the regularities that are members of the best candidate system, and the chances are whatever the best candidate system asserts them to be. The best system is the one with the best balance of the following three virtues. Simplicity: A system is simple to the extent that it can be concisely formulated in a certain canonical language.[2] Strength: The strength of a system is its informativeness, both regarding matters of particular fact and regarding what chances arise in various circumstances. Fit: Systems that assign chances to certain courses of history also assign a chance to the actual course of history. The fit of such a system is defined to be that chance—how likely the system counts it that things would go just as they actually do.[3] (By stipulation, systems that don’t mention chances have perfect fit.)

[1] I have omitted several qualifications to the thesis of Humean supervenience. The qualifications do not affect the present discussion. See for example [Lewis 1994, 474–475].

[2] The language is one with a primitive predicate for each perfectly natural property. See [Lewis 1983, 367–368].

Here is how the analysis works in a simple case. Consider a world that consists of a long finite sequence of atomic events, each of which happens in one of two ways. Call the events ‘tosses’, and call the ways ‘heads’ and ‘tails’. At one extreme, the entire sequence may be captured by a simple regularity (say, heads and tails alternate throughout). In that case, a system stating that regularity will be simple, strong, and fit well—it will be the best system. As a result, that regularity will qualify as a law.

At another extreme, the sequence may not contain many simply describable regularities. In that case, any system that is very informative regarding the details of the sequence will need to be extremely complicated.

Certain systems avoid this complication by asserting that the sequence is the result of a repeated chance process. Such systems gain much in simplicity at the cost of some strength (they only address the chances of various sequences, as opposed to making claims about the details of the actual sequence). We can compare these chance-ascribing systems by comparing their fits.

[3] Some systems specify the chances of future evolutions given an initial condition, but fail to specify a chance distribution over initial conditions. The above definition leaves the fit of such systems undefined. A natural fix is to stipulate that the fit of such a system is the conditional chance that the system ascribes to the actual course of history, given that the universe started in the initial state that it did.

For example, suppose that 1/10 of the tosses land heads, and that the heads outcomes are scattered haphazardly among the tosses. One candidate system asserts that the tosses are independent chancy events, and that each has chance 1/10 of landing heads. A competing system also treats the tosses as independent, but asserts that each toss has chance 1/2 of landing heads. The two systems are roughly equally informative and simple, but the ‘chance-1/10’ system ascribes a higher chance to the actual sequence than the ‘chance-1/2’ system does. In other words, the ‘chance-1/10’ system fits better. It is thereby a better competitor in the simplicity/strength/fit competition. Plausibly, this system beats all comers and thereby qualifies as being the correct system of laws.
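For a finite run this comparison can be made concrete. As a numerical sketch (my illustration, not part of the paper), the fit of a bias-p Bernoulli system on a particular sequence with h heads among n independent tosses is p^h(1−p)^(n−h); comparing log-fits shows the chance-1/10 system beating the chance-1/2 system on a sequence in which 1/10 of the tosses land heads:

```python
import math

def log_fit(p, heads, tosses):
    # Log of the chance a bias-p Bernoulli system assigns to one
    # particular sequence containing `heads` heads among `tosses`
    # independent tosses: heads*log(p) + (tosses - heads)*log(1 - p).
    return heads * math.log(p) + (tosses - heads) * math.log(1 - p)

n, h = 1000, 100  # 1/10 of the tosses land heads

# The chance-1/10 system ascribes a higher chance to the actual
# sequence than the chance-1/2 system does, so it fits better.
assert log_fit(0.1, h, n) > log_fit(0.5, h, n)

# Indeed, among a grid of candidate biases, 1/10 fits best.
best = max((0.05, 0.1, 0.25, 0.5, 0.9), key=lambda p: log_fit(p, h, n))
print(best)  # → 0.1
```

The function names and the grid of biases are mine; the point is only that on a finite sequence, fit ranks the candidate systems in the intuitively right order.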

That is a welcome result for the best-system analysis. A friend of Humean supervenience is committed to thinking that the actual sequence of outcomes determines the chances. Given that commitment, it makes sense that a tails-heavy sequence of tosses makes it the case that each toss has a high chance of landing tails.

Notice that in this case, the notion of ‘fit’ deserves its name. The chances ascribed by the ‘chance-1/10’ theory accord well with a history in which 1/10 of the tosses land heads. The chances ascribed by the ‘chance-1/2’ theory do so poorly. And the notion of ‘fit’ measures that difference. More generally, when finite state-spaces are involved, the notion of fit usefully ranks competing candidate systems of chancy laws. But—as I learned from Ned Hall [1996], and as Nick Bostrom has independently noted [1999]—when infinite state-spaces are involved, the notion of fit no longer ranks chancy systems in a useful way.

To see this, change the above heads/tails example by letting the sequence of tosses be (countably) infinite. Much of the best-system analysis goes through unchanged. Again, if the entire sequence is captured by a simple regularity, then a system stating that regularity will be decisively best. Again, if the sequence is unpatterned, then any system that is very informative regarding the details of the sequence will need to be extremely complicated. And again, certain competing systems avoid this complication by asserting that the sequence is the result of a repeated chance process.

But when we compare these chancy systems by comparing their fits, we run into trouble. We might hope that the systems that fit best are the ones whose chances accord well with the actual pattern of outcomes. But our hopes are dashed: far too many systems have fits equal to exactly zero.

For example, suppose that in the infinite sequence of tosses, 1/10 land heads.[4] In order for the notion of fit to do its job, the ‘chance-1/10’ system should fit this sequence better than the ‘chance-1/2’ system does. (Recall that the ‘chance-1/10’ system treats the tosses as independent chancy events with chance 1/10 of heads on each toss.) But that’s not so: both systems ascribe chance zero to the sequence, and so both systems have fits equal to zero.[5] More generally, when continuously infinite state-spaces are involved, a great many candidate systems (including systems of chancy laws that physicists have taken seriously) will ascribe zero chance to any individual history.[6] That leaves the best-system analysis with no way of differentiating between chance-ascribing systems whose chances accord well with the actual history, and those whose chances do so poorly. This is the zero-fit problem.

[4] 1/10 of the tosses land heads in the sense that the limiting relative frequency of heads is 1/10.

[5] Proof: Consider the chance-1/2 system (the argument in the chance-1/10 case is similar). For any natural number n, consider the proposition E_n that specifies the actual outcomes of the first n tosses. The proposition E that specifies all of the outcomes is stronger than each E_n, and so its chance can’t be any greater than the chance of any E_n. But the chance of E_n is 2^-n, which gets arbitrarily close to zero as n tends to infinity. So the chance of E—and hence the fit of the chance-1/2 system—equals zero.
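The shrinking upper bound in this proof can be checked with exact rationals (a small sketch of mine, not part of the paper): the chance-1/2 system gives any one specific n-toss sequence chance (1/2)^n, and those values fall below every positive threshold as n grows, so the chance of the full history E is squeezed down to zero.

```python
from fractions import Fraction

def chance_of_first_n(n):
    # Chance the chance-1/2 system assigns to the proposition E_n that
    # specifies the actual outcomes of the first n tosses: (1/2)^n.
    return Fraction(1, 2) ** n

assert chance_of_first_n(10) == Fraction(1, 1024)

# The chance of the full history E is bounded above by every one of
# these values, and they drop below any positive threshold, so the
# fit of the chance-1/2 system must equal zero.
assert chance_of_first_n(200) < Fraction(1, 10**60)
```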

[6] For example, any system that treats an infinite series of events as a series of (nontrivial) independent chancy coin-tosses will have a fit of zero. So will any theory of radioactive decay according to which the chance of decay within a time interval is gotten by integrating a density over that interval.
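The decay case works the same way. As an illustrative sketch (mine; the rate and times are arbitrary), under an exponential-decay law the chance of decay within [t, t + Δ] is the integral of the density λe^(−λs) over that interval, which shrinks in proportion to Δ, so any exact decay time receives chance zero:

```python
import math

def decay_chance(lam, t, delta):
    # Chance of decay within [t, t + delta] under an exponential law
    # with rate lam: the integral of lam*exp(-lam*s) from t to t+delta,
    # which evaluates to exp(-lam*t) - exp(-lam*(t + delta)).
    return math.exp(-lam * t) - math.exp(-lam * (t + delta))

lam, t = 1.0, 2.0
# As the interval shrinks, so does the chance; a single point (an exact
# decay time) gets chance zero.
for delta in (1.0, 1e-3, 1e-6, 1e-9):
    print(delta, decay_chance(lam, t, delta))
```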

2 Infinitesimals to the rescue?

It can seem odd when a system counts an outcome as possible, and yet ascribes it chance zero.[7] There is a way of using nonstandard models of analysis to avoid this oddness. Nonstandard extensions of the real line contain infinitesimals—positive numbers smaller than any positive real number.

And the nonstandard universe contains nonstandard probability functions, which take their values from a nonstandard extension of the reals.[8] Nonstandard probability functions may ascribe infinitesimal probability to certain outcomes. As a result, if we allow candidate systems to be associated with nonstandard probability functions, we may impose the regularity condition: that each candidate system ascribe some nonzero chance to every outcome that it counts as possible. In doing so, we may hope to rescue the best-system analysis from the zero-fit problem. For in that case, no system under consideration will have a fit of zero.[9]

[7] With standard probability functions, this is often unavoidable, since such functions assign positive probability to at most countably many incompatible propositions.

[8] For an accessible introduction to nonstandard analysis, see [Skyrms 1980, Appendix 4]. For a brief technical introduction, see [Bernstein and Wattenberg 1969, Section 1]. For a thorough technical introduction, see [Hurd and Loeb 1985].

[9] Perhaps Lewis had this strategy in mind. He insists that ‘Zero chance is no chance, and nothing with zero chance ever happens. The [fair spinner’s] chance of stopping exactly where it did was not zero; it was infinitesimal, and infinitesimal chance is still some chance.’ [Lewis 1986a, 176] and writes ‘It might happen—there is some chance of it, infinitesimal but not zero—that each nucleus lasted for precisely its expected lifetime...’ [Lewis 1986c, 125]. He also explicitly invokes nonstandard probability functions in another circumstance in which zero probabilities threaten to cause trouble [Lewis 1986c, 88–90].

Let us apply this strategy to the special case mentioned above: a world consisting of an infinite sequence of tosses. If the strategy fails in this case—a simple instance in which the zero-fit problem arises—it is sure to fail in general.

And it does fail in this case. To see why, consider the Bernoulli systems—candidate systems that treat the tosses as independent chancy events. For any real number x strictly between zero and one, let B_x be the probability function that treats the tosses as independent events with probability x of heads. These functions are all ruled out as chance functions by the regularity requirement. (They assign probability zero to each individual infinite sequence of toss outcomes.) What we need are regular nonstandard probability functions to play the role that the functions B_x play in the standard case.

We can be assured that there are such functions. It turns out that for every standard probability function there exists a nonstandard probability function that (i) approximates the standard probability function (in the sense that the two functions never differ by more than an infinitesimal) and (ii) satisfies the regularity condition (i.e., assigns positive probability to every nonempty proposition). Appendix A uses a method due to Vann McGee to prove the existence of such functions. Once we have imposed the regularity condition, such functions will represent the chances ascribed by candidate systems.[10] So for each B_x there exists a regular nonstandard probability function B*_x that approximates B_x. So far, so good. Now suppose that in fact 1/10 of the tosses land heads. Which functions fit this sequence of outcomes?

If things turn out well, then the functions B*_x with good fits will be the ones for which x is close to 1/10. Unfortunately, things do not turn out well. The trouble is that for each B_x, there are many regular nonstandard probability functions that approximate it. Furthermore, the functions that treat the coin as a chance device with bias 1/10 do not fit any better than the ones that treat it as a chance device with any other bias. For example, it can happen that a nonstandard probability function B*_1/2 that approximates B_1/2 fits much better than a function B*_1/10 that approximates B_1/10—even though the actual limiting relative frequency of heads is 1/10.

[10] To my knowledge, there are only two methods for cooking up regular nonstandard probability functions. One method—mentioned in the text above and explained in detail in Appendix A—is to start with an arbitrary (standard) probability function and build a nonstandard regular probability function that approximates it. A second method is to focus on a probability space with some natural symmetries, and seek regular probability functions on that space that respect those symmetries. (For example, [Bernstein and Wattenberg 1969] proves the existence of a nonstandard probability function on the unit circle that (i) is rotationally invariant (up to an infinitesimal) and (ii) assigns the same (infinitesimal) probability to each point. [Parikh and Parnes 1972] extends Bernstein and Wattenberg’s construction.) This second method treats only certain very special cases, and so fails to rescue the best-system analysis from the zero-fit problem.