Theories of Everything: my thoughts

(Adapted from my message of 28th September 1997 to Max Tegmark giving my thoughts on his ultimate ensemble theory paper.)

Hello Max... I came across your paper "Is 'the theory of everything' merely the ultimate ensemble theory?" (gr-qc/9704009) on the web. I found it by a rather circuitous route. For several consecutive years now, here at Imperial College Department of Computing, I've been supervising an undergraduate individual project on quantum computing (with Peter Knight of our quantum optics section agreeing to act as an occasional informal "extra marker" to patch over the fact that the whole thing's not really my field!). When looking through the recent papers in the quant-ph section of the Los Alamos archive, just to try and keep up with things as best I can, I came across a link to your website, and thence to your ultimate ensemble theory paper. (Since then, of course, your paper's been covered by New Scientist in their "anything goes" article of 6th June 1998, and become much more widely known.)

Well, it was a joy to read, and I'm glad someone's "thinking big" in this way! Have you come across John Leslie's semi-popular book Universes? It deals with many of the same themes - although the later parts drift into a sort of religious apology, which I don't find particularly useful. Anyway, I have some thoughts on various aspects of your paper, which I hope you might find interesting or thought-provoking. So here goes!

About the Bayesian inference approach:

I think the biggest issue here is choice of an "ultimate prior". If I've understood things right you're proposing the "uniform prior" in all parameter spaces: but is this really well-defined? (For example, uniform, log-uniform, or uniform in the "arctan-log" mapping you use for convenience in your graphs, will all give subtly different conclusions [posteriors] in any given application of the Bayesian approach.)
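Here is a minimal numerical sketch of the point in brackets. The setup is chosen purely for illustration and is not from the paper: a single hypothetical Gaussian measurement x = 10 with error σ = 5 of a parameter θ confined to [1, 100]. It is combined with three different "uniform" priors: uniform in θ, uniform in log θ, and uniform in arctan(log θ). The last of these has density in θ proportional to 1/(θ(1 + (log θ)²)). The likelihood is the same in all three runs; only the notion of uniformity changes.

```python
import math

def posterior_mean(prior, x=10.0, sigma=5.0, lo=1.0, hi=100.0, steps=9900):
    """Posterior mean of theta for a Gaussian likelihood N(x | theta, sigma^2),
    evaluated on a uniform grid over [lo, hi] (the grid spacing cancels)."""
    h = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        weight = prior(theta) * math.exp(-0.5 * ((x - theta) / sigma) ** 2)
        num += theta * weight
        den += weight
    return num / den

# Three candidate "uniform" priors over the same parameter,
# written as unnormalized densities in theta.
PRIORS = {
    "uniform in theta": lambda t: 1.0,
    "uniform in log theta": lambda t: 1.0 / t,
    "uniform in arctan(log theta)": lambda t: 1.0 / (t * (1.0 + math.log(t) ** 2)),
}

if __name__ == "__main__":
    for name, prior in PRIORS.items():
        print(f"{name:30s} posterior mean = {posterior_mean(prior):.3f}")
```

On this toy setup the three posterior means come out in strictly decreasing order, with the uniform prior first. Each successive prior down-weights large θ more heavily.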

Stepping one level further out so to speak, there's also the inevitable problem of how to be "uniform" in the space of mathematical structures you give a sample of in your Figure 1. The diagram is by necessity very heterogeneous - after all any sub-parts of it that form ordered chains (like "double fields", "triple fields", ...) either get cut off due to inconsistency, or else, if that doesn't happen, we'd probably choose to have drawn the diagram differently (e.g. having one box called "generalized fields" with an implicit internal parameter n giving the order "double", "triple" etc). Casting a bird's-eye view over the whole diagram, one shudders at what it could even mean to paint the diagram with a "uniform" ultimate prior!

I do in fact have a provisional suggestion about this. It's at least vaguely related to notions of Kolmogorov-Chaitin algorithmic complexity. However I'll leave it till further on, when it'll make more sense (I hope!).

Meanwhile, here's a dramatic instance of the problems an over-simple concept of "uniform" might cause. In your footnote 13 (page 24), you're discussing your earlier pessimism about the whole ultimate ensemble theory project, when you thought all integer space dimensionalities (or 3 upwards anyway) were likely to be habitable. But: imagine this had been true, i.e. all dimensionalities >=3 are habitable. (It might even be true! One can think of reasons why not all forces would necessarily be "inverse-(D-1)th law" and thereby lead to the stability problems you discuss. But more of that later.) What would the consequences be for your whole "category 1a TOE" philosophical project? I believe it's wrong to say they would have been the pessimistic consequences you feared. That is, it's wrong to say that when we opened our eyes and counted our space dimensions and got 3, we'd thereby falsify the whole category 1a TOE project. For consider this: if the category 1a TOE is true, and contains SAS-inhabited worlds of all dimensions (3,4,5,...), then every SAS opens its eyes, counts its space dimensions, and (falsely and pessimistically) concludes the category 1a TOE is falsified!!! For example, creatures who count their space dimensions and get 4384792698273462 would, if they took a "uniform" prior over the integers seriously, have to go on to say: "Gosh! 4384792698273462 is such a tiny number, out of all the integers! Gee, I guess that falsifies the category 1a TOE." You see the problem? The use of a "uniform" prior over the integers would condemn every one of those SASs to reaching a false conclusion. (Slogan: All integers are tiny! Or: All integers are untypically small!)
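The arithmetic behind the slogan can be made explicit with a uniform prior truncated at some cut-off N, so the prior is on {1, ..., N}. The cut-off is an assumption for illustration, since no uniform prior on all the integers exists:

```latex
P_N(X \le n) = \frac{n}{N},
\qquad
\lim_{N \to \infty} P_N(X \le n) = 0 \quad \text{for every fixed } n \in \mathbb{N}.
```

So whatever value n a given SAS observes, it finds itself in a tail of vanishing prior probability, and 3 is in exactly the same position as 4384792698273462.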

A further, more technical problem is that in probability theory as currently practiced (with its demand that probabilities can be normalized to add up to 1), a uniform prior over the integers or other countable discrete sets doesn't even exist. However, I think one could argue that this problem is a lesser one - maybe "God's probabilities" don't really have to be normalized to add up to 1! A strange thought.
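Spelled out under countable additivity (the standard Kolmogorov axioms), a constant weight c on every integer can never total 1:

```latex
p(n) = c \quad \forall n \in \mathbb{N}
\;\Longrightarrow\;
\sum_{n=1}^{\infty} p(n) =
\begin{cases}
0 & c = 0, \\
\infty & c > 0.
\end{cases}
```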

(Note that a uniform prior over the reals, or R^3 or R^4 or whatever, strictly speaking doesn't make sense in today's probability theory either! The reason this doesn't cause a crisis in Bayesian probability theory is simply that usually the very first observation in an experiment immediately modifies this dodgy prior to a normalizable distribution. But the case above [SASs of all space dimensions opening their eyes and counting] is uncomfortably different in style.)
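A standard textbook illustration of that rescue follows; the example is chosen here and is not from the paper. Take the improper flat prior p(μ) ∝ 1 on the real line and a single Gaussian observation x with known error σ. The posterior is already a proper, normalizable distribution:

```latex
p(\mu \mid x) \propto 1 \cdot \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
\int_{-\infty}^{\infty} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) d\mu = \sigma\sqrt{2\pi},
\qquad
\mu \mid x \sim \mathcal{N}(x, \sigma^2).
```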

I'll give my own musings on how we might get round these problems of interpreting uniformity later on.

About the axiomatization of mathematical structures:

Gregory Chaitin's work on algorithmic (Kolmogorov-Chaitin) complexity provides an intriguing counterexample to your Figure 2. (I realize Figure 2 is just a qualitative graph, and also not particularly pivotal to the paper as a whole. But you might still be interested in my counterexample. It touches on deep issues of how to think about axiomatizations, models, and the like.)

Have you heard of Chaitin's constant capital-Omega? You have to fix on a particular universal Turing machine, but once you've done that, Omega is a perfectly definite constant, just like pi. Unlike pi, it's uncomputable. Worse: it's "maximally" uncomputable. That is, given the first n binary digits as an oracle, you'd still have no way (no Turing algorithm) of guessing the subsequent digits with a success rate different from 50%. Omega is "(incompressibly) random" in the jargon of algorithmic information theory - although it's not, of course, "random" in the broader sense of capricious or underdetermined: it's just as determined (once your favorite universal Turing machine has been fixed) as pi or e.

Omega is defined as the probability that the universal Turing machine will halt, given a semi-infinite input tape of bits chosen randomly by independent coin tossings. - I hasten to add that the definition can be given without using words like probability. I'm just choosing to give the definition as intuitively clearly as possible.
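In symbols, assume, as in Chaitin's own formulation, that the universal machine U is self-delimiting, so that no halting program is a prefix of another. Then, with |p| the length of program p in bits:

```latex
\Omega_U = \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|},
\qquad
0 < \Omega_U < 1.
```

The Kraft inequality for prefix-free sets keeps the sum at most 1, and strictly below 1 because some programs never halt. The coin-tossing description is just this sum read as a probability.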

Omega has the fascinating property that there is a simple Turing algorithm (though with a ridiculously long run-time) which, if given the first n binary digits of Omega as an oracle, can then solve the halting problem(!) for "generic Turing machines" (i.e. programs-with-embedded-input for our fixed universal Turing machine to run) of length up to and including n. (Taking the limit: if the whole of Omega is available as an oracle, the halting problem for Turing machines is solvable. However, the halting problem for Omega-oracle-enhanced Turing machines is still not solvable, for the usual reductio ad absurdum reasons.)
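The "simple Turing algorithm" can be sketched concretely. The toy machine below is hypothetical, and so are `TOY_PROGRAMS`, `halts_within` and `omega_prefix`, all invented for illustration. It has only six programs, so its Omega can be computed exactly. For a genuine universal machine, the bits passed in as `oracle` are precisely what nobody can compute. The decision procedure itself is the standard one:

- dovetail over the programs;
- add up 2^-|p| for every program seen to halt;
- stop once the running total reaches 0.b1...bn.

```python
from fractions import Fraction

# HYPOTHETICAL toy machine: a prefix-free set of bit-string programs, each mapped
# to the step at which it halts (None = never halts). A real universal machine
# has infinitely many programs and undecidable halting; this only shows the logic.
TOY_PROGRAMS = {
    "0": 3,
    "10": None,
    "110": 7,
    "1110": None,
    "11110": 20,
    "11111": None,
}

def halts_within(program, steps):
    """Simulate `program` for `steps` steps (here: just look up its halting time)."""
    t = TOY_PROGRAMS[program]
    return t is not None and t <= steps

def omega_prefix(n):
    """First n bits of Omega, as the dyadic rational 0.b1...bn.
    Computable only because the toy machine is tiny; for a real universal
    machine these bits are the uncomputable oracle."""
    omega = sum(Fraction(1, 2 ** len(p)) for p, t in TOY_PROGRAMS.items() if t is not None)
    return Fraction(int(omega * 2 ** n), 2 ** n)

def decide_halting(n, oracle):
    """Return exactly the programs of length <= n that halt, given the oracle
    value 0.b1...bn. Any program of length <= n still running when the halted
    mass reaches the oracle never halts: if it did, Omega would be at least
    oracle + 2^-n, contradicting Omega < oracle + 2^-n."""
    steps, halted, mass = 0, set(), Fraction(0)
    while mass < oracle:
        steps += 1
        for p in TOY_PROGRAMS:  # a real machine would dovetail over ever more programs
            if p not in halted and halts_within(p, steps):
                halted.add(p)
                mass += Fraction(1, 2 ** len(p))
    return {p for p in halted if len(p) <= n}

if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, sorted(decide_halting(n, omega_prefix(n))))
```

Here Omega = 21/32 = 0.10101 in binary. With n = 3 the oracle value is 5/8, so the search stops once "0" and "110" have halted, and "10" is correctly declared non-halting. The ridiculously long run-time in the real case comes from having to wait until the running total climbs up to the oracle value.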

Since Omega is incompressibly "random", any particular consistent formal system (say ZFC, or ZFC plus your favorite axioms of large cardinals) cannot give a correct ruling on more than a finite number of digits of Omega. Thus, one can add new axioms: "Axiom 729. The 729th digit of Omega is 1", and so forth - in such a way that one is genuinely strengthening the formal system's complexity (and power) at each point - in the full, strict sense that the algorithmic complexity of the set of axioms is rising - and yet the system remains consistent. Thus, the "cliff edge" of Figure 2 is not compulsory!

There's something disturbing about all this. It means that mathematical reality can pick out certain structures as "special" (what you might call low "definitional" complexity), like Chaitin's number Omega, but formal systems cannot attain the state of handling these structures competently and flawlessly (any particular formal system will have either incomplete or wrong embedded knowledge of the structure). Now of course we knew this all along because of Gödel's theorem, but Chaitin's work puts a new, embarrassing quantitative spin on all this. That is, after Gödel's theorem mathematicians could shrug and say "Well, who really cares that a few highly esoteric propositions, of the form only metamathematicians ever use, are undecidable?" But the digits of Chaitin's Omega are not of that form. They act as oracles for an intensely practical problem (the halting problem - which, if one is doing physics by simulation, is the question "does this system reach a stable state?" or the like), and there are aleph-null of them, and we can only know finitely many (0%) of them!

We might want to "loosen up" Figure 1 (the giant structure diagram), in the sense that formal systems are perhaps being awarded an unfairly pivotal role in the diagram - mathematical reality can be construed to go beyond the collection of all formal systems. On the other hand, I suppose this is the issue (the nature of mathematics; Platonism, formalism, etc) that no-one can quite agree on, or even decide how to argue about! I'm guessing that you were not necessarily saying to the world "Ich bin ein Formalist" by designing Figure 1 the way you did. Rather, you were "playing safe" by sticking to the structures that mathematicians of all persuasions would agree existed, or could be talked about as if they existed: (structures described by) formal systems. Have I guessed right?

About the prospects for SASs in other space dimensionalities:

Here's a way space dimensions greater than 3 could turn out to be inhabited after all, and even by "particle-based" SASs (rather than string-based or membrane-based or whatever). The idea is we don't have to take the "inverse (D-1)th law" as a compulsory force law in dimension D. It's true you get that when you use massless, non-self-coupling exchange bosons in (D+1)-dimensional QFT. But maybe the important (most macroscopically obvious) forces will not be those mediated by massless non-self-coupling bosons! For example, in (3+1)-dimensional QCD, the color force between quarks is something like constant, or direct-linear (anyway, something that leads to confinement). Maybe in a higher dimension, some important force could just happen to come out as inverse-square? That is, stronger than the "default" inverse (D-1)th power law, but weaker than those laws leading to actual confinement (like constant or direct-linear). "Atoms" (and molecules, including the big molecules we call planets!) built using that force would then have a stable discrete spectrum of negative (bound) energy levels. Chemistry, and so potentially life and SASs, could exist after all!
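The "default" law referred to here follows from Gauss's law in D space dimensions. The flux of a massless, non-self-coupling field from a point source is spread over a sphere S^{D-1} whose area grows like r^{D-1}, with A_{D-1} the area of the unit (D-1)-sphere. A sketch of the standard argument:

```latex
F(r)\, A_{D-1}\, r^{D-1} = \text{const}
\;\Longrightarrow\;
F(r) \propto \frac{1}{r^{D-1}},
\qquad
V(r) \propto
\begin{cases}
\ln r & D = 2, \\
-\dfrac{1}{r^{D-2}} & D \ge 3.
\end{cases}
```

For D = 3 this is the familiar inverse-square law. The speculation above is that some other force might still come out inverse-square when D > 3.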

Now, maybe gravity, in the Newtonian limit when it can be thought of as a force at all, has to come out as the inverse (D-1)th law. (Is that right in classical GR, for D>=3 anyway? I don't know offhand.) But even that need not be a problem. First of all, gravity might always be weaker than some other force(s) of the non-"default" sort I discussed above, so that that cosmos would simply not be ruled by gravity. But even if inverse (D-1)th law gravity was dominant on "solar system and above" scales, so that there were no stable planetary orbits around a central star, all hope is not lost. As one purely illustrative scenario: Take a cosmos with plenty of "stars", each stably bound because the details of the "intra-star structure" happen to involve a broad spectrum of the available forces - not just gravity but perhaps the mutual attraction of "charged" or "colored" layers or patches or whatever, some of the forces of which are inverse square or otherwise suitable for stability. (This is something like the way in our cosmos, a solid rock can spin faster than its own equatorial orbital velocity. It stays together because messy non-gravitational forces - chemistry, adhesion, mechanical cohesion - dominate gravity on its scale.) Now, beyond those stars with their stable intra-star structure, out into the scale where only gravity is important, there are indeed no stable planetary orbits. But suppose there are enough stars, or that sufficiently many of them burn brightly enough, that general interstellar space is at a comfortable ambient temperature for life. (Not an outrageous assumption! Even in our cosmos, the ambient temperature in big globular clusters or galactic centers can be tens of Kelvin.) Well then, a planet that simply wanders around in interstellar space could evolve sunlight-driven life. Eventually by bad luck it would crash into a star, but of course this is irrelevant if the timescale for this happens to be longer than that for Darwinian evolution.
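For reference, the "no stable planetary orbits" claim is the standard classical argument, going back to Ehrenfest. A circular orbit under an attractive central force of magnitude k/r^n is stable only if the effective potential has a minimum there, which works out to n < 3. A central-force orbit stays in a plane, so the centrifugal term is the same in any number of space dimensions. The inverse (D-1)th law therefore gives stable orbits only for D < 4.

The sketch below checks this numerically with a finite-difference second derivative. It assumes unit mass and Newtonian point particles, with no GR, and the function names are illustrative.

```python
import math

def effective_potential(r, n, k=1.0, l2=1.0):
    """U_eff(r) = U(r) + L^2 / (2 r^2) for unit mass, with an attractive central
    force of magnitude k / r**n (so dU/dr = k / r**n); l2 stands for L^2."""
    if n == 1:
        u = k * math.log(r)
    else:
        u = k * r ** (1 - n) / (1 - n)
    return u + l2 / (2.0 * r * r)

def orbit_curvature(n, r0=1.0, k=1.0, h=1e-4):
    """Second derivative of U_eff at the circular orbit of radius r0,
    estimated by central finite differences. Positive means stable."""
    l2 = k * r0 ** (3 - n)  # angular momentum squared that makes r0 a circular orbit
    f = lambda r: effective_potential(r, n, k, l2)
    return (f(r0 + h) - 2.0 * f(r0) + f(r0 - h)) / (h * h)

def stable_orbits_in_dimension(d, tol=1e-6):
    """The default force law in d space dimensions is inverse-(d-1)th power."""
    return orbit_curvature(d - 1) > tol

if __name__ == "__main__":
    for d in range(2, 7):
        print(d, "stable" if stable_orbits_in_dimension(d) else "not stable")
```

Analytically the curvature is k r0^-(n+1) (3 - n). So D = 4 (n = 3) is the marginal case with no restoring force, and D ≥ 5 is outright unstable.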

Another route. Who needs stars?! (I think you are sympathetic to this question, as you hint in a few places, e.g. "it is unclear how necessary this is for SASs" on page 15.) Imagine the messy non-gravitational forces in our D-space cosmos do include suitable ones to let planetary bodies condense from a cloud and stay in one piece, but don't include decent analogues of our star-powering processes (nuclear fusion, gravitational contraction). Even then, all hope is not lost: so long as we have chemistry, we can have things like the earth's hydrothermal vents on these planets, and depending on the precise details of a planet's heat of formation, heat conductivity, radioactive element load and the like, the surface (or a layer below the surface) might be held at "room temperature" (SAS-supporting chemistry temperature) for a very long time.

So basically, I am much more cheerful about the habitability of higher-dimensional spaces than you are in your paper. Furthermore, as I said in my stuff about Bayesian inference above, I believe that, if this cheerfulness proves well-founded, it is not really a problem for the category 1a TOE - your pessimism in that scenario is misplaced. We can have our cake and eat it here!

About our "local island", and other islands:

On page 15 you mention the possibility that the values of the coupling constants could turn out to be a solution to an "overdetermined" problem (more equations than unknowns). I think your own preference (mine too, incidentally) is for the alternative interpretation you mention immediately after that: that some of the supposed constraints are not really necessary for SASs, and our island is one of several in the "archipelago".

However, let's play devil's advocate, and suppose the coupling constants really are the solution to more equations than unknowns. What would the philosophical implications be? You suggest the following: "This could be taken as support for a religion-based category 2 TOE, with the argument that it would be unlikely in all other TOEs." I've a feeling you really just suggest this for completeness, but let's think about it anyway. I believe anyone who argued for a category 2 religion in this style would be quite wrong to do so, even in the dramatic case I'm considering (the "more equations than unknowns" case). Here's why.

Basically, my argument is as follows. The category 2 religion would presumably center around a "design argument". You know the sort of thing: "A god must have designed things so that those equations have the beautiful mutual intersection in one tiny island that they do." But what does this mean? How can a god "design" such an eventuality? If such an eventuality is true, it is a (very complicated) theorem of pure mathematics. Its logical status is the same as the theorem that the first 20 decimal digits of pi are 3.1415926535897932384. It is, ultimately, a tautology, albeit one not obvious to the intuition (at least, to human intuition, with its short "run-time"). That is, it's simply not open to design!
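To underline that such digits are fixed by the mathematics rather than chosen, here is a short sketch that recomputes them. It uses Machin's formula, π = 16 arctan(1/5) - 4 arctan(1/239), with only integer arithmetic. The 10 guard digits are an assumption that comfortably absorbs the rounding in each series term.

```python
def arctan_inverse(x, scale):
    """scale * arctan(1/x), from the alternating Taylor series, in integer arithmetic."""
    term = scale // x  # k = 0 term: scale / x
    total = term
    x_squared = x * x
    k = 1
    while term:
        term //= x_squared  # now scale / x^(2k+1)
        if k % 2:
            total -= term // (2 * k + 1)
        else:
            total += term // (2 * k + 1)
        k += 1
    return total

def first_digits_of_pi(count, guard=10):
    """First `count` decimal digits of pi (starting with the leading 3)."""
    scale = 10 ** (count - 1 + guard)
    pi_scaled = 16 * arctan_inverse(5, scale) - 4 * arctan_inverse(239, scale)
    return str(pi_scaled // 10 ** guard)

if __name__ == "__main__":
    digits = first_digits_of_pi(20)
    print(digits[0] + "." + digits[1:])  # prints 3.1415926535897932384
```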

In fact let me go further. Imagine the opposite eventuality is true. That is, imagine there are more SAS-feasibility equations than unknowns in the general class of QFT-type theories with coupling constants, but that there isn't a beautiful mutual intersection - the graphs of the (in)equations go all over the place, and their intersection is empty. Now, speaking metaphorically, say you're an "SAS-loving god". You want to create an SAS-rich cosmos. This is your burning desire. It's Wednesday in heaven, and you've just had a gruelling Tuesday exploring the possibilities of broadly "classical" theories. You're extremely annoyed at the way that every simulation you tried on your heavenly Cray-1 kept running into problems. There was an ultraviolet catastrophe, or a "chaos catastrophe" (where classical atoms are never quite the same as one another, leading to no useful reproducible chemistry), or one damn thing after another. You're beginning to wonder whether you're going to be able to create an SAS-rich cosmos at all! But then a friendly angel says "Don't panic, try QFT-type theories instead."

Well! You fire up your heavenly Cray-1 again, and plod through the various coupling constants. To your chagrin, it's another wasted day in heaven. Another case of one damn thing after another! If the strong force is right for stellar lifetimes it's wrong for variety-of-elements; if the weak force is right for radioactive heat load inside planets it's wrong for stellar stability; and so on, ad nauseam. By Wednesday evening you're feeling like smashing your heavenly Cray-1 into tiny little pieces. Are you ever going to strike lucky with SASs? Classical theories don't support them, and (we're assuming here...) QFT-type theories don't support them either.

One must assume that on Thursday or whenever, you finally discover a theory of a third class, different from both classical and "QFT-type", which does support SASs. Perhaps the regime in question is "non-mathematical". (Although as you can maybe guess, I'm going to argue that such a statement is meaningless.) Anyway, with a burst of heavenly joy you create the relevant cosmos, and the SASs live happily ever after. What can those SASs conclude about their world, and about you (the god)?

Well, those SASs proceed to do science in their world. They study the stars, measure their brightnesses and spectra and chart their life-cycles; they study the chemistry of the stuff they're made of; they build theories. They (like us humans) go through a long period of false starts and retractions. They invent classical mechanics; they discover its faults; they invent QFT; they (this is important!) discover its faults. Because, of course, it is faulty! All choices of its coupling constants (by hypothesis) fail to give stars in the sky and SASs on the ground. So they have to move on. Maybe they eventually discover the very regime that you (the god) discovered "on Thursday". At any rate, whatever they discover, their process of discovery is that of seeking and studying the examples of regularity of structure in their world. That is, of doing "applied mathematics". And probably also of deliberately running ahead and studying the space of logically possible structures - doing "pure mathematics" - with the vague hope that some of those structures might match up with the stars in the sky and the SASs on the ground - that "pure mathematics" might unexpectedly become "applied".

The punchline to all this? Simply that, like the person who discovered that all their life they'd been speaking prose, those SASs can't not do mathematics! Mathematics is not separable from the rest of knowledge and wisdom. Similarly, when you (the god) whooped with heavenly joy that your new Thursday regime had enough regularity of structure (and of the right kind) to support SASs, you were doing mathematics. As Pauli said of a theory in some other context, the problem with a purported category 2 TOE is not that it's wrong, but that it's not even wrong! It just doesn't mean anything to say of a TOE that it's not mathematical. It's like saying it's not written in prose.

The original case, where (again) there are more SAS-feasibility equations than unknowns, and (we change back to...) they do have a beautiful mutual intersection, has an even more straightforward route to the same punchline. In the metaphor I'm using, on Wednesday in this scenario you do find the beautiful mutual intersection. (You really do find it - you don't design it! Of course this opens up the old controversy about whether mathematics is discovered or invented. I hope you'll agree that for the purposes of telling the story in my chosen metaphorical language, I have to at least speak "as if" mathematics is discovered.) With a sigh of relief you create the cosmos specified by that intersection, and go and have a well-earned rest. The SASs in that world also discover the mutual intersection that their lives depend on. (At least if it's computationally simple and tractable enough.) They gasp at its beauty, but nevertheless they'd be utterly wrong to conclude that this is evidence for a "category 2 TOE". For again, where exactly is the "non-mathematical" aspect to all this? It's precisely a fact of mathematics - the existence of that intersection in the abstract space of possible mathematical structures - that permits those SASs' very existence!

Apologies for the rather rambling and metaphorical style of argument above. I think you make essentially the same criticism of the concept of a category 2 TOE in section V F 2 (page 25: "The generality of mathematics"), without the rambling style and without the dodgy theistic metaphor!

And finally...

As promised, here's my (awfully tentative) proposal for how to re-interpret "uniform" either on the whole Figure 1 space of mathematical structures, or on the more limited "parameter space" within any one of them. Take first a parameter space. As an example, I'll use the same one as I used when I argued you were unduly pessimistic about the usefulness of a category 1a TOE if it turns out dimensions higher than 3 are all habitable. Namely, we're fixed inside some "box" of Figure 1 (e.g. a box called "QFT-like theories in any dimension"), and we're considering how to run over the parameter space within that box (the integer number of dimensions - leaving aside coupling constants [not to mention number of coupling constants!] in the interests of manageability).

My proposal: don't try to use a literally uniform prior over the integers. That leads to the false pessimism I outlined before. Instead, consider the Kolmogorov-Chaitin algorithmic complexity (in bits) of each potential parameter value (i.e. each integer). This is the length of the shortest program that prints the integer and nothing else and halts. - OK, this does depend on fixing a particular universal Turing machine, and a particular notation for output; but Chaitin and others keep reassuring us that all the "important" or "beautiful" results of algorithmic information theory are invariant modulo such choices. I'll accept their reassurances for the sake of argument.

Define the weighting for each integer to be 1 over (2 to the power of its complexity). This is the probability that the integer (and nothing else) would be emitted by a randomly chosen program for our fixed universal Turing machine. (Actually that's not quite true. If we wanted to make it true by definition we'd want to add up "1 over 2 to the power of the length of..." every program that prints the integer and nothing else and halts. We might also want to somehow "smudge this" over the set of different original choices for the fixed universal Turing machine - if this would help reduce embarrassing dependence on a particular one, assuming this did turn out to be a problem after all.)
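In symbols, K_U is complexity relative to the fixed machine U, and U is assumed self-delimiting as in Chaitin's formulation. The first line gives the two weightings in this paragraph. The second gives two standard results behind the reassurances mentioned above: the invariance theorem, for any two universal machines U and V with c_{UV} independent of n, and Levin's coding theorem.

```latex
w(n) = 2^{-K_U(n)},
\qquad
m_U(n) = \sum_{p \,:\, U(p) = n} 2^{-|p|} \;\ge\; 2^{-K_U(n)},
\\[4pt]
|K_U(n) - K_V(n)| \le c_{UV},
\qquad
-\log_2 m_U(n) = K_U(n) + O(1).
```

The coding theorem means the "shortest program only" and "all programs" versions differ by at most a bounded factor.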

What are the properties of this weighting? First of all, it's a bona fide probability distribution in its own right: it adds up to a finite number, provided the programs are self-delimiting as in Chaitin's formulation. (Using the all-programs version of the weighting, in fact to Chaitin's Omega! - Or less, if our fixed notation is such that some outputs are regarded as not syntactically valid integers and hence rejected. The shortest-program-only version adds up to somewhat less again.) Thus it can be normalized to add up to 1. Secondly, small integers (and a tiny handful of large integers, namely those like 10^(10^(10^(10^(10^(10^(10^(10^(10^10)))))))) with a highly regular structure, i.e. a simple algorithm for producing them) have large weightings. Thus if all space dimensions are habitable but we live in dimension 3, this is not an embarrassment. The integer 3 has a large weighting all to itself!
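True Kolmogorov-Chaitin complexity is uncomputable, so the following is only a computable stand-in, and every choice in it is hypothetical. Integers written out literally use the Elias gamma code, which takes 2⌊log₂ n⌋ + 1 bits and is prefix-free. That alone already gives a normalized weighting biased towards small integers. A toy expression language that also allows a**b then shows how a highly regular large integer can have a much shorter description than a typical one.

```python
from fractions import Fraction

def gamma_length(n):
    """Bits in the Elias gamma code for a positive integer n: 2*floor(log2 n) + 1.
    Gamma codes are prefix-free, so the weights 2^-len sum to at most 1."""
    if n < 1:
        raise ValueError("Elias gamma codes positive integers only")
    return 2 * (n.bit_length() - 1) + 1

def literal_weight(n):
    """Weight 2^-len for n written out literally; these sum to exactly 1 over n >= 1."""
    return Fraction(1, 2 ** gamma_length(n))

def literal_mass(limit):
    """Total literal weight of the integers 1..limit."""
    return sum(literal_weight(n) for n in range(1, limit + 1))

def expression_length(expr):
    """Bits for a HYPOTHETICAL toy prefix-free expression language (not Chaitin's):
    an int is a literal (1 flag bit + its gamma code); a tuple (a, b) means a**b
    (1 flag bit + the codes of a and b). The value itself is never computed."""
    if isinstance(expr, int):
        return 1 + gamma_length(expr)
    base, exponent = expr
    return 1 + expression_length(base) + expression_length(exponent)

def power_tower(base, height):
    """Expression for base^(base^(...^base)) with `height` copies of base."""
    expr = base
    for _ in range(height - 1):
        expr = (base, expr)
    return expr

if __name__ == "__main__":
    print("literal mass of 1..4095:", literal_mass(4095))
    print("bits for 3:", expression_length(3))
    print("bits for 4384792698273462:", expression_length(4384792698273462))
    print("bits for a tower of ten 10s:", expression_length(power_tower(10, 10)))
```

With these choices, each description has the following length:

- 3 costs 4 bits;
- 4384792698273462 written out costs 104 bits;
- a tower of ten 10s costs only 89 bits, even though its value dwarfs the other number.

The expression-language weights total at most 1 because the code is prefix-free.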

Turning now to the problem of painting the whole of Figure 1 with a Bayesian prior: well, the same general idea applies. To be precise: We consider the probability that a randomly chosen program would produce (a formal specification or axiomatization for) a particular mathematical structure and then halt. This "punishes" those structures that are particularly complicated and ad hoc in nature. Only long (and so "unlikely") programs would specify such structures!

A pleasing aspect of this proposal is it renders harmless the subjective decision "one box or many?" one keeps meeting when drawing a diagram such as Figure 1. For example, do we have a box "(3+1)-dimensional QFT", another box "(4+1)-dimensional QFT", and so on? Or do we have one box "QFT-like theories in any dimension", with an internal parameter D giving the number of dimensions? It doesn't really matter. In both cases, to output the specification of (say) a 4384792698273462-dimensional field theory "costs" the same. Either we have one long program to do it "in one go", or we have a short subroutine specifying the axioms for "QFT-like theories in any dimension" and then a subroutine specifying the act of plugging in the magic number 4384792698273462. You can find in Chaitin's work various sorts of "covering theorems" that guarantee that the precise value of complexity assigned to a given digital object (output text) is not much changed by such "notational disputes". - I'm still slightly nervous that things are maybe not as clean and straightforward for this application as Chaitin promises, but there's at least the hope it'll all work out OK!

Note that to specify a category of structure, then a sub-category within it, then a sub-sub-category within that, etc, one "usually" (i.e. unless the "zooming rules" themselves have algorithmically compressible structure) just has to give a program whose length is near enough the sum of the lengths of the specifying programs at each stage. Under the "1 over 2 to the power of..." rule, the weighting then becomes near enough the product of the "local weightings" assigned at each level of zooming in. Thus the general look and feel, as it were, of a decently-behaved probability distribution is rendered successfully.
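Here is a sketch of the bookkeeping, with ℓ_i the length of the specifying program at zoom level i. It assumes, as the paragraph says, that the zooming rules themselves are not compressible:

```latex
\ell \approx \ell_1 + \ell_2 + \cdots + \ell_k
\;\Longrightarrow\;
2^{-\ell} \approx 2^{-\ell_1}\, 2^{-\ell_2} \cdots 2^{-\ell_k}.
```

For prefix-free complexity this rests on the standard subadditivity bound K(x, y) ≤ K(x) + K(y | x) + O(1). The bound holds because self-delimiting programs can simply be concatenated.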

So there you are! What do you think? The trouble is, only researchers in our far future (if ever) will be capable of seriously summing and integrating over substantial swathes of Figure 1 in the manner of your section III, equations (4)-(7). Thus only they will really discover the truth about any purported good and bad features of proposed weightings. For now we're really all just guessing! But I thought you might find these thoughts interesting, and who knows, perhaps suggestive of something better.

Related work by others

Chris Maloney points out that my suggestion of a prior based on Kolmogorov-Chaitin algorithmic complexity is similar to that discussed in Jürgen Schmidhuber's "great programmer religion" paper. Jürgen continues to work in this area: see for example his paper on algorithmic theories of everything (quant-ph/0011122).

This corner of the web maintained by Iain Stewart <ids@doc.ic.ac.uk>, Department of Computing, Imperial College, London, UK
