*(Adapted from my message of 28th September 1997 to Max Tegmark, giving my thoughts on his ultimate ensemble theory paper.)*

Hello Max... I came across your paper
"Is `the theory of everything' merely the ultimate ensemble theory?"
(gr-qc/9704009)
on the web.
I found it by a rather circuitous route. For several consecutive years now,
here at Imperial College Department of Computing, I've been supervising an
undergraduate individual project on quantum computing
(with Peter Knight of our quantum optics section
agreeing to act as an occasional informal "extra marker"
to patch over the fact that the whole thing's not really my field!).
When looking through the recent papers in the
quant-ph
section of the Los Alamos archive,
just to try and keep up with things as best I can,
I came across a link to your website, and thence to your
ultimate ensemble theory paper.
(Since then, of course, your paper's been covered by
*New Scientist*
in their
"anything goes" article of 6th June 1998,
and become much more widely known.)

Well, it was a joy to read, and I'm glad someone's "thinking big" in this
way! Have you come across John Leslie's semi-popular book *Universes*?
It deals with many of the same themes - although the later parts drift into
a sort of religious apologia, which I don't find particularly useful.
Anyway, I have some thoughts on various aspects of your paper,
which I hope you might find interesting or thought-provoking. So here goes!

I think the biggest issue here is choice of an "ultimate prior". If I've understood things right you're proposing the "uniform prior" in all parameter spaces: but is this really well-defined? (For example, uniform, log-uniform, or uniform in the "arctan-log" mapping you use for convenience in your graphs, will all give subtly different conclusions [posteriors] in any given application of the Bayesian approach.)
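To make the prior-dependence concrete, here's a minimal toy calculation of my own (nothing to do with your actual parameter spaces): estimating a single positive rate parameter from one Poisson-distributed observation, once under a uniform prior and once under a log-uniform prior. The two posterior means come out genuinely different - 4 versus 3 in this example.

```python
import math

def posterior_mean(prior):
    # Grid approximation to the posterior mean of lam, given one
    # Poisson observation k = 3 (likelihood lam**3 * exp(-lam)).
    xs = [i * 0.01 for i in range(1, 3000)]
    w = [lam ** 3 * math.exp(-lam) * prior(lam) for lam in xs]
    total = sum(w)
    return sum(lam * wi for lam, wi in zip(xs, w)) / total

mean_uniform = posterior_mean(lambda lam: 1.0)        # uniform in lam
mean_loguniform = posterior_mean(lambda lam: 1.0 / lam)  # uniform in log(lam)

print(mean_uniform, mean_loguniform)  # roughly 4 and 3 respectively
```

Same data, same likelihood, different "uniform" - and a different answer. In your setting the analogue would be subtly different posteriors over the parameter spaces of Figure 1 depending on which parametrization one declares "uniform".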

Stepping one level further out so to speak, there's also the inevitable
problem of how to be "uniform" in the space of mathematical structures
you give a sample of in your Figure 1. The diagram is by necessity very
heterogeneous - after all any sub-parts of it that form ordered chains
(like "double fields", "triple fields", ...) either get cut off due to
inconsistency, or else, if that doesn't happen, we'd probably choose to have
drawn the diagram differently
(e.g. having one box called "generalized fields" with an implicit
internal parameter *n* giving the order "double", "triple" etc).
Casting a bird's-eye view over the whole diagram, one shudders at
what it could even mean to paint the diagram with a "uniform"
ultimate prior!

I do in fact have a provisional suggestion about this. It's at least vaguely related to notions of Kolmogorov-Chaitin algorithmic complexity. However I'll leave it till further on, when it'll make more sense (I hope!).

Meanwhile, here's a dramatic instance of the problems an over-simple
concept of "uniform" might cause. In your footnote 13 (page 24), you're
discussing your earlier pessimism about the whole ultimate ensemble theory
project, when you thought all integer space dimensionalities
(or 3 upwards anyway) were likely to be habitable. But: imagine this had
been true, i.e. all dimensionalities >=3 are habitable.
(It might even *be* true! One can think of reasons why not all forces
would necessarily be "inverse-(*D*-1)th law"
and thereby lead to the stability problems you discuss.
But more of that later.)
What would the consequences be for your whole "category 1a TOE"
philosophical project? I believe it's wrong to say they would have been
the pessimistic consequences you feared. That is, it's wrong to say that
when we opened our eyes and counted our space dimensions and got 3,
we'd thereby falsify the whole category 1a TOE project.
For consider this: if the category 1a TOE is true, and contains SAS-inhabited
worlds of all dimensions (3,4,5,...),
then **every SAS** opens its eyes,
counts its space dimensions, and (falsely and pessimistically) concludes
the category 1a TOE is falsified!!! For example, creatures who count their
space dimensions and get 4384792698273462 would, if they took a
"uniform" prior over the integers seriously, have to go on to say:
"Gosh! 4384792698273462 is such a *tiny* number, out of all the integers!
Gee, I guess that falsifies the category 1a TOE."
You see the problem? The use of a "uniform" prior over the integers
would condemn every one of those SASs to reaching a false conclusion.
(Slogan: All integers are tiny! Or: All integers are untypically small!)

A further, more technical problem is that in probability theory as
currently practiced (with its demand that probabilities can be normalized
to add up to 1), a uniform prior over the integers or other countable
discrete sets doesn't even exist. However, I think one could argue that
*this* problem is a lesser one
- maybe "God's probabilities" don't really
have to be normalized to add up to 1! A strange thought.

(Note that a uniform prior over the *reals*,
or **R**^3 or **R**^4 or whatever,
strictly speaking doesn't make sense in today's probability theory either!
The reason this doesn't cause a crisis in Bayesian probability theory
is simply that usually the very first observation in an experiment
immediately modifies this dodgy prior to a normalizable distribution.
But the case above [SASs of all space dimensions opening their eyes and
counting] is uncomfortably different in style.)

I'll give my own musings on how we might get round these problems of interpreting uniformity later on.

Gregory Chaitin's work on algorithmic (Kolmogorov-Chaitin) complexity provides an intriguing counterexample to your Figure 2. (I realize Figure 2 is just a qualitative graph, and also not particularly pivotal to the paper as a whole. But you might still be interested in my counterexample. It touches on deep issues of how to think about axiomatizations, models, and the like.)

Have you heard of Chaitin's constant capital-Omega? You have to fix on a
particular universal Turing machine, but once you've done that, Omega is
a perfectly definite constant, just like *pi*.
Unlike *pi*, it's uncomputable.
Worse: it's "maximally" uncomputable.
That is, given the first *n* binary digits as an oracle,
you'd *still* have no way (no Turing algorithm) of guessing
the subsequent digits with a success rate different from 50%.
Omega is "(incompressibly) random" in the jargon of algorithmic information
theory - although it's not, of course, "random" in the broader sense of
capricious or underdetermined: it's just as determined (once your favorite
universal Turing machine has been fixed) as *pi* or *e*.

Omega is defined as the probability that the universal Turing machine will halt, given a semi-infinite input tape of bits chosen by independent fair coin tosses. - I hasten to add that the definition can be given without using words like probability; I'm just choosing to state it as intuitively as possible.
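For a feel for the definition, here's a toy sketch of my own, with an artificially simple "machine" standing in for a real universal Turing machine. Programs are the prefix-free bitstrings of the form 0^k 1, and we decree (by fiat - real halting behavior is of course not this tame) that the machine halts on 0^k 1 exactly when k is even. Summing 2^-(program length) over the halting programs gives this toy machine's "Omega", which converges to 2/3. Unlike the real Omega, this one is trivially computable - that's the price of the toy behavior.

```python
from fractions import Fraction

def toy_machine_halts(k):
    # Stand-in halting behavior: the toy machine reads k zeros then a
    # one, and (by decree) halts exactly when k is even.
    return k % 2 == 0

# Halting probability under fair coin tosses: each program 0^k 1 has
# length k+1, hence probability 2**-(k+1) of being the bits tossed.
omega = Fraction(0)
for k in range(40):
    if toy_machine_halts(k):
        omega += Fraction(1, 2 ** (k + 1))

print(float(omega))  # partial sums converge to 2/3
```

The prefix-free form 0^k 1 matters: it's what makes "choose the bits by coin tosses" give a well-defined probability, since no halting program is a prefix of another.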

Omega has the fascinating property that there is a simple Turing algorithm
(though with a ridiculously long run-time) which, if given the first *n*
binary digits of Omega as an oracle,
can then *solve the halting problem*(!)
for "generic Turing machines" (i.e. programs-with-embedded-input
for our fixed universal Turing machine to run)
of length up to and including *n*.
(Taking the limit: if the whole of Omega is available as an oracle,
the halting problem for Turing machines is solvable. However, the halting
problem for *Omega-oracle-enhanced* Turing machines
is still *not* solvable,
for the usual *reductio ad absurdum* reasons.)

Since Omega is incompressibly "random", any particular consistent formal system
(say ZFC, or ZFC plus your favorite axioms of large cardinals) cannot
give a correct ruling on more than a finite number of digits of Omega.
Thus, one can add new axioms:
"**Axiom 729.** The 729th digit of Omega is 1",
and so forth - in such a way that one is genuinely strengthening the
formal system's complexity (and power) at each point - in the full, strict
sense that the *algorithmic complexity* of the set of axioms is rising
- and yet the system remains consistent. Thus, the "cliff edge" of Figure 2
is not compulsory!

There's something disturbing about all this.
It means that *mathematical reality*
can pick out certain structures as "special" (what you might call
low "definitional" complexity), like Chaitin's number Omega, but
*formal systems* cannot attain the state of handling these structures
competently and flawlessly (any particular formal system will have either
incomplete or wrong embedded knowledge of the structure).
Now of course we knew this all along because of Gödel's theorem,
but Chaitin's work puts a new, embarrassing quantitative spin on all this.
That is, after Gödel's theorem mathematicians could shrug and say
"Well, who really cares that a few highly esoteric propositions, of the
form only metamathematicians ever use, are undecidable?" But the digits of
Chaitin's Omega are not of that form. They act as oracles for an intensely
practical problem (the halting problem - which, if one is doing physics by
simulation, is the question "does this system reach a stable state?"
or the like), and there are aleph-null of them, and we can only know
finitely many (0%) of them!

We might want to "loosen up" Figure 1 (the giant structure diagram), in the sense that formal systems are perhaps being awarded an unfairly pivotal role in the diagram - mathematical reality can be construed to go beyond the collection of all formal systems. On the other hand, I suppose this is the issue (the nature of mathematics; Platonism, formalism, etc) that no-one can quite agree on, or even decide how to argue about! I'm guessing that you were not necessarily saying to the world "Ich bin ein Formalist" by designing Figure 1 the way you did. Rather, you were "playing safe" by sticking to the structures that mathematicians of all persuasions would agree existed, or could be talked about as if they existed: (structures described by) formal systems. Have I guessed right?

Here's a way space dimensions greater than 3 could turn out to be
inhabited after all, and even by "particle-based" SASs
(rather than string-based or membrane-based or whatever).
The idea is we don't have to take the
"inverse (*D*-1)th law" as a compulsory
force law in dimension *D*. It's true you get that when you use massless,
non-self-coupling exchange bosons in (*D*+1)-dimensional QFT.
But maybe the important (most macroscopically obvious) forces will *not*
be those mediated by massless non-self-coupling bosons!
For example, in (3+1)-dimensional QCD, the color force between quarks
is something like constant, or direct-linear (anyway, something that leads
to confinement). Maybe in a higher dimension, some important force could
just happen to come out as inverse-square?
That is, stronger than the "default" inverse (*D*-1)th power law,
but weaker than those laws leading to actual confinement (like constant or
direct-linear). "Atoms" (and molecules, including the big molecules we call
planets!) built using that force would then have a stable
discrete spectrum of negative (bound) energy levels. Chemistry, and so
potentially life and SASs, could exist after all!
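The stability point can be checked with a back-of-envelope calculation (a standard classical-mechanics exercise, not specific to your paper): for an attractive central force F proportional to 1/r^p, circular orbits are stable exactly when p < 3. The "default" p = D-1 therefore fails for D >= 4, but an inverse-square force (p = 2) would give stable bound orbits in any dimension. A numerical sketch:

```python
def veff_curvature(p):
    """Curvature of the effective potential at the circular orbit,
    for attractive force F = -1/r**p with unit mass and unit angular
    momentum (so the circular orbit sits at r0 = 1).  Analytically
    this comes out to 3 - p; positive means the orbit is stable.
    (Assumes p != 1, since the potential below has a p-1 denominator.)"""
    def veff(r):
        # V(r) = -1/((p-1) * r**(p-1)) plus the centrifugal term.
        return -1.0 / ((p - 1) * r ** (p - 1)) + 1.0 / (2.0 * r ** 2)
    h = 1e-4
    return (veff(1 + h) - 2 * veff(1) + veff(1 - h)) / h ** 2

print(veff_curvature(2))  # inverse-square: positive, stable
print(veff_curvature(4))  # the "default" force for D = 5: negative, unstable
```

So a cosmos whose macroscopically dominant force happened to be inverse-square (or anything with p < 3) would dodge the orbital-stability objection regardless of D.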

Now, maybe *gravity*, in the Newtonian limit when it can be thought of as
a force at all, *has* to come out as the inverse (*D*-1)th law.
(Is that right in classical GR, for *D*>=3 anyway?
I don't know offhand.)
But even that need not be a problem. First of all, gravity might always be
weaker than some other force(s) of the non-"default" sort I discussed above,
so that that cosmos would simply not be ruled by gravity.
But even if inverse (*D*-1)th law gravity was dominant on
"solar system and above" scales, so that there were no stable planetary orbits
around a central star, all hope is not lost.
As one purely illustrative scenario:
Take a cosmos with plenty of "stars", each stably bound because the details
of the "intra-star structure" happen to involve a broad spectrum of the
available forces - not just gravity but perhaps the mutual attraction of
"charged" or "colored" layers or patches or whatever,
some of the forces of which are
inverse square or otherwise suitable for stability. (This is something like
the way in our cosmos, a solid rock can spin faster than its own
equatorial orbital velocity. It stays together because messy non-gravitational
forces - chemistry, adhesion, mechanical cohesion - dominate gravity on
its scale.)
Now, beyond those stars with their stable intra-star structure,
out into the scale where only gravity is important, there are indeed no
stable planetary orbits. But suppose there are enough stars, or that
sufficiently many of them burn brightly enough, that
general interstellar space is at a comfortable ambient temperature for life.
(Not an outrageous assumption! Even in our cosmos, the ambient temperature
in big globular clusters or galactic centers can be tens of Kelvin.)
Well then, a planet that simply wanders around in interstellar space
could evolve sunlight-driven life.
Eventually by bad luck it would crash into a star,
but of course this is irrelevant if the timescale for this happens to be longer
than that for Darwinian evolution.

Another route. Who needs stars?! (I think you are sympathetic to this
question, as you hint in a few places, e.g.
"it is unclear how necessary this is for SASs" on page 15.)
Imagine the messy non-gravitational forces in our *D*-space cosmos
*do* include suitable ones to let planetary bodies condense from a cloud
and stay in one piece, but *don't* include decent analogues of our
star-powering processes (nuclear fusion, gravitational contraction).
Even then, all hope is not lost: so long as we
have chemistry, we can have things like the earth's hydrothermal vents
on these planets, and depending on the precise details of a planet's heat
of formation, heat conductivity, radioactive element load and the like,
the surface (or a layer below the surface) might be held at
"room temperature" (SAS-supporting chemistry temperature) for a very long time.

So basically, I am much more cheerful about the habitability of
higher-dimensional spaces than you are in your paper.
Furthermore, as I said in my stuff about Bayesian inference above,
I believe that, if this cheerfulness proves well-founded, it is not really
a problem for the category 1a TOE - your pessimism in that scenario is
misplaced. We *can* have our cake and eat it here!

On page 15 you mention the possibility that the values of the coupling constants could turn out to be a solution to an "overdetermined" problem (more equations than unknowns). I think your own preference (mine too, incidentally) is for the alternative interpretation you mention immediately after that: that some of the supposed constraints are not really necessary for SASs, and our island is one of several in the "archipelago".

However, let's play devil's advocate, and suppose the coupling constants
really are the solution to more equations than unknowns.
What would the philosophical implications be?
You suggest the following: "This could be taken as support for a religion-based
category 2 TOE, with the argument that it would be unlikely in all other TOEs."
I've a feeling you really just suggest this for completeness, but let's
think about it anyway.
I believe **anyone who argued for a category 2 religion in this style
would be quite wrong to do so**,
*even in the dramatic case I'm considering*
(the "more equations than unknowns" case). Here's why.

Basically, my argument is as follows. The category 2 religion would
presumably center around a "design argument". You know the sort of thing:
"A god must have designed things so that those equations have the beautiful
mutual intersection in one tiny island that they do."
But what does this mean? How can a god "design" such an eventuality?
If such an eventuality is true, it is a (very complicated) theorem of
*pure mathematics*.
Its logical status is the same as the theorem that the
first 20 decimal digits of *pi* are 3.1415926535897932384.
It is, ultimately, a tautology, albeit one not obvious to the intuition
(at least, to human intuition, with its short "run-time").
That is, it's simply **not open to design!**

In fact let me go further. Imagine the opposite eventuality is true.
That is, imagine there *are* more SAS-feasibility equations than unknowns
in the general class of QFT-type theories with coupling constants, but that
there *isn't* a beautiful mutual intersection
- the graphs of the (in)equations
go all over the place, and their intersection is empty.
Now, speaking metaphorically, say you're an "SAS-loving god".
You want to create an SAS-rich cosmos. This is your burning desire.
It's Wednesday in heaven, and you've just had a gruelling Tuesday
exploring the possibilities of broadly "classical" theories.
You're extremely annoyed at the way that
every simulation you tried on your heavenly Cray-1 kept running into
problems. There was an ultraviolet catastrophe, or a "chaos catastrophe"
(where classical atoms are never quite the same as one another, leading to
no useful reproducible chemistry), or one damn thing after another.
You're beginning to wonder whether you're going to be able to create an
SAS-rich cosmos at all! But then a friendly angel says "Don't panic,
try QFT-type theories instead."

Well! You fire up your heavenly Cray-1 again, and plod
through the various coupling constants.
To your chagrin, it's *another* wasted day in heaven.
Another case of one damn thing after another! If the strong force is right
for stellar lifetimes it's wrong for variety-of-elements; if the weak force
is right for radioactive heat load inside planets it's wrong for
stellar stability; and so on, ad nauseam. By Wednesday evening you're
feeling like smashing your heavenly Cray-1 into tiny little pieces.
Are you *ever* going to strike lucky with SASs? Classical theories don't
support them, and (we're assuming here...) QFT-type theories don't support them
either.

One must assume that on Thursday or whenever, you finally discover a theory of
a third class, different from both classical and "QFT-type",
which *does* support SASs. Perhaps the regime in question is
"non-mathematical". (Although as you can maybe guess, I'm going to argue
that such a statement is meaningless.) Anyway, with a burst of heavenly joy
you create the relevant cosmos, and the SASs live happily ever after.
What can those SASs conclude about their world, and about you (the god)?

Well, those SASs proceed to do science in their world. They study the
stars, measure their brightnesses and spectra and chart their life-cycles;
they study the chemistry of the stuff they're made of; they build theories.
They (like us humans) go through a long period of false starts and
retractions. They invent classical mechanics; they discover its faults;
they invent QFT; they (this is important!) discover *its* faults.
Because, of course, it *is* faulty! All choices of its coupling constants
(by hypothesis) fail to give stars in the sky and SASs on the ground.
So they have to move on.
Maybe they eventually discover the very regime that you (the god)
discovered "on Thursday". At any rate, whatever they discover, their
*process* of discovery is that of seeking and studying
the examples of regularity of structure in their
world. That is, of doing "applied mathematics". And probably also of
deliberately running ahead and studying the space of logically possible
structures - doing "pure mathematics" - with the vague hope that some of
those structures might match up with the stars in the sky and the SASs on the
ground - that "pure mathematics" might unexpectedly become "applied".

The punchline to all this? Simply that, like the person who discovered that
all their life they'd been speaking prose, those SASs can't *not*
do mathematics! Mathematics is not separable from the rest of knowledge and
wisdom. Similarly, when you (the god) whooped with heavenly joy that your new
Thursday regime had enough regularity of structure (and of the right kind)
to support SASs, *you* were doing mathematics. As Pauli said of a theory
in some other context, the problem with a purported category 2 TOE is not
that it's wrong, but that it's not even wrong!
It just doesn't *mean* anything
to say of a TOE that it's not mathematical. It's like saying it's not
written in prose.

The original case, where (again) there are more SAS-feasibility equations than
unknowns, and (we change back to...)
they *do* have a beautiful mutual intersection, has an even
more straightforward route to the same punchline. In the metaphor I'm using,
on Wednesday in this scenario you do find the beautiful mutual intersection.
(You really do *find* it
- you don't design it! Of course this opens up the
old controversy about whether mathematics is discovered or invented.
I hope you'll agree that for the purposes of telling the story in my chosen
metaphorical language, I have to at least speak "as if"
mathematics is discovered.)
With a sigh of relief you create the cosmos specified by that intersection,
and go and have a well-earned rest. The SASs in that world also discover
the mutual intersection that their lives depend on.
(At least if it's computationally simple and tractable enough.)
They gasp at its beauty, but nevertheless they'd be utterly wrong to
conclude that this is evidence for a "category 2 TOE". For again,
where exactly is the "non-mathematical" aspect to all this?
It's precisely a fact of mathematics - the existence of that intersection
in the abstract space of possible mathematical structures - that permits
those SASs' very existence!

Apologies for the rather rambling and metaphorical style of argument above. I think you make essentially the same criticism of the concept of a category 2 TOE in section V F 2 (page 25: "The generality of mathematics"), without the rambling style and without the dodgy theistic metaphor!

As promised, here's my (awfully tentative) proposal for how to re-interpret
"uniform" either on the whole Figure 1 space of mathematical structures,
or on the more limited "parameter space" within any one of them.
Take first a parameter space. As an example, I'll use the same one as I used
when I argued you were unduly pessimistic about the usefulness of a
category 1a TOE if it turns out dimensions higher than 3 are all habitable.
Namely, we're fixed inside some "box" of Figure 1 (e.g. a box called
"QFT-like theories in any dimension"), and we're considering how to run over
the parameter space within that box
(the integer number of dimensions - leaving aside coupling constants
[not to mention *number* of coupling constants!]
in the interests of manageability).

My proposal: don't try to use a literally uniform prior over the integers.
That leads to the false pessimism I outlined before. Instead, consider
the **Kolmogorov-Chaitin algorithmic complexity** (in bits)
of each potential parameter value (i.e. each integer).
This is the length of the shortest program that
prints the integer and nothing else and halts.
- OK, this does depend on fixing a particular universal Turing machine,
and a particular notation for output; but Chaitin and others keep reassuring
us that all the "important" or "beautiful" results of algorithmic
information theory are invariant modulo such choices. I'll accept their
reassurances for the sake of argument.

Define the weighting for each integer to be
1 over (2 to the power of its complexity). This is the probability that the
integer (and nothing else) would be emitted by a randomly chosen
program for our fixed universal Turing machine.
(Actually that's not quite true. If we wanted to make it true by definition
we'd want to add up "1 over 2 to the power of the length of..."
*every* program that prints the integer and nothing else and halts.
We might also want to somehow "smudge this" over
the set of different original choices for the
fixed universal Turing machine - if this would help reduce embarrassing
dependence on a particular one, assuming this did turn out to be a problem
after all.)

What are the properties of this weighting? First of all, it's a bona fide
probability distribution in its own right: it adds up to a finite number.
(With the "all programs" refinement above, it adds up to Chaitin's Omega
itself - or to less than Omega, if our fixed notation is such that
some outputs are regarded as not syntactically valid integers and hence
rejected.) Thus it can be normalized to add up to 1.
Secondly, *small* integers
(and a tiny handful of large integers, namely those
like 10^(10^(10^(10^(10^(10^(10^(10^(10^10)))))))) with a highly regular
structure, i.e. a simple algorithm for producing them)
have *large* weightings. Thus if all space dimensions are habitable
but we live in dimension 3, this is not an embarrassment.
The integer 3 has a large weighting all to itself!
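One can get a rough computational feel for this weighting by substituting an ordinary compressor for the (uncomputable!) true Kolmogorov-Chaitin complexity. This is a crude and theoretically unjustified stand-in - zlib knows nothing about shortest programs - but it's suggestive. Here I use the zlib-compressed length of an integer's decimal expansion as the complexity proxy:

```python
import zlib

def complexity_proxy(n):
    # Crude stand-in for K(n): bits in the zlib-compressed decimal
    # expansion.  (True algorithmic complexity is uncomputable.)
    return 8 * len(zlib.compress(str(n).encode()))

def weight(n):
    # The "1 over 2 to the power of its complexity" weighting.
    return 2.0 ** -complexity_proxy(n)

regular = 10 ** 100  # highly regular: a "1" followed by a hundred "0"s
irregular = int("73928465019283746592831749205837164950283716495021873465"
                "09182736450918237465092817364509871234651")  # no pattern

print(complexity_proxy(3), complexity_proxy(regular), complexity_proxy(irregular))
```

Sure enough, the patternless hundred-digit number costs far more bits than either the integer 3 or the equally enormous but highly regular 10^100, so it gets a correspondingly minuscule weighting.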

Turning now to the problem of painting the whole of Figure 1 with a Bayesian prior: well, the same general idea applies. To be precise: We consider the probability that a randomly chosen program would produce (a formal specification or axiomatization for) a particular mathematical structure and then halt. This "punishes" those structures that are particularly complicated and ad hoc in nature. Only long (and so "unlikely") programs would specify such structures!

A pleasing aspect of this proposal is it renders harmless the subjective
decision "one box or many?" one keeps meeting when drawing a diagram such as
Figure 1. For example, do we have a box "(3+1)-dimensional QFT",
another box "(4+1)-dimensional QFT", and so on?
Or do we have *one* box "QFT-like theories in any dimension",
with an internal parameter *D* giving the number of dimensions?
It doesn't really matter. In both cases, to output the specification of (say) a
4384792698273462-dimensional field theory "costs" the same.
Either we have one long program to do it "in one go", or we have a
short subroutine specifying the axioms for "QFT-like theories in any dimension"
and then a subroutine specifying the act of plugging in the magic number
4384792698273462. You can find in Chaitin's work various sorts of
"covering theorems" that guarantee that the precise value of complexity
assigned to a given digital object (output text) is not much changed
by such "notational disputes". - I'm still slightly nervous that things are
maybe not as clean and straightforward for *this* application as
Chaitin promises, but there's at least the hope it'll all work out OK!

Note that to specify a category of structure, then a sub-category within it,
then a sub-sub-category within that, etc, one "usually"
(i.e. unless the "zooming rules" themselves have algorithmically compressible
structure) just has to give a program whose length
is near enough the *sum*
of the lengths of the specifying programs at each stage. Under the
"1 over 2 to the power of..." rule, the weighting then becomes near enough
the *product* of the "local weightings"
assigned at each level of zooming in.
Thus the general look and feel, as it were, of a decently-behaved probability
distribution is rendered successfully.
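In symbols (my notation: K(.) for algorithmic complexity, w(.) for the weighting), the bookkeeping of the previous paragraph is just:

```latex
K(\text{whole spec}) \;\approx\; K(\text{level}_1) + K(\text{level}_2) + \cdots
\quad\Longrightarrow\quad
w \;=\; 2^{-K(\text{whole spec})} \;\approx\; 2^{-K(\text{level}_1)} \cdot 2^{-K(\text{level}_2)} \cdots
```

with the approximation failing (in our favor - a shorter program) precisely when the zooming rules themselves are compressible.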

So there you are! What do you think? The trouble is, only researchers in our far future (if ever) will be capable of seriously summing and integrating over substantial swathes of Figure 1 in the manner of your section III, equations (4)-(7). Thus only they will really discover the truth about any purported good and bad features of proposed weightings. For now we're really all just guessing! But I thought you might find these thoughts interesting, and who knows, perhaps suggestive of something better.

Chris Maloney points out that my suggestion of a prior based on Kolmogorov-Chaitin algorithmic complexity is similar to that discussed in Jürgen Schmidhuber's "great programmer religion" paper. Jürgen continues to work in this area: see for example his paper on algorithmic theories of everything (quant-ph/0011122).

*This corner of the web maintained by
Iain Stewart
<ids@doc.ic.ac.uk>,
Department of Computing,
Imperial College,
London, UK
*
