Long Blurb: In the past year, I read three books by Gregory Chaitin: His magnum opus – MetaMath! (which had been on my reading stack since 2007), a little new book – Proving Darwin and most of Algorithmic Information Theory (available as a PDF as well). Before this I had only read articles and essays by him. I have been sitting on multiple blog post drafts based on these readings but somehow I never got to finishing them up, perhaps it is time to publish them as is! Chaitin is an engaging and provocative writer (and speaker! see this fantastic lecture for instance) who really makes me think. It has come to my notice that a lot of people find his writing style uncomfortable (for apparently lacking humility and having copious hyperbole). While there might be some merits to this view, if one is able to look beyond these minor misgivings he always has a lot of interesting ideas. The apparent hyperbole is best viewed as effusive enthusiasm to get the most out of his writings (since there is indeed a lot to be mined). I might talk about this when I actually write about his books and the ideas therein. For now, this post was in part inspired by something I read in the second book: Proving Darwin.
I thought this was a thought provoking book, but perhaps was not suited to be a book, yet. It builds on the idea of DNA as software and tries to think of Evolution as a random walk in software space. The idea of DNA as software is not new. If one considers DNA to be software and the other biological processes to also be digital, then one is essentially sweeping out all that might not be digital. This view of biological processes might thus be at best an imprecise metaphor. But as Chaitin quotes Picasso ‘art is a lie that helps us see the truth‘ and explains that this is just a model and the point is to see if anything mathematically interesting can be extracted from such a model.
He eloquently points out that once DNA has been thought of as a giant codebase, then we begin to see that a human DNA is just a major software patchwork, having ancient subroutines common with fish, sponges, amphibians etc. As is the case with large software projects, the old code can never be thrown away and has to be reused – patched and made to work and grows by the principle of least effort (thus when someone catches ill and complains of a certain body part being badly designed it is just a case of bad software engineering!). The “growth by accretion” of this codebase perhaps explains Enrst Haeckel‘s slogan “ontogeny recapitulates phylogeny” which roughly means that the growth of a human embryo resembles the successive stages of evolution (thus biology is just a kind of Software Archeology).
All this sounds like a good analogy. But is there any way to formalize this vaguiesh notion of evolution as a random walk in software space and prove theorems about it? In other words, does a theory as elegant as Darwinian Evolution have a purely mathematical core? it must says Chaitin and since the core of Biology is basically information, he reckons that tools from Algorithmic Information Theory might give some answers. Since natural software are too messy, Chaitin considers artificial software in parallel to natural software and considers a contrived toy setting. He then claims in this setting one sees a mathematical proof that such a software evolves. I’ve read some criticisms of this by biologists on the blogosphere, however most criticise it on the premises that Chaitin explicitly states that his book is not about. There are however, some good critiques, I will write about these when I post my draft on the book.
Now coming to the title of the post. The following are some excerpts from the book:
“But we could not realize that the natural world is full of software, we could not see this, until we invented human computer programming languages […] Nobel Laureate Sydney Brenner shared an office with Francis Crick of Watson and Crick. Most Molecular Biologists of this generation credit Schrödinger’s book What is Life? (Cambridge University Press, 1944) for inspiring them. In his autobiography My Life in Science Brenner instead credits von Neumann’s work on self-reproducing automata.
[…] we present a revisionist history of the discovery of software and of the early days of molecular biology from the vantage point of Metabiology […] As Jorge Luis Borges points out, one creates one’s predecessors! […] infact the past is not only occasionally rewritten, it has to be rewritten in order to remain comprehensible to the present. […]
And now for von Neumann’s self reproducing automata. von Neumann, 1951, takes from Gödel the idea of having a description of the organism within the organism = instructions for constructing the organism = hereditary information = digital software = DNA. First you follow the instructions in the DNA to build a new copy of the organism, then you copy the DNA and insert it in the new organism, then you start the new organism running. No infinite regress, no homunculus in the sperm! […]
You see, after Watson and Crick discovered the molecular structure of DNA, the 4-base alphabet A, C, G, T, it still was not clear what was written in this 4-symbol alphabet, it wasn’t clear how DNA worked. But Crick, following Brenner and von Neumann, somewhere in the back of his mind had the idea of DNA as instructions, as software”
I did find this quite interesting, even if it sounds revisionist. That is because the power of a paradigm-shift conceptual leap is often understated, especially after more time has passed. The more time passes, the more “obvious” it becomes and hence the more “diffused” its impact. Consider the idea of a “computation”. A century ago there was no trace of any such idea, but once it was born born and diffused throughout we basically take it for granted. In fact, thinking of a lot of things (anything?) precisely is nearly impossible without the notion of a computation in my opinion. von Neumann often remarked that Turing’s 1936 paper contained in it both the idea of software and hardware and the actual computer was just an implementation of that idea. In the same spirit von Neumann’s ideas on Self-Reproducing automata have had a similar impact on the way people started thinking about natural software and replication and even artificial life (recall the famous von Neumann quote: Life is a process that can be abstracted away from any particular medium).
While I found this proposition quite interesting, I left this at that. Chaitin cited Brenner’s autobiography which I could not obtain since I hadn’t planned on reading it, google-books previews did not suffice either. So i didn’t pursue looking on what Brenner actually said.
However, this changed recently, I exchanged some emails with Maarten Fornerod, who happened to point out an interview of Sydney Brenner that actually talks about this.
“45: John von Neumann and the history of DNA and self-replication:
What influenced me the most was the articles of von Neumann. Now, I had become interested in von Neumann not through the coding issues but through his interest in the nervous system and computers and of course, that’s what Seymour was interested in because what we wanted to do is find out how the brain worked; that was what… you know, like a hobby on the side, you know. After… after dinner we’d work on central problems of the brain, you see, but it was trying to find out how this worked. And so I got this symposium, the Hixon Symposium, which I had got to read. I was, at that time, very fascinated by a rather strange complexion of psychology things by a man called Wolfgang Köhler who was a gestalt psychologist and another man called Lewin, who was what was called a topological psychologist and of course they were talking at a very different level, but in this book there is an article by Köhler and of course in that book there’s this very famous paper which no one has ever read of von Neumann. Now of course later I discovered that those ideas were much older than the dating of this and that people had taken notes of them and they had circulated. But what is the brilliant part of this paper is in fact his description of what it takes to make a self-reproducing machine. And in fact if you look at what he says and what Schrödinger says, you can see what I have come to call Schrödinger’s fundamental error, and in fact we can… I can find you the passage in that. But it’s an amazing passage, because what von Neumann shows is that you have to have a mechanism not only of copying the machine but of copying the information that specifies the machine, right, so that he then divided the machine – the… the automaton as he called it – into three components: the functional part of the automaton; a… a decoding section of this which is part of that, which actually takes the tape, reads the instructions and builds the automaton; and a device that takes a copy of this tape and inserts it into the new automaton, right, which is the essential… essential, fundamental… and when von Neumann said that is the logical basis of self-reproduction, then you can see where Schrödinger made his mistake and this can be summarised in one sentence. Schrödinger says the chromosomes contain the information to specify the future organism and the means to execute it and that’s not true. The chromosomes contain the information to specify the future organisation and a description of the means to implement, but not the means themselves, and that logical difference is made so crystal clear by von Neumann and that to me, was in fact… The first time now of course, I wasn’t smart enough to really see that this is what DNA is all about, and of course it is one of the ironies of this entire field that were you to write a history of ideas in the whole of DNA, simply from the documented information as it exists in the literature, that is a kind of Hegelian history of ideas, you would certainly say that Watson and Crick depended on von Neumann, because von Neumann essentially tells you how it’s done and then you just… DNA is just one of the implementations of this. But of course, none knew anything about the other, and so it’s a… it’s a great paradox to me that in fact this connection was not seen. Linus Pauling is present at that meeting because he gives this False Theory of Antibodies there. That means he heard von Neumann, must have known von Neumann, but he couldn’t put that together with DNA and of course… well, neither could Linus Pauling put his own paper together with his future work because he and Delbrück wrote a paper on self… on self-complementation – two pieces of information – in about 1949 and had forgotten it by the time he did DNA, so that of course leads to a really distrust about what all the historians of science say, especially those of the history of ideas. But I think that that in a way is part of our kind of revolution in thinking, namely the whole of the theory of computation, which I think biologists have yet to assimilate and yet is there and it’s a… it’s an amazingly paradoxical field. You know, most fields start by struggling through from, from experimental confusion through early theoretical, you know, self-delusion, finally to the great generality and this field starts the other way round. It starts with a total abstract generality, namely it starts with, with Gödel’s hypothesis or the Turing machine, and then it takes, you know, 50 years to descend into… into banality, you see. So it’s the field that goes the other way and that is again remarkable, you know, and they cross each other at about 1953, you know: von Neumann on the way down, Watson and Crick on the way up. It was never put together.
[Q] But… but you didn’t put it together either?
I didn’t put it together, but I did put together a little bit later that, because the moment I saw the DNA molecule, then I knew it. And you connected the two at once? I knew this.
46: Schrodinger Wrong, von Neumann right:
I think he made a fundamental error, and the fundamental error can be seen in his idea of what the chromosome contained. He says… in describing what he calls the code script, he says, ‘The chromosome structures are at the same time instrumental in bringing about the development they foreshadow. They are law code and executive power, or to use another simile, they are the architect’s plan and the builder’s craft in one.’ And in our modern parlance, we would say, ‘They not only contain the program but the means to execute the program’. And that is wrong, because they don’t contain the means; they only contain a description of the means to execute it. Now the person that got it right and got it right before DNA is von Neumann in developing the logic of self-reproducing automata which was based of course on Turing’s previous idea of automaton and he gives this description of this automaton which has one part that is the machine; this machine is built under the instructions of a code script, that is a program and of course there’s another part to the machine that actually has to copy the program and insert a copy in the new machine. So he very clearly distinguishes between the things that read the program and the program itself. In other words, the program has to build the machinery to execute the program and in fact he says it’s… when he tries to talk about the biological significance of this abstract theory, he says: ‘This automaton E has some further attractive sides, which I shall not go into at this time at any length’.
47: Schrodinger’s belief in calculating an organism from chromosomes:
One of the central things in the Schrödinger’s little book What Is Life is what he has to say about the chromosomes containing the program or the code, and he says here… on page 20, he says: ‘In calling the structure of the chromosome fibres a code script, we mean that the all penetrating mind once conceived by Laplace, to which every cause or connection lay immediately open, could tell from their structure whether the egg would develop under suitable conditions, into a black cock, or into a speckled hen, into a fly, or a maize plant, a rhododendron, a beetle, a mouse or a woman’. Now, apart from the list of organisms he has given, which falls short of a serious classification of the living world, what he is saying here is that if you could look at the chromosomes, you could compute the… you could calculate the organism, but he’s saying something more. He’s saying that you could actually implement the organism because he goes on to add: ‘But the term code script is of course too narrow. The chromosome structures are at the same time instrumental in bringing about the development they foreshadow. They are law code and executive power, or to use another simile they are architect’s plan and builder’s craft in one.’ What he is saying here is that the chromosomes not only contain a description of the future organism but the means to implement that description or program as we might call it. That’s wrong, because they don’t contain the means to implement it, they only contain a description of the means to implement it and that distinction was made absolutely crystal clear by this remarkable paper of von Neumann, in which he develops and in fact provides a proof of what a self-reproducing automaton – or machine – would have to… would have to contain in order to satisfy the requirements of self-reproduction. He develops this concept of course from the earlier concept of Turing, who had developed ideas of automata that operated on tapes and which of course gave a mechanical example of how you might implement some computation. And if you like, this is the beginning of the theory of computation and what von Neumann has to say about this is made very clear in this essay. And he says that you’ve got to have several components, and I’ll just summarise them. He said you first start with an automaton A, ‘which, when furnished this description of any other automaton in terms of the appropriate functions, will construct that entity’. It’s like a Turing machine, you see. So automaton A has to be given a tape and then it’ll make another automaton A, all right. Now what you have, you’ve got to have… add to this automaton B. ‘Automaton B can make a copy of any instruction I that is furnished to it’, so if you give it a program, it’ll copy the program. And then you have… you combine A and B with each other and you give a control mechanism C, that you have to add as well, which does the following. It says, ‘Let A be furnished with the instruction I’. So you give… you give the tape to A, A copies it under the control of C, okay, by every description, so it makes another copy of A, then you… and of course B is part of A. Then you… C then tells B, ‘copy this tape I’, gives B the copying part of it, this copies it and then C will then take the tape I, and put it into the new machine, okay, and turns it loose as he says here, ‘as an independent entity’.
48: Automata akin to Living Cells:
What von Neumann says is that you need several components in order to provide this self-reproducing automaton. One component, which he calls automaton A, will make another automaton A when furnished with a description of itself. Then you need an automaton C… you need an automaton B, which has the property of copying the instruction tape I, and then you need a control mechanism which will actually control the switching. And so this automaton, or machine, can reproduce itself in the following way. The entity A is provided with a copy of… with the tape I, it now makes another A. The control mechanism then takes the tape I and gives it to B and says make a copy of this tape. It makes a copy of the tape and the control mechanism then inserts the new copy of the tape into the new automaton and effects the separation of the two. Now, he shows that the entire complex is clearly self-reproductive, there is no vicious circle and he goes on to say, in a very modest way I think, the following. He says, ‘The description of this automaton has some further attractive sides into which I shall not go at this time into any length. For instance, it is quite clear that the instruction I is roughly effecting the functions of a gene. It is also clear that the copying mechanism B performs the fundamental act of reproduction, the duplication of the genetic material which is also clearly the fundamental operation in the multiplication of living cells. It is also clear to see how arbitrary alterations of the system E, and in particular of the tape I, can exhibit certain traits which appear in connection with mutation, which is lethality as a rule, but with a possibility of continuing reproduction with a modification of traits.’ So, I mean, this is… this we know from later work that these ideas were first put forward by him in the late ’40s. This is the… a published form which I read in early 1952; the book was published a year earlier and so I think that it’s a remarkable fact that he had got it right, but I think that because of the cultural difference – distinction between what most biologists were, what most physicists and mathematicians were – it absolutely had no impact at all.”
References and Also See:
1. Proving Darwin: Making Biology Mathematical, Gregory Chatin (Amazon).
2. Theory of Self-Reproducing Automata, John von Neumann. (PDF).