Feeds:
Posts

## Transitions in Evolution and Information Processing: Lecture by John Maynard Smith

A second post in a series of posts about Information Theory/Learning based perspectives in Evolution, that started off from the last post.

Although the last post was mostly about a historical perspective, it had a section where the main motivation for some work in metabiology due to Chaitin (now published as a book) was reviewed. The starting point about that work was to view evolution solely through an information processing lens (and hence the use of Algorithmic Information Theory). Ofcourse this lens by itself is not a recent acquisition and goes back a few decades (although in hindsight the fact that it goes back just a few decades is very surprising to me at least). To illustrate this I wanted to share some analogies by John Maynard Smith (perhaps one of my favourite scientists), which I had found to be particularly incisive and clear. To avoid clutter, they are shared here instead (note that most of the stuff he talks about is something we study in high school, however the talk is quite good, especially because it tends to emphasize on the centrality of information throughout). I also want this post to act as a reference for some upcoming posts.

Coda:

Molecular Biology is all about Information. I want to be a little more general than that; the last century, the 19th century was a century in which Science discovered how energy could be transformed from one form to another […] This century will be seen […] where it became clear that information could be translated from one from to another.

[Other parts: Part 2, Part 3, Part 4, Part 5, Part 6]

Throughout this talk he gives wonderful analogies on how information translation underlies the so called Central Dogma of Molecular Biology, and how if the translation was one-way in some stages it could have implications (i.e how August Weismann noted that acquired characters are not inherited by giving a “Chinese telegram translation analogy”; since there was no mechanism to translate acquired traits (acquired information) into the organism so that it could be propagated).

However, the most important point from the talk: One could see evolution as being punctuated by about 6 or so major changes or shifts. Each of these events was marked by the way information was stored and processed in a different way. Some that he talks about are:

1. The origin of replicating molecules.

2. The Evolution of chromosomes: Chromosomes are just strings of the above replicating molecules. The property that they have is that when one of these molecules is replicated, the others have to be as well. The utility of this is the following: Since they are all separate genes, they might have different rates of replication and the gene that replicates fastest will soon outnumber all the others and all the information would be lost. Thus this transition underlies a kind of evolution of cooperation between replicating molecules or in other other words chromosomes are a way for forced cooperation between genes.

3. The Evolution of the Code: That information in the nucleic could be translated to sequences of amino acids i.e. proteins.

4. The Origin of Sex: The evolution of sex is considered an open question. However one argument goes that (details in next or next to next post) the fact that sexual reproduction hastens the acquisition from the environment (as compared to asexual reproduction) explains why it should evolve.

5. The Evolution of multicellular organisms: A large, complex signalling system had to evolve for these different kind of cells to function in an organism properly (like muscle cells or neurons to name some in Humans).

6. Transition from solitary individuals to societies: What made these societies of individuals (ants, humans) possible at all? Say if we stick to humans, this could have only happened only if there was a new way to transmit information from generation to generation – one such possible information transducing machine could be language! Thus giving an additional mechanism to transmit information from one generation to another other than the genetic mechanisms (he compares the genetic code and replication of nucleic acids and the passage of information by language). This momentous event (evolution of language ) itself dependent on genetics. With the evolution of language, other things came by:  Writing, Memes etc. Which might reproduce and self-replicate, mutate and pass on and accelerate the process of evolution. He ends by saying this stage of evolution could perhaps be as profound as the evolution of language itself.

________________

As a side comment: I highly recommend the following interview of John Maynard Smith as well. I rate it higher than the above lecture, although it is sort of unrelated to the topic.

________________

Interesting books to perhaps explore:

2. The Evolution of Sex: John Maynard Smith (more on this theme in later blog posts, mostly related to learning and information theory).

________________

## IPAM-UCLA Summer School on Deep Learning and Feature Learning

Deep Learning reads Wikipedia and discovers the meaning of life – Geoff Hinton.

The above quote is from a very interesting talk by Geoffrey Hinton I had the chance to attend recently.

I have been at a summer school on Deep Neural Nets and Unsupervised Featured Learning at the Institute for Pure and Applied Mathematics at UCLA since July 9 (till July 27). It has been organized by Geoff Hinton, Yoshua Bengio, Stan Osher, Andrew Ng and Yann LeCun.

I have always been a “fan” of Neural Nets and the recent spike in interest in them has made me excited, thus the school happened at just the right time. The objective of the summer school is to give a broad overview of some of the recent work in Deep Learning and Unsupervised Feature Learning with emphasis on optimization, deep architectures and sparse representations. I must add that after getting here and looking at the peer group I would consider myself lucky to have obtained funding for the event!

[Click on the above image to see slides for the talks. Videos will be added at this location after July 27 Videos are now available]

That aside, if you are interested in Deep Learning or Neural Networks in general, the slides for the talks are being uploaded over here (or click on the image above), videos will be added at the same location some time after the summer school ends so you might like to bookmark this link.

The school has been interesting given the wide range of people who are here. The diversity of opinions about Deep Learning itself has given a good perspective on the subject and the issues and strengths of it. There are quite a few here who are somewhat skeptical of deep learning but are curious, while there are some who have been actively working on the same for a while. Also, it has been enlightening to see completely divergent views between some of the speakers on key ideas such as sparsity. For example Geoff Hinton had a completely different view of why sparsity was useful in classification tasks than compared to Stéphane Mallat, who gave a very interesting talk today even joking that “Hinton and Yann LeCun told you why sparsity is useful, I’ll tell you why sparsity is useless. “. See the above link for more details.

Indeed, such opinions do tell you that there is a lot of fecund ground for research in these areas.

I have been compiling a reading list on some of this stuff and will make a blog-post on the same soon.

________________

Onionesque Reality Home >>

## “Darwinian Evolution is a form of PAC (Machine) Learning”

Changing or increasing functionality of circuits in biological evolution is a form of computational learning. – Leslie Valiant

The title of this post comes from Prof. Leslie Valiant‘s The ACM Alan M. Turing award lecture titled “The Extent and Limitations of Mechanistic Explanations of Nature”.

Prof. Leslie G. Valiant

Click on the image above to watch the lecture

[Image Source: CACM “Beauty and Elegance”]

Short blurb: Though the lecture came out sometime in June-July 2011, and I have shared it (and a paper that it quotes) on every online social network I have presence on, I have no idea why I never blogged about it.

The fact that I have zero training (and epsilon knowledge of) in biology that has not stopped me from being completely fascinated by the contents of the talk and a few papers that he cites in it. I have tried to see the lecture a few times and have also started to read and understand some of the papers he mentions. Infact, the talk has inspired me enough to know more about PAC Learning than the usual Machine Learning graduate course might cover. Knowing more about it is now my “full time side-project” and it is a very exciting side-project to say the least!

_________________________

It is widely accepted that Darwinian Evolution has been the driving force for the immense complexity observed in life or how life evolved. In this beautiful 10 minute video Carl Sagan sums up the timeline and the progression:

There is however one problem: While evolution is considered the driving force for such complexity, there isn’t a satisfactory explanation of how 13.75 billion years of it could have been enough. Many have often complained that this reduces it to a little more than an intuitive explanation. Can we understand the underlying mechanism of Evolution (that can in turn give reasonable time bounds)? Valiant makes the case that this underlying mechanism is of computational learning.

There have been a number of computational models that have been based on the general intuitive idea of Darwinian Evolution. Some of these include: Genetic Algorithms/Programming etc. However, people like Valiant amongst others find such methods useful in an engineering sense but unsatisfying w.r.t the question.

In the talk Valiant mentions that this question was asked in Darwin’s day as well. To which Darwin proposed a bound of 300 million years for such evolution to occur. This immediately fell into a problem as Lord Kelvin, one of the leading physicists of the time put the figure of the age of Earth to be 24 million years. Now obviously this was a problem as evolution could not have happened for more than 24 million years according to Kelvin’s estimate. The estimate of the age of the Earth is now much higher. ;-)

The question can be rehashed as: How much time is enough? Can biological circuits evolve in sub-exponential time?

For more I would point out to his paper:

Evolvability: Leslie Valiant (Journal of the ACM – PDF)

Towards the end of the talk he shows a Venn diagram of the type usually seen in complexity theory text books for classes P, NP, BQP etc but with one major difference: These subsets are fact and not unproven:

$Fact: Evolvability \subseteq SQ Learnable \subseteq PAC Learnable$

*SQ or Statistical Query Learning is due to Michael Kearns (1993)

Coda: Valiant claims that the problem of evolution is no more mysterious than the problem of learning. The mechanism that underlies biological evolution is “evolvable target pursuit”, which in turn is the same as “learnable target pursuit”.

_________________________

Onionesque Reality Home >>

## Conditional Random Fields: A Beginner’s Survey

One interesting project that I am involved in these days involves certain problems in Intelligent Tutors. It turns out that perhaps one of the best ways to tackle them is by using Conditional Random Fields (CRFs). Many attempts to solving these problems still involve Hidden Markov Models (HMMs). Since I have never really been a Graphical Models guy (though I am always fascinated) so I found the going on studying CRFs quite difficult. Now that the survey is more or less over, here are my suggestions for beginners to go about learning them.

Tutorials and Theory

1. Log-Linear Models and Conditional Random Fields (Tutorial by Charles Elkan)

Log-linear Models and Conditional Random Fields
Charles Elkan

6 videos: Click on Image above to view

Two directions of approaching CRFs are especially useful to get a good perspective on their use. One of these is considering CRFs as an alternate to Hidden Markov Models (HMMs) while another is to think of CRFs building over Logistic Regression.

This tutorial makes an approach from the second direction and is easily one of the most basic around. Most people interested in CRFs would ofcourse be familiar with ideas of maximum likelihood, logistic regression etc. This tutorial does a good job, starting with the absolute basics – talking about logistic regression (for a two class problem) to a more general multi-label machine learning problem with a structured output (outputs having a structure). I tried reading a few tutorials before this one, but found this to be the most comprehensive and the best place to start. It however seems that there is one lecture missing in this series which (going by the notes) covered more training algorithms.

2. Survey Papers on Relational Learning

These are not really tutorials on CRFs, but talk of sequential learning in general. For beginners, these surveys are useful to clarify the range of problems in which CRFs might be useful while also discussing other methods for the same briefly. I would recommend these two tutorials to help put CRFs in perspective in the broader machine learning sub-area of Relational Learning.

— Machine Learning for Sequential Learning: A Survey (Thomas Dietterich)

PDF

This is a very broad survey that talks of sequential learning, defines the problem and some of the most used methods.

— An Introduction to Structured Discriminative Learning (R Memisevic)

PS

This tutorial is like the above, however focuses more on comparing CRFs with large margin methods such as SVM. Giving yet another interesting perspective in placing CRFs.

3. Comprehensive CRF Tutorial (Andrew McCallum and Charles Sutton)

PDF

This tutorial is the most compendious tutorial available for CRF. While it claims to start from the bare bone basics, I found it hard for a start and took it on third (after the above two). It is potentially the starting and ending point for a more advanced Graphical Models student. It is extensive (90 pages) and gives a feeling of comfort with CRFs when done. It is definitely the best tutorial available though by no means the most easiest point to start if you have never done any sequential learning before.

This might be considered an extension to this tutorial by McCallum et al : CRFs for Relational Learning (PDF)

4. Original CRF Paper (John Lafferty et al.)

PDF

Though not necessary to learn CRFs given many better tutorials, this paper is still recommended, being the first on CRFs.

5. Training/Derivations (Rahul Gupta)

PDF

This report is good for the various training methods and for one to go through the derivations associated.

6. Applications to Vision (Nowozin/Lampert)

If your primary focus is using structured prediction in Computer Vision/Image Analysis then a good tutorial (with a large section on CRFs) can be found over here:

Structured prediction and learning in Computer Vision (Foundations and Trends Volume).

PDF

___________________

Extensions to the CRF concept

There are a number of extensions to CRFs. The two that I have found most helpful in my work are (these are easy to follow given the above):

Both of these extensions work to include hidden variables in the CRF framework.

___________________

Software Packages

1. Kevin Murphy’s CRF toolbox (MATLAB)

2. MALLET (I haven’t used MALLET, it is Java based)

3. HCRF – LDCRF Library (MATLAB, C++, Python). As as the name suggests, this package is for HCRF and LDCRF, though can be used as a standalone package for CRF as well.

## Strangeness Minus Three (Richard Feynman and Murray Gell-Mann 1964)

I am a big fan and collector of the BBC Horizon documentaries and I was pleasantly surprised to have found an old one (probably from the year Horizon started, though I think this is from 1966) that I didn’t know exist till two weeks ago. It is on the exciting discovery of the $\Omega -$ and features Richard Feynman and Murray Gell-Mann. It, like the old Horizon documentaries is more technical but at the same time more raw and exciting. And is worth watching only for its historical significance and age if nothing else. Definitely a collector’s item!

________________

Strangeness Minus Three (BBC Horizon, 1964/6)

Total Runtime: 41:20

[Part 1 | Part 2 | Part3]

________________

## Stanford Deep Learning Lectures (Just About)

The first part is just to motivate this upcoming Stanford video series.

Deep Learning? Supervised Learning is the process where an entity has to “teach” or “supervise” the learning. The learning algorithm (such as a neural network) is shown some features (which are carefully extracted) and then it is told the correct answer (training). Over time it learns a function that maps features to labels. It thus  focuses on finding what would be the class label given a set of features i.e. $P(Y|X)$ where $Y$ is the class and $X$ the features. For example in face recognition, after we have extracted features using a technique such as PCA or ICA, the task is to use these features and label information (person name or ID etc) to learn a function that can make predictions. But we see in everyday life that label information is not as important in learning. Humans do some kind of “clustering” and generative modeling of whatever they see all the time. Given a set of objects we tend to form a generative model of those objects, and then assign labels, labels thus give very little information in actual learning. Another interesting question is how features are learnt in the first place? Is it an unsupervised task? How can a computer learn features in an unsupervised manner?

Unsupervised Feature Learning? Now consider a task where you have to improve accuracy on classifying an image as that of an elephant or a Rhino. But the catch is that you are not given any labeled examples of elephants or Rhinos, not even that, suppose you are not even given unlabeled examples of them. But you are given random images of rivers and mountains and you have to learn a feature representation from these that can help you in your task. This can be done by sparse coding as shown by Raina et al.

______________

Lectures: Recently I came across a series of lectures (which are a work in progress) by Professor Andrew Y. Ng on Unsupervised Feature Learning and Deep Learning. This course will help present some issues such as the above to a wider audience. Though still not yet uploaded, I am really excited about these as I had really enjoyed his CS 229 lectures a long time ago. This course needs some basic knowledge of Machine Learning, but does brush up some basics.

I have been working on Meta-Learning for a while, but have been getting more interested in Deep Learning Methods recently and hence am looking forward for these lectures to come online.

I wrote to Professor Ng about them and in his opinion it would take a few months before they can be put up. I think that works fine as I plan to work on Deep Learning in the summers and that these would really help. Even now expertise in Deep Learning Methods is restricted to only a few places and thus such lectures would be a great advantage.

Here is a description to the Unsupervised Feature Learning and Deep Learning course:

Machine learning has seen numerous successes, but applying learning algorithms today often means spending a long time hand-engineering the input feature representation. This is true for many problems in vision, audio, NLP, robotics, and other areas. In this course, you’ll learn about methods for unsupervised feature learning and deep learning, which automatically learn a good representation of the input from unlabeled data. You’ll also pick up the “hands-on,” practical skills and tricks-of-the-trade needed to get these algorithms to work well.

Basic knowledge of machine learning (supervised learning) is assumed, though we’ll quickly review logistic regression and gradient descent.

I hope this would be as widely viewed as the CS 229 lectures. I say that as I know these would be fantastic.

______________

Onionesque Reality Home >>

## First Blog Post Citation

This is a first for this blog, and hence worth mentioning.

I came across a paper that is to appear in the proceedings of the IEEE Conference on Computer Systems and Applications 2010. Find the paper here.

This paper cites an old post on this blog, one of the first few infact. This is reference number [2] on the paper. It was good to know, and more importantly, a boost to blog to discuss small ideas that are otherwise improper for a formal presentation.

___________

Since it is lame to write just the above lines, I leave you with a couple of talks that I watched over the friday night and I would highly recommend.

There was a talk by Machine Learning pioneer Geoffrey Hinton some years ago at Google Tech Talks that became quite a hit. This talk was titled The Next Generation of Neural Networks that discusses Restricted Boltzmann Machines, and how this generative approach can lead to learning complex and deep dependencies in the data.

There was a follow up talk recently, that I had long bookmarked, but just got around to seeing yesterday. This like the previous is a fantastic talk that has completed my conversion to begin exploring deep learning methods. :)

Here is the talk –

Another great talk that I had been looking at last night was a talk by Prof Yann LeCun

Here is the talk –

This talk is started by the late Sam Roweis. It feels good at one level to see his work preserved on the internet. I have quite enjoyed talks by him at summer schools in the past.

___________

Onionesque Reality Home >>