Onionesque Reality

IPAM-UCLA Summer School on Deep Learning and Feature Learning

Posted in Computer Vision, Machine Learning, Mathematics, Signal Processing, Video Lectures, tagged Audio Processing, Computer Vision, Deep Learning, Feature Learning, Machine Learning, Neural Networks, Sparse Representations, Unsupervised Learning on July 19, 2012| 4 Comments »

Deep Learning reads Wikipedia and discovers the meaning of life – Geoff Hinton.

The above quote is from a very interesting talk by Geoffrey Hinton I had the chance to attend recently.

I have been at a summer school on Deep Neural Nets and Unsupervised Feature Learning at the Institute for Pure and Applied Mathematics at UCLA since July 9 (it runs till July 27). It has been organized by Geoff Hinton, Yoshua Bengio, Stan Osher, Andrew Ng and Yann LeCun.

I have always been a “fan” of Neural Nets, and the recent spike in interest in them has made me excited, so the school happened at just the right time. The objective of the summer school is to give a broad overview of some of the recent work in Deep Learning and Unsupervised Feature Learning, with emphasis on optimization, deep architectures and sparse representations. I must add that after getting here and looking at the peer group, I consider myself lucky to have obtained funding for the event!

[Click on the above image to see slides for the talks. ~~Videos will be added at this location after July 27~~ Videos are now available]

That aside, if you are interested in Deep Learning or Neural Networks in general, the slides for the talks are being uploaded over here (or click on the image above). Videos will be added at the same location some time after the summer school ends, so you might like to bookmark this link.

The school has been interesting given the wide range of people who are here. The diversity of opinions about Deep Learning has given a good perspective on the subject, its issues and its strengths. Quite a few people here are somewhat skeptical of deep learning but curious, while others have been actively working on it for a while. It has also been enlightening to see completely divergent views between some of the speakers on key ideas such as sparsity. For example, Geoff Hinton had a completely different view of why sparsity is useful in classification tasks from that of Stéphane Mallat, who gave a very interesting talk today, even joking that “Hinton and Yann LeCun told you why sparsity is useful, I’ll tell you why sparsity is useless.” See the above link for more details.

Indeed, such opinions do tell you that there is a lot of fecund ground for research in these areas.

I have been compiling a reading list on some of this stuff and will make a blog-post on the same soon.

________________

Evolvability: Leslie Valiant (Journal of the ACM – PDF)

“Darwinian Evolution is a form of PAC (Machine) Learning”

Posted in Biology, Complexity, Computer Science, Darwinism, Machine Learning, Nature, Video Lectures, tagged Computational Learning Theory, Computer Science, Evolution, Evolvability, Genetics, Leslie Valiant, Machine Learning, PAC Learning, Talks, Turing Award on November 12, 2011| 5 Comments »

Changing or increasing functionality of circuits in biological evolution is a form of computational learning. – Leslie Valiant

The title of this post comes from Prof. Leslie Valiant’s ACM Alan M. Turing Award lecture titled “The Extent and Limitations of Mechanistic Explanations of Nature”.

Prof. Leslie G. Valiant

Click on the image above to watch the lecture

[Image Source: CACM “Beauty and Elegance”]

Short blurb: Though the lecture came out sometime in June-July 2011, and I have shared it (and a paper that it quotes) on every online social network I have a presence on, I have no idea why I never blogged about it.

The fact that I have zero training in (and epsilon knowledge of) biology has not stopped me from being completely fascinated by the contents of the talk and a few papers that he cites in it. I have watched the lecture a few times and have also started to read and understand some of the papers he mentions. In fact, the talk has inspired me to learn more about PAC Learning than the usual Machine Learning graduate course might cover. Learning more about it is now my “full time side-project”, and it is a very exciting side-project to say the least!

_________________________

Getting back to the title: One of the motivating questions about this work is the following:

It is widely accepted that Darwinian Evolution has been the driving force for the immense complexity observed in life or how life evolved. In this beautiful 10 minute video Carl Sagan sums up the timeline and the progression:

There is however one problem: while evolution is considered the driving force for such complexity, there isn’t a satisfactory explanation of how 13.75 billion years of it could have been enough. Many have complained that this reduces it to little more than an intuitive explanation. Can we understand the underlying mechanism of evolution (which can in turn give reasonable time bounds)? Valiant makes the case that this underlying mechanism is computational learning.

A number of computational models have been based on the general intuitive idea of Darwinian Evolution, for example Genetic Algorithms and Genetic Programming. However, Valiant, amongst others, finds such methods useful in an engineering sense but unsatisfying with respect to this question.

In the talk Valiant mentions that this question was asked in Darwin’s day as well. Darwin proposed a bound of 300 million years for such evolution to occur. This immediately ran into a problem, as Lord Kelvin, one of the leading physicists of the time, put the age of the Earth at 24 million years, so according to Kelvin’s estimate evolution could not have been going on for more than 24 million years. The estimate of the age of the Earth is now much higher. ;-)

The question can be rehashed as: How much time is enough? Can biological circuits evolve in sub-exponential time?

For more I would point to his paper:

Towards the end of the talk he shows a Venn diagram of the type usually seen in complexity theory textbooks for classes such as P, NP and BQP, but with one major difference: these inclusions are proven facts, not conjectures:

Fact: $\text{Evolvability} \subseteq \text{SQ Learnable} \subseteq \text{PAC Learnable}$

*SQ or Statistical Query Learning is due to Michael Kearns (1993)
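
For readers who have not met these classes, here is a brief reminder of the standard definitions. The notation is mine, not taken from the talk. A concept class $C$ is PAC learnable if there is an algorithm that, for every target $c \in C$, every distribution $D$ over inputs and every $\epsilon, \delta \in (0, 1)$, uses a number of labeled examples and a running time polynomial in $1/\epsilon$, $1/\delta$ and the input size, and outputs a hypothesis $h$ satisfying

$\Pr\left[ \mathrm{err}_D(h) \le \epsilon \right] \ge 1 - \delta, \quad \text{where} \quad \mathrm{err}_D(h) = \Pr_{x \sim D}\left[ h(x) \neq c(x) \right]$

In the SQ model the learner never sees individual examples. It may only ask for estimates of expectations of the form $\mathbb{E}_{x \sim D}\left[ \chi(x, c(x)) \right]$, each answered to within an additive tolerance $\tau$. Since every such query can be simulated by averaging over random examples, anything SQ learnable is also PAC learnable, which is the second inclusion above.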

Coda: Valiant claims that the problem of evolution is no more mysterious than the problem of learning. The mechanism that underlies biological evolution is “evolvable target pursuit”, which in turn is the same as “learnable target pursuit”.

_________________________

Conditional Random Fields: A Beginner’s Survey

Posted in Computer Science, Machine Learning, Software, Video Lectures, tagged Conditional Random Fields, Graphical Models, Log Linear Models, Machine Learning, Relational Learning, Sequential Learning, Structured Outputs, Survey, Tutorials, Video Lectures on August 20, 2011| 16 Comments »

One interesting project that I am involved in these days involves certain problems in Intelligent Tutors. It turns out that perhaps one of the best ways to tackle them is by using Conditional Random Fields (CRFs). Many attempts at solving these problems still involve Hidden Markov Models (HMMs). Since I have never really been a Graphical Models guy (though I am always fascinated by them), I found the going quite difficult while studying CRFs. Now that the survey is more or less over, here are my suggestions for beginners on how to go about learning them.

Tutorials and Theory

1. Log-Linear Models and Conditional Random Fields (Tutorial by Charles Elkan)

Log-linear Models and Conditional Random Fields
Charles Elkan

6 videos: Click on Image above to view

Two ways of approaching CRFs are especially useful for getting a good perspective on their use. One is to consider CRFs as an alternative to Hidden Markov Models (HMMs); the other is to think of CRFs as building on Logistic Regression.

This tutorial takes the second approach and is easily one of the most basic around. Most people interested in CRFs would of course be familiar with ideas such as maximum likelihood and logistic regression. This tutorial does a good job of starting with the absolute basics, logistic regression for a two-class problem, and moving to a more general multi-label machine learning problem with structured outputs. I tried reading a few tutorials before this one, but found this to be the most comprehensive and the best place to start. It seems, however, that one lecture is missing from this series, which (going by the notes) covered more training algorithms.
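
To make the "CRFs build on logistic regression" view concrete, here is a minimal sketch in Python with NumPy. It is not taken from Elkan's tutorial: the weights, the input and the random score tables are hypothetical values chosen only for illustration. The first part shows two-class logistic regression as a log-linear model. The second part scores whole label sequences with emission and transition terms, as a linear-chain CRF does, and checks that the forward recursion gives the same normaliser as brute-force enumeration.

```python
import itertools
import numpy as np

def log_linear_probs(scores):
    """Log-linear model: p(y|x) = exp(score(x, y)) / Z(x)."""
    scores = scores - scores.max()  # stabilise the exponentials
    expd = np.exp(scores)
    return expd / expd.sum()

# Two-class logistic regression as a log-linear model.
# Hypothetical weights and input, for illustration only.
w = np.array([0.5, -1.0])
x = np.array([2.0, 0.5])
# Class 1 scores w.x and class 0 scores 0, so the result is the sigmoid of w.x.
p_lr = log_linear_probs(np.array([0.0, w @ x]))

def sequence_score(emit, trans, labels):
    """Score of one label sequence: emission terms plus transition terms."""
    s = sum(emit[t, y] for t, y in enumerate(labels))
    s += sum(trans[a, b] for a, b in zip(labels[:-1], labels[1:]))
    return s

def log_partition_brute(emit, trans):
    """log Z(x) by enumerating all K**T label sequences."""
    T, K = emit.shape
    scores = [sequence_score(emit, trans, ys)
              for ys in itertools.product(range(K), repeat=T)]
    return np.logaddexp.reduce(scores)

def log_partition_forward(emit, trans):
    """log Z(x) by the forward recursion, in O(T * K**2) time."""
    alpha = emit[0].copy()
    for t in range(1, emit.shape[0]):
        # alpha[j] = logsumexp_i(alpha[i] + trans[i, j]) + emit[t, j]
        alpha = np.logaddexp.reduce(alpha[:, None] + trans, axis=0) + emit[t]
    return np.logaddexp.reduce(alpha)

# Random score tables standing in for w.f(x, y) on a length-4 sequence
# with 3 possible labels per position.
rng = np.random.default_rng(0)
emit = rng.normal(size=(4, 3))
trans = rng.normal(size=(3, 3))
log_z_brute = log_partition_brute(emit, trans)
log_z_fwd = log_partition_forward(emit, trans)

# Conditional probability of one particular labelling under the CRF.
p_seq = np.exp(sequence_score(emit, trans, (0, 1, 1, 2)) - log_z_fwd)
```

The only change from logistic regression is that the "class" is now an entire label sequence. The normaliser therefore sums over exponentially many sequences, which is why the dynamic-programming recursion matters.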

2. Survey Papers on Relational Learning

These are not really tutorials on CRFs, but talk of sequential learning in general. For beginners, these surveys are useful to clarify the range of problems in which CRFs might be useful while also discussing other methods for the same briefly. I would recommend these two tutorials to help put CRFs in perspective in the broader machine learning sub-area of Relational Learning.

— Machine Learning for Sequential Learning: A Survey (Thomas Dietterich)

This is a very broad survey that talks of sequential learning, defines the problem and some of the most used methods.

— An Introduction to Structured Discriminative Learning (R Memisevic)

This tutorial is like the above, but focuses more on comparing CRFs with large margin methods such as SVMs, giving yet another interesting perspective on where CRFs fit.

3. Comprehensive CRF Tutorial (Andrew McCallum and Charles Sutton)

This is the most comprehensive tutorial available for CRFs. While it claims to start from the bare basics, I found it hard as a starting point and took it on third (after the above two). It is potentially both the starting and ending point for a more advanced Graphical Models student. It is extensive (90 pages) and gives a feeling of comfort with CRFs when done. It is definitely the best tutorial available, though by no means the easiest place to start if you have never done any sequential learning before.

This might be considered an extension to this tutorial by McCallum et al : CRFs for Relational Learning (PDF)

4. Original CRF Paper (John Lafferty et al.)

Though not necessary to learn CRFs given many better tutorials, this paper is still recommended, being the first on CRFs.

5. Training/Derivations (Rahul Gupta)

This report is good for the various training methods and for one to go through the derivations associated.

6. Applications to Vision (Nowozin/Lampert)

If your primary focus is using structured prediction in Computer Vision/Image Analysis then a good tutorial (with a large section on CRFs) can be found over here:

Structured prediction and learning in Computer Vision (Foundations and Trends Volume).

___________________

Extensions to the CRF concept

There are a number of extensions to CRFs. The two that I have found most helpful in my work (both are easy to follow given the above) are:

1. Hidden State Conditional Random Fields (HCRF)

2. Latent Dynamic Conditional Random Fields (LDCRF)

Both of these extensions work to include hidden variables in the CRF framework.
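
As a rough sketch of what "including hidden variables" means, here is the form in which the HCRF model is usually written. The symbols are my notation and are not quoted from any of the material linked above. The class label $y$ is obtained by summing out a vector of hidden states $h$:

$p(y \mid x; \theta) = \sum_{h} p(y, h \mid x; \theta) = \frac{\sum_{h} \exp \Psi(y, h, x; \theta)}{\sum_{y', h} \exp \Psi(y', h, x; \theta)}$

Here $\Psi$ is a potential function that is linear in the parameters $\theta$, just as in an ordinary CRF. LDCRFs apply the same idea to per-frame labels: each label is given its own disjoint set of hidden states, and the probability of a label sequence is the sum over all hidden-state sequences consistent with it.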

___________________

Software Packages

1. Kevin Murphy’s CRF toolbox (MATLAB)

2. MALLET (I haven’t used MALLET, it is Java based)

3. HCRF – LDCRF Library (MATLAB, C++, Python). As the name suggests, this package is for HCRF and LDCRF, though it can be used as a standalone package for CRFs as well.

Strangeness Minus Three (Richard Feynman and Murray Gell-Mann 1964)

Posted in Physics, Video Lectures, tagged BBC, BBC Horizon, Documentaries, Experimental Physics, Murray Gell-Mann, Quantum Physics, Richard Feynman, Theoretical Physics on March 20, 2011| 3 Comments »

I am a big fan and collector of the BBC Horizon documentaries, and I was pleasantly surprised to find an old one (probably from the year Horizon started, though I think this is from 1966) that I didn’t know existed till two weeks ago. It is on the exciting discovery of the $\Omega^-$ and features Richard Feynman and Murray Gell-Mann. Like the other old Horizon documentaries, it is more technical but at the same time more raw and exciting. It is worth watching for its historical significance and age alone, if nothing else. Definitely a collector’s item!

________________

Strangeness Minus Three (BBC Horizon, 1964/6)

Total Runtime: 41:20

[Part 1 | Part 2 | Part 3]

[Alternative Link]

________________

Stanford Deep Learning Lectures (Just About)

Posted in Machine Learning, Video Lectures, tagged Andrew Ng, Lectures, Machine Learning, Stanford University, videos on February 13, 2011| Leave a Comment »

The first part is just to motivate this upcoming Stanford video series.

Deep Learning? Supervised Learning is the process where an entity has to “teach” or “supervise” the learning. The learning algorithm (such as a neural network) is shown some features (which are carefully extracted) and is then told the correct answer (training). Over time it learns a function that maps features to labels. It thus focuses on finding the class label given a set of features, i.e. $P(Y|X)$, where $Y$ is the class and $X$ the features. For example, in face recognition, after we have extracted features using a technique such as PCA or ICA, the task is to use these features and the label information (person name, ID etc.) to learn a function that can make predictions. But we see in everyday life that label information is not as important in learning. Humans do some kind of “clustering” and generative modeling of whatever they see all the time. Given a set of objects we tend to form a generative model of those objects and only then assign labels; labels thus give very little information in actual learning. Another interesting question is how features are learnt in the first place. Is it an unsupervised task? How can a computer learn features in an unsupervised manner?
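
As a small illustration of the supervised setting just described, here is a sketch of logistic regression fitted by gradient descent, which models $P(Y|X)$ directly. The toy data, learning rate and step count are arbitrary choices of mine and do not come from any of the lectures.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, n_steps=500):
    """Gradient descent on the mean negative log-likelihood of P(Y=1|X)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)                  # current estimate of P(Y=1|X)
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient with respect to w
        b -= lr * np.mean(p - y)                # gradient with respect to b
    return w, b

# Synthetic, well-separated toy data: the "features" are 2-D points
# and the "labels" say which cluster each point came from.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = fit_logistic(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
```

Note that everything here depends on the features being given. The questions at the end of the paragraph above are about where such features come from.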

Unsupervised Feature Learning? Now consider a task where you have to improve accuracy on classifying an image as that of an elephant or a rhino. The catch is that you are not given any labeled examples of elephants or rhinos; in fact, suppose you are not even given unlabeled examples of them. Instead you are given random images of rivers and mountains, and you have to learn a feature representation from these that can help you in your task. This can be done by sparse coding, as shown by Raina et al.
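
Here is a minimal sketch of the sparse coding step, under loud assumptions. The dictionary `D` below is random, whereas in the approach described above it would be learned from the unlabeled images. The solver is plain ISTA (iterative soft-thresholding), chosen for brevity and not necessarily what Raina et al. used. The penalty `lam` and the iteration count are arbitrary.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def objective(x, D, a, lam):
    """Sparse coding objective: 0.5 * ||x - D a||^2 + lam * ||a||_1."""
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))

def sparse_code(x, D, lam=0.1, n_iter=200):
    """ISTA: gradient step on the squared error, then soft-thresholding."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - step * grad, step * lam)
    return a

# Stand-in dictionary with unit-norm columns (hypothetical; a real one
# would be learned from unlabeled data such as the river and mountain images).
rng = np.random.default_rng(0)
D = rng.normal(size=(20, 50))
D /= np.linalg.norm(D, axis=0)

# A signal built from two dictionary atoms.
x = 1.0 * D[:, 3] - 0.5 * D[:, 17]
lam = 0.1
code = sparse_code(x, D, lam=lam)
```

The vector `code` is the new feature representation of `x`. In the elephant-versus-rhino setting, each labeled image would be encoded this way and the codes passed to an ordinary supervised classifier.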

______________

Lectures: Recently I came across a series of lectures (which are a work in progress) by Professor Andrew Y. Ng on Unsupervised Feature Learning and Deep Learning. This course will help present issues such as the above to a wider audience. Though they are not yet uploaded, I am really excited about these, as I really enjoyed his CS 229 lectures a long time ago. The course needs some basic knowledge of Machine Learning, but does brush up on some basics.

I have been working on Meta-Learning for a while, but have been getting more interested in Deep Learning methods recently and hence am looking forward to these lectures coming online.

I wrote to Professor Ng about them, and in his opinion it will take a few months before they can be put up. I think that works fine, as I plan to work on Deep Learning in the summer and these would really help. Even now, expertise in Deep Learning methods is restricted to only a few places, and thus such lectures would be a great advantage.

Here is a description of the Unsupervised Feature Learning and Deep Learning course:

Machine learning has seen numerous successes, but applying learning algorithms today often means spending a long time hand-engineering the input feature representation. This is true for many problems in vision, audio, NLP, robotics, and other areas. In this course, you’ll learn about methods for unsupervised feature learning and deep learning, which automatically learn a good representation of the input from unlabeled data. You’ll also pick up the “hands-on,” practical skills and tricks-of-the-trade needed to get these algorithms to work well.

Basic knowledge of machine learning (supervised learning) is assumed, though we’ll quickly review logistic regression and gradient descent.

I hope this would be as widely viewed as the CS 229 lectures. I say that as I know these would be fantastic.

______________

First Blog Post Citation

Posted in Machine Learning, Random, Video Lectures, tagged Autoencoders, Deep Learning, Geoffrey Hinton, Machine Learning, Random, Restricted Boltzmann Machine on September 26, 2010| 4 Comments »

This is a first for this blog, and hence worth mentioning.

I came across a paper that is to appear in the proceedings of the IEEE Conference on Computer Systems and Applications 2010. Find the paper here.

This paper cites an old post on this blog, one of the first few in fact. It is reference number [2] in the paper. It was good to know and, more importantly, a boost to keep using the blog to discuss small ideas that are otherwise unsuitable for a formal presentation.

___________

Since it is lame to write just the above lines, I leave you with a couple of talks that I watched on Friday night and would highly recommend.

There was a talk by Machine Learning pioneer Geoffrey Hinton some years ago at Google Tech Talks that became quite a hit. The talk, titled The Next Generation of Neural Networks, discusses Restricted Boltzmann Machines and how this generative approach can lead to learning complex and deep dependencies in the data.

There was a follow-up talk recently that I had long bookmarked but only got around to watching yesterday. Like the previous one, it is a fantastic talk, and it has completed my conversion to begin exploring deep learning methods. :)

Here is the talk –

Another great talk that I watched last night was one by Prof. Yann LeCun.

Here is the talk –

This talk is opened by the late Sam Roweis. It feels good at one level to see his work preserved on the internet. I have quite enjoyed talks by him at summer schools in the past.

___________