Posts Tagged ‘Deep Learning’

A semi-useful fact for people working in Machine Learning.

Blurb: Recently, my supervisor was kind enough to institute a small reading group on Deep Neural Networks to help us understand them better. A side product of this would hopefully be that I get to explore this old interest of mine (I have an old interest in Neural Nets and have always wanted to study them in detail, especially work from the mid 90s when the area matured and connections to many others areas such as statistical physics became apparent. But before I got to do that, they seem to be back in vogue! I dub this resurgence as Connectionism 2.0 Although the hipster in me dislikes hype, it is always lends a good excuse to explore an interest). I also attended a Summer School on the topic at UCLA, find slides and talks over here.

Coming back: we have been making our way through this Foundations and Trends volume by Yoshua Bengio (which I quite highly recommend). Bengio spends a lot of time giving intuitions and arguments on why and when Deep Architectures are useful. One particular argument (which is actually quite general and not specific to trees as it might appear) went like this:

Ensembles of trees (like boosted trees, and forests) are more powerful than a single tree. They add a third level to the architecture which allows the model to discriminate among a number of regions exponential in the number of parameters. As illustrated in […], they implicitly form a distributed representation […]

This is followed by an illustration with the following figure:

[Image Source: Learning Deep Architectures for AI]

and accompanying text to the above figure:

Whereas a single decision tree (here just a two-way partition) can discriminate among a number of regions linear in the number of parameters (leaves), an ensemble of trees can discriminate among a number of regions exponential in the number of trees, i.e., exponential in the total number of parameters (at least as long as the number of trees does not exceed the number of inputs, which is not quite the case here). Each distinguishable region is associated with one of the leaves of each tree (here there are three 2-way trees, each defining two regions, for a total of seven regions). This is equivalent to a multi-clustering, here three clusterings each associated with two regions.

Now, the following question was considered: Referring to the text in boldface above, is the number of regions obtained exponential in the general case? It is easy to see that there would be cases where it is not exponential. For example: the number of regions obtained by the three trees would be the same  as those obtained by one tree if all three trees overlap, hence giving no benefit. The above claim refers to a paper (also by Yoshua Bengio) where he constructs examples to show the number of regions could be exponential in some cases.

But suppose for our satisfaction we are interested in actual numbers and the following general question, an answer to which should also answer the question raised above:

What is the maximum possible number of regions that can be obtained in {\bf R}^n by the intersection of m hyperplanes?

Let’s consider some examples in the {\bf R}^2 case.

Where there is one hyperplace i.e. m = 1, then the maximum number of possible regions is obviously two.

[1 Hyperplane: 2 Regions]

Clearly, for two hyerplanes, the maximum number of possible regions is 4.

[2 Hyperplanes: 4 Regions]

In case there are three hyperplanes, the maximum number of regions that might be possible would be 7 as illustrated in the first figure. For m =4 i.e. 4 hyerplanes, this number is 11 as shown below:

[4 Hyperplanes: 11 Regions]

On inspection we can see the number of regions with m hyperplanes in ${\bf R}^2$ is given by:

Number of Regions = #lines + #intersections + 1

Now let’s consider the general case: What is the expression that would give us the maximum number of regions possibles with m hyperplanes in {\bf R}^n? Turns out that there is a definite expression for the same:

Let \mathcal{A} be an arrangement of m hyperplanes in {\bf R}^n, then the maximum number of regions possible is given by

\displaystyle r(\mathcal{A}) = 1 + m + \dbinom{m}{2} \ldots + \dbinom{m}{n}

the number of bounded regions (closed from all sides) is given by:

\displaystyle b(\mathcal{A}) = \dbinom{m - 1}{n}

The above expressions actually derive from a more general expression when \mathcal{A} is specifically a real arrangement.

I am not familiar with some of the techniques that are required to prove this in the general case (\mathcal{A} is a set of affine hyperplanes in a vector space defined over some Field).  The above might be proven by induction. However, it turns out that in the {\bf R}^2 case the result is related to the crossing lemma (see discussion over here), I think it was one of the most favourite results of one of my previous advisors and he often talked about it. This connection makes me want to study the details [1],[2], which I haven’t had the time to look at and I will post the proof for the general version in a future post.



1. An Introduction to Hyperplane Arrangements: Richard P. Stanley (PDF)


(Helpful for the proof)

2. Partially Ordered Sets: William Trotter (Handbook of Combinatorics) (PDF)


Onionesque Reality Home >>


Read Full Post »

Deep Learning reads Wikipedia and discovers the meaning of life – Geoff Hinton.

The above quote is from a very interesting talk by Geoffrey Hinton I had the chance to attend recently.

I have been at a summer school on Deep Neural Nets and Unsupervised Featured Learning at the Institute for Pure and Applied Mathematics at UCLA since July 9 (till July 27). It has been organized by Geoff Hinton, Yoshua Bengio, Stan Osher, Andrew Ng and Yann LeCun.

I have always been a “fan” of Neural Nets and the recent spike in interest in them has made me excited, thus the school happened at just the right time. The objective of the summer school is to give a broad overview of some of the recent work in Deep Learning and Unsupervised Feature Learning with emphasis on optimization, deep architectures and sparse representations. I must add that after getting here and looking at the peer group I would consider myself lucky to have obtained funding for the event!

[Click on the above image to see slides for the talks. Videos will be added at this location after July 27 Videos are now available]

That aside, if you are interested in Deep Learning or Neural Networks in general, the slides for the talks are being uploaded over here (or click on the image above), videos will be added at the same location some time after the summer school ends so you might like to bookmark this link.

The school has been interesting given the wide range of people who are here. The diversity of opinions about Deep Learning itself has given a good perspective on the subject and the issues and strengths of it. There are quite a few here who are somewhat skeptical of deep learning but are curious, while there are some who have been actively working on the same for a while. Also, it has been enlightening to see completely divergent views between some of the speakers on key ideas such as sparsity. For example Geoff Hinton had a completely different view of why sparsity was useful in classification tasks than compared to Stéphane Mallat, who gave a very interesting talk today even joking that “Hinton and Yann LeCun told you why sparsity is useful, I’ll tell you why sparsity is useless. “. See the above link for more details.

Indeed, such opinions do tell you that there is a lot of fecund ground for research in these areas.

I have been compiling a reading list on some of this stuff and will make a blog-post on the same soon.


Onionesque Reality Home >>

Read Full Post »

This is a first for this blog, and hence worth mentioning.

I came across a paper that is to appear in the proceedings of the IEEE Conference on Computer Systems and Applications 2010. Find the paper here.

This paper cites an old post on this blog, one of the first few infact. This is reference number [2] on the paper. It was good to know, and more importantly, a boost to blog to discuss small ideas that are otherwise improper for a formal presentation.


Since it is lame to write just the above lines, I leave you with a couple of talks that I watched over the friday night and I would highly recommend.

There was a talk by Machine Learning pioneer Geoffrey Hinton some years ago at Google Tech Talks that became quite a hit. This talk was titled The Next Generation of Neural Networks that discusses Restricted Boltzmann Machines, and how this generative approach can lead to learning complex and deep dependencies in the data.

There was a follow up talk recently, that I had long bookmarked, but just got around to seeing yesterday. This like the previous is a fantastic talk that has completed my conversion to begin exploring deep learning methods. :)

Here is the talk –

Another great talk that I had been looking at last night was a talk by Prof Yann LeCun

Here is the talk –

This talk is started by the late Sam Roweis. It feels good at one level to see his work preserved on the internet. I have quite enjoyed talks by him at summer schools in the past.


Onionesque Reality Home >>

Read Full Post »