Feeds:
Posts

## “Darwinian Evolution is a form of PAC (Machine) Learning”

Changing or increasing functionality of circuits in biological evolution is a form of computational learning. – Leslie Valiant

The title of this post comes from Prof. Leslie Valiant‘s The ACM Alan M. Turing award lecture titled “The Extent and Limitations of Mechanistic Explanations of Nature”.

Prof. Leslie G. Valiant

Click on the image above to watch the lecture

[Image Source: CACM “Beauty and Elegance”]

Short blurb: Though the lecture came out sometime in June-July 2011, and I have shared it (and a paper that it quotes) on every online social network I have presence on, I have no idea why I never blogged about it.

The fact that I have zero training (and epsilon knowledge of) in biology that has not stopped me from being completely fascinated by the contents of the talk and a few papers that he cites in it. I have tried to see the lecture a few times and have also started to read and understand some of the papers he mentions. Infact, the talk has inspired me enough to know more about PAC Learning than the usual Machine Learning graduate course might cover. Knowing more about it is now my “full time side-project” and it is a very exciting side-project to say the least!

_________________________

It is widely accepted that Darwinian Evolution has been the driving force for the immense complexity observed in life or how life evolved. In this beautiful 10 minute video Carl Sagan sums up the timeline and the progression:

There is however one problem: While evolution is considered the driving force for such complexity, there isn’t a satisfactory explanation of how 13.75 billion years of it could have been enough. Many have often complained that this reduces it to a little more than an intuitive explanation. Can we understand the underlying mechanism of Evolution (that can in turn give reasonable time bounds)? Valiant makes the case that this underlying mechanism is of computational learning.

There have been a number of computational models that have been based on the general intuitive idea of Darwinian Evolution. Some of these include: Genetic Algorithms/Programming etc. However, people like Valiant amongst others find such methods useful in an engineering sense but unsatisfying w.r.t the question.

In the talk Valiant mentions that this question was asked in Darwin’s day as well. To which Darwin proposed a bound of 300 million years for such evolution to occur. This immediately fell into a problem as Lord Kelvin, one of the leading physicists of the time put the figure of the age of Earth to be 24 million years. Now obviously this was a problem as evolution could not have happened for more than 24 million years according to Kelvin’s estimate. The estimate of the age of the Earth is now much higher. ;-)

The question can be rehashed as: How much time is enough? Can biological circuits evolve in sub-exponential time?

For more I would point out to his paper:

Evolvability: Leslie Valiant (Journal of the ACM – PDF)

Towards the end of the talk he shows a Venn diagram of the type usually seen in complexity theory text books for classes P, NP, BQP etc but with one major difference: These subsets are fact and not unproven:

$Fact: Evolvability \subseteq SQ Learnable \subseteq PAC Learnable$

*SQ or Statistical Query Learning is due to Michael Kearns (1993)

Coda: Valiant claims that the problem of evolution is no more mysterious than the problem of learning. The mechanism that underlies biological evolution is “evolvable target pursuit”, which in turn is the same as “learnable target pursuit”.

_________________________

Onionesque Reality Home >>

## Demystifying Support Vector Machines for Beginners: A Reading List

Though I have worked on pattern recognition in the past I have always wanted to work with Neural Networks for the same. However for some reason or the other I could never do so, I could not even take it as an elective subject due to some constraints. Over the last two years or so I have been promising myself and ordering myself to stick to a schedule and study ANNs properly, however due to a combination of procrastination, over-work and bad planning I have never been able to do anything with them.

However I have now got the opportunity to work with Support Vector Machines and over the past some time I have been reading extensively on the same and have been trying to get playing with them. Now that the actual implementation and work is set to start I am pretty excited to work with them. It is nice that I get to work with SVMs though I could not with ANNs.

Support Vector Machine is a classifier derived from statistical learning theory by Vladimir Vapnik and his co-workers. The foundations for the same were laid by him as late as the 1970s SVM shot to prominence when using pixel maps as input it gave an accuracy comparable with sophisticated Neural Networks with elaborate features in a handwriting recognition task.

Traditionally Neural Networks based approaches have suffered some serious drawbacks, especially with generalization, producing models that can overfit the data. SVMs embodies the structural risk minimization principle that is shown superior to the empirical risk minimization that neural networks use. This difference gives SVMs the greater ability to generalize.

However learning how to work with SVMs can be challenging and somewhat intimidating at first. When i started reading on the topic I took the books by Vapnik on the subject but could not make much head or tail. I could only attain a certain degree of understanding, nothing more. To specialize in something I do well when I start off as a generalist, having a good and quite correct idea of what is exactly going on. Knowing in general what is to be done and what is what, after this initial know-how makes me comfortable I reach the stage of starting with the mathematics which gives profound understanding as anything without mathematics is meaningless. However most books that I came across missed the first point for me, and it was very difficult to make a headstart. There was a book which I could read in two days that helped me get that general picture quite well. I would highly recommend it for most who are in the process of starting with SVMs.The book is titled Support Vector Machines and other Kernel Based Learning methods and is authored by Nello Cristianini and John-Shawe Taylor.

I would highly recommend people who are starting with Support Vector Machines to buy this book. It can  be obtained easily over Amazon.

This book has very less of a Mathematical treatment but it makes clear the ideas involved and this introduces a person studying from it to think more clearly before he/she can refine his/her understanding by reading something heavier mathematically. Another that I would highly recommend is the book Support Vector Machines for Pattern Classification by Shigeo Abe.

Another book that I highly recommend is Learning with Kernels by Bernhard Scholkopf and Alexander Smola. Perfect book for beginners.

Only after one has covered the required stuff from here that I would suggest Vapnik’s books which then would work wonderfully well.

Other than the books there are a number of Video Lectures and tutorials on the Internet that can work as well!

Below is a listing of a large number of good tutorials on the topic. I don’t intend to flood a person interested in starting with too much information, where ever possible i have described what the document carries so that one could decide what should suffice for him/her on the basis of need. Also I have star-marked some of the posts. This marks the ones that i have seen and studied from personally and found them most helpful and i am sure they would work the same way with both beginners and people with reasonable experience alike.

Webcasts/ Video Lectures on Learning Theory, Support Vector Machines and related ideas:

EDIT: For those interested. I had posted about a course on Machine Learning that has been provided by Stanford university. It too is suited for an introduction to Support Vector Machines. Please find the post here. Also this comment might be helpful, suggestions to it according to your learning journey are also welcome.

1. *Machine Learning Workshop, University of California at Berkeley. This series covers most of the basics required. Beginners can skip the sessions on Bayesian models and Manifold Learning.

Workshop Outline:

Session 1: Classification.

Session 2: Regression.

Session 3: Feature Selection

Session 4: Diagnostics

Session 5: Clustering

Session 6: Graphical Models

Session 7: Linear Dimensionality Reduction

Session 8: Manifold Learning and Visualization

Session 9: Structured Classification

Session 10: Reinforcement Learning

Session 11: Non-Parametric Bayesian Models

2. Washington University. Beginners might be interested on the sole talk on the topic of Supervised Learning for Computer Vision Applications or maybe in the talk on Dimensionality Reduction.

3. Reinforcement Learning, Universitat Freiburg.

4. Deep Learning Workshop. Good talks, But I’d say these are meant for only the highly interested.

5. *Introduction to Learning Theory, Olivier Bousquet.

This tutorial focuses on the “larger picture” than on mathematical proofs, it is not restricted to statistical learning theory however. The course comprises of five lectures and is quite good to watch. The Frenchman is both smart and fun!

6. *Statistical Learning Theory, Olivier Bousquet. This course gives a detailed introduction to Learning Theory with a focus on the Classification problem.

Course Outline:

Probabilistic and Concentration inequalities, Union Bounds, Chaining, Measuring the size of a function class, Vapnik Chervonenkis Dimension, Shattering Dimensions and Rademacher averages, Classification with real valued functions.

7. *Statistical Learning Theory, Olivier Bousquet. This is not the repeat of the above course. This one is a more recent lecture series than the above actually. This course has six lectures. Another excellent set.

Course Outline:

Learning Theory: Foundations and Goals

Learning Bounds: Ingredients and Results

Implications: What to conclude from bounds

7. Advanced Statistical Learning Theory, Olivier Bousquet. This set of lectures compliment the above courses on statistical learning theory and give a more detailed exposition of the current advancements in the same.This course has three lectures.

Course Outline:

PAC Bayesian bounds: a simple derivation, comparison with Rademacher averages, Local Rademacher complexity with classification loss, Talagrand’s inequality. Tsybakov noise conditions, Properties of loss functions for classification (influence on approximation and estimation, relationship with noise conditions), Applications to SVM – Estimation and approximation properties, role of eigenvalues of the Gram matrix.

8. *Statistical Learning Theory, John-Shawe Taylor, University of London. One plus point of this course is that is has some good English. Don’t miss this lecture as it has been given by the same professor whose book we just discussed.

9. *Learning with Kernels, Bernhard Scholkopf.

This course covers the basics for Support Vector Machines and related Kernel methods. This course has six lectures.

Course Outline:

Kernel and Feature Spaces, Large Margin Classification, Basic Ideas of Learning Theory, Support Vector Machines, Other Kernel Algorithms.

10. Kernel Methods, Alexander Smola, Australian National University.  This is an advanced course as compared to the above and covers exponential families, density estimation, and conditional estimators such as Gaussian Process classification, regression, and conditional random fields, Moment matching techniques in Hilbert space that can be used to design two-sample tests and independence tests in statistics.

11. *Introduction to Kernel Methods, Bernhard Scholkopf, There are four parts to this course.

Course Outline:

Kernels and Feature Space, Large Margin Classification, Basic Ideas of Learning Theory, Support Vector Machines, Examples of Other Kernel Algorithms.

12. Introduction to Kernel Methods, Partha Niyogi.

13. Introduction to Kernel Methods, Mikhail Belkin, Ohio State University.This lecture is second in part to the above.

14. *Kernel Methods in Statistical Learning, John-Shawe Taylor.

15. *Support Vector Machines, Chih-Jen Lin, National Taiwan University. Easily one of the best talks on SVM. Almost like a run-down tutorial.

Course Outline:

Basic concepts for Support Vector Machines, training and optimization procedures of SVM, Classification and SVM regression.

16. *Kernel Methods and Support Vector Machines, Alexander Smola. A comprehensive six lecture course.

Course Outline:

Introduction of the main ideas of statistical learning theory, Support Vector Machines, Kernel Feature Spaces, An overview of the applications of Kernel Methods.

1. Basics of Probability and Statistics for Machine Learning, Mikaela Keller.

This course covers most of the basics that would be required for the above courses. However sometimes the shooting quality is a little shady. This talk seems to be the most popular on the video lectures site, one major reason in my opinion is that the lady delivering the lecture is quite pretty!

2. Some Mathematical Tools for Machine Learning, Chris Burges.

3. Machine Learning Laboratory, S.V.N Vishwanathan.

4. Machine Learning Laboratory, Chrisfried Webers.

Introductory Tutorials (PDF/PS):

3. *Support Vector Machines- Hype or Hallelujah (K. P. Bennett, RPI). Click Here >>

4. Support Vector Machines and Pattern Recognition (Georgia Tech). Click Here >>

5. An Introduction to Support Vector Machines in Data Mining (Georgia Tech). Click Here >>

8. *A Practical Guide to Support Vector Classification (Hsu, Chang, Lin, Via U-Michigan Ann Arbor). Click Here >>

9. *A Tutorial on Support Vector Machines for Pattern Recognition (Christopher J.C Burges, Bell Labs Lucent Technologies, Data mining and knowledge Discovery). Click Here >>

10. Support Vector Clustering (Hur, Horn, Siegelmann, Journal of Machine Learning Research. Via MIT). Click Here >>

11. *What is a Support Vector Machine (Noble, MIT). Click Here >>

12. Notes on PCA, Regularization, Sparisty and Support Vector Machines (Poggio, Girosi, MIT Dept of Brain and Cognitive Sciences). Click Here >>

13. *CS 229 Lecture Notes on Support Vector Machines (Andrew Ng, Stanford University). Click Here >>

Introductory Slides (mostly lecture slides):

1. Support Vector Machines in Machine Learning (Arizona State University). Click here >>

Lecture Outline:

What is Machine Learning, Solving the Quadratic Programs, Three very different approaches, Comparison on medium and large sets.

Lecture Outline:

The Learning Problem, What do we know about test data, The capacity of a classifier, Shattering, The Hyperplane Classifier, The Kernel Trick, Quadratic Programming.

3. Support Vector Machines, Linear Case (Jieping Ye, Arizona State University). Click Here >>

Lecture Outline:

Linear Classifiers, Maximum Margin Classifier, SVM for Separable data, SVM for non-separable data.

4. Support Vector Machines, Non Linear Case (Jieping Ye, Arizona State University). Click Here >>

Lecture Outline:

Non Linear SVM using basis functions, Non-Linear SVMs using Kernels, SVMs for Multi-class Classification, SVM path, SVM for unbalanced data.

5. Support Vector Machines (Sue Ann Hong, Carnegie Mellon). Click Here >>

6. Support Vector Machines (Carnegie Mellon University Machine Learning 10701/15781). Click Here >>

9. Support Vector Machines (Via U-Maryland at College Park). Click Here >>

10. Support Vector Machines: Algorithms and Applications (MIT OCW). Click Here >>

Papers/Notes on some basic related ideas (No estoric research papers here):

1. Robust Feature Induction for Support Vector Machines (Arizona State University). Click Here >>

3. *Training Data Set for Support Vector Machines (Brown University). Click Here >>

4. Support Vector Machines are Universally Consistent (Journal Of Complexity). Click Here >>

5. Feature Selection for Classification of Variable Length Multi-Attribute Motions (Li, Khan, Prabhakaran). Click Here >>

6. Selecting Data for Fast Support Vector Machine Training (Wang, Neskovic, Cooper). Click Here >>

8. The Support Vector Decomposition Machine (Periera, Gordon, Carnegie Mellon). Click Here >>

10. Supervised Clustering with Support Vector Machines (Finley, Joachims, Cornell University). Click Here >>

11. Metric Learning: A Support Vector Approach (Cornell University). Click Here >>

12. Training Linear SVMs in Linear Time (Joachims, Cornell Unversity). Click Here >>

13. *Rule Extraction from Linear Support Vector Machines (Fung, Sandilya, Rao, Siemens Medical Solutions). Click Here >>

14. Support Vector Machines, Reproducing Kernel Hilbert Spaces and Randomizeed GACV (Wahba, University of Wisconsian at Madison). Click Here >>

15. The Mathematics of Learning: Dealing with Data (Poggio, Girosi, AI Lab, MIT). Click Here >>

16. Training Invariant Support Vector Machines (Decoste, Scholkopf, Machine Learning). Click Here >>

*As I have already mentioned above, the star marked courses/lectures/tutorials/papers are the ones that I have seen and studied from personally (and hence can vouch for) and these in my opinion should work best for beginners.

Onionesque Reality Home >>

## Video Lectures

Via DataWrangling, Here is one of my best finds since i took to blogging.

While browsing i came across this post that makes up a comprehensive list of publicly available video lectures on various topics on Physics, Mathematics, Computer Science, Neuro-Science etc.

Peter Skomoroch almost writes my story at the start of the post i made a reference to above. There is just too much to do these days, but i like it.

His blog is also highly recommended. It is one of the best i have come across. Though he writes at a lesser frequency, his posts are very high quality.

Pay a visit here, to find updated links for complete courses.

Here is a complete list of all the videos Peter has compiled:

### Open Courseware Directories and Other Video Lecture Roundup Posts

The full post can be viewed here>>

Please note that i have not opened each and every link from the above. If there is any lecture that is not in the public domain, then please notify me. I will remove it (them) with immediate effect. I do not intend to post / propagate stuff which is NOT in the public domain at all.

Onionesque Reality Home >>

## A Fractal with Zero Area and an Infinite Perimeter?

Following a discussion on Reasonable Deviations. I was prompted to write on this.

The above is an illustration on the generation of the Sierpinski Gasket. For simplicity we assume that the area of the initial triangle is 1. We split this triangle into four triangles by joining the mid-points of the sides of the triangle. These smaller triangles as shown in figure 2 have equal areas. We then remove the middle triangle. We adopt the convention that we will only remove the middle triangle and not a triangle at the edge.

In each of the three remaining triangles we repeat the process and remove the middle triangles. This is where self-symmetry comes in. If we pretend to see only one of the three small triangles then we are actually doing the same thing as we did to the original triangle. Albeit on a smaller scale. The above figure shows the third and fourth iterates of the original triangle. This process repeated ad infinitum gives rise to the Sierpinski gasket.

The L-System representation of the process is:

variables : A B

constants : + −

start : A

rules : (A → B−A−B),(B → A+B+A)

angle : 60°

Continuing with our earlier assumption, suppose the initial triangle had an area

A0= 1;

Now in the first iteration we remove one of the four equal areas in that triangle and keeping the other three that remain.

Therefore the total area of the first iterate will be equal to

A1 = (3/4) x 1;

Similarly in the second iteration, we repeat the process as I have already noted above. Thus,

A2 = (3/4)(3/4) x 1;

For n iterations the area will be given by:

An = (3/4)n x 1;

Now if n is arbitrarily large then the area, it would follow will be ZERO.

Now finding out the perimeter is a similar exercise. The length of the boundary of the nth iterate of the original triangle is the total length of the boundaries of all the shaded small triangles in the nth iterate. It can be shown that this gets arbitrarily large as n gets arbitrarily large. Therefore we conclude that a Sierpinski gasket has infinite perimeter!

And that it has zero area inside and infinite perimeter.

This is against the Koch Snowflake, which is a figure having a finite area inside an infinite perimeter. This goes against the way we think and according to geometric intuition that we have. But it actually is characteristic of many shpaes in nature. For example (i really like this example :D ) if we take all the arteries, veins and capillaries in the human body, then they occupy a relative small fraction of the body. Yet if we were able to lay them out end to end, we would find that the total length would come to over 60000 kilometres. This example i am quoting from an article i had copied years ago.

Related Article:

Lindenmayer systems and Fractals

Onionesque Reality Home >>

## Modeling Fractals

This post is as a follow up to the previous post.

I have greatly been interested in fractals and have played around with models and mathematics of the same. With a friend of mine i have developed many NetLogo models as well.

The NetLogo website offers very decent models of fractals. These can be run for the curves.

The models allows the user to change parameters and see the change. NetLogo models such systems amazingly!

You could play around with these models to get a good understanding of L-system generation. These models can also be extended if they really fire you up!

Related Post: NetLogo Version 4.0.2 released

## The Turing Test – I

Via BBC-h2g2:

A test for artificial intelligence suggested by the mathematician and computer scientist Alan Turing. The gist of it is that a computer can be considered intelligent when it can hold a sustained conversation with a computer scientist without him being able to distinguish that he is talking with a computer rather than a human being.

Some critics suggest this is unreasonably difficult since most human beings are incapable of holding a sustained conversation with a computer scientist.

After a moments thought they usually add that most computer scientists aren’t capable of distinguishing humans from computers anyway.

:D

xkcd makes it even better with a cartoon on it. I love xkcd ! ;)