Once again, these are parts of my notes for a discussion session, suitably modified to work as blog posts!

This post is the first in a series of four. The objectives of the posts are as follows:

1. This post introduces learning theory and the bias-variance trade-off, and sums up the need for learning theory.

2. The second will discuss two simple lemmas, the union bound and the Hoeffding inequality, and then use them to get to some very deep results in learning theory. It will also introduce and discuss Empirical Risk Minimization.

3. Continuing from the previous discussion, the third post will derive results on uniform convergence and tie the discussion together into a theorem. This theorem will make formal the bias-variance trade-off discussed in the first post.

4. The fourth will talk about the VC dimension and the VC bound.

Basically all the results are derived using two very simple lemmas, hence the name of these posts.

______

Introduction:

Learning theory gives a researcher applying machine learning algorithms some rules of thumb on how best to apply the algorithms he or she has learnt.

Dr Andrew Ng likens knowing machine learning algorithms to a carpenter acquiring a set of tools. However, the difference between a good carpenter and a not-so-good one is the skill in using those tools: in choosing which one to use, and how. In the same way, learning theory gives a “machine-learnist” some crude intuitions about how an ML algorithm would work and helps in applying it better.

A lot of people still think of learning theory as a method for getting papers published (I’d like to use that method; I need papers ;-)), as many consider it abstruse and of little practical value. A good refutation of this view can be found on John Langford’s fantastic weblog.

______

As put in a popular tutorial by Olivier Bousquet, the process of inductive learning can be summarized as:

1. Observe a phenomenon.

2. Construct a model of that phenomenon.

3. Make predictions using this model.

Dr Bousquet tersely observes that the above process can actually be said to be the aim of ALL natural sciences. Machine learning aims to automate the process, and learning theory tries to formalize it. I think this gives a reasonable idea of what learning theory deals with.

Learning theory formalizes terms like generalization, over-fitting and under-fitting. This series of posts (read notes) aims to introduce these terms and then jump to a recap of some important error bounds in learning theory.

______

Training Error, Generalization Error and The Bias-Variance Tradeoff:

For simplicity, let’s take linear regression as our example. Since I want this piece to be accessible, I won’t assume any prior knowledge of linear regression.

Linear Regression essentially models the relationship between one variable X and another variable Y such that the model itself depends linearly on the unknown parameters to be estimated from the data. Let’s have a look at what this means:

Suppose you have a habit of collecting weird datasets, and you end up collecting one that gives the bicep circumference of many men and the distance each of them throws a javelin. You want to predict, for an unknown individual, how far he can throw the javelin given the circumference of his biceps.

Of course, a number of factors affect how far a javelin goes, such as skill (which is essentially non-quantitative?), height, the kind of footwear worn, run-up distance, state of health and so on. These are some of the many features that would affect the end result (the distance the javelin is thrown). What I mean is that bicep circumference isn’t a realistic feature for predicting how far a javelin can be thrown. But let’s assume that there is only one feature and that it can make reasonable predictions. This over-simplification is made only so that the process can be visualized in a graph.

Suppose you collect about 80 such examples (which you call the training examples) and plot your data as follows:

[Figure: plot of the training set]

Now the problem given to you is: given the bicep-circumference measurement of an unknown individual, predict how far he can throw the javelin.

How would one do it?

What we would do is fit a curve to the training set plotted above. When we have to make a prediction, we simply plug the value into our curve and read off the corresponding distance, as illustrated below.

[Figure: a curve fit to the training set, used to make a prediction]

The curve can be represented in a number of ways. If it is represented as a straight line, it can be written as:

h(x) = \theta_0 + \theta_1 x

where h(x) is the hypothesis, \theta_0 and \theta_1 are unknown parameters to be learnt from the data, and x is the input feature. Note that this is just the slope-intercept form of a line, with slope \theta_1 and intercept \theta_0.

Above, for simplicity, I considered only one feature, but there could be many more. In the more general case, with n features:

h_\theta(x) = \theta_0 + \theta_1 x_1 + \dotsb + \theta_n x_n \qquad (1)

The \theta’s are called the parameters (to be learnt from the data) that will decide the nature of the curve.

We see that the equation involves the features of the training examples (the x’s), so the task of the learning algorithm is to choose optimal values of the \theta_i using the training set. This can be done with an algorithm such as gradient descent.

For any new example, we would have its features x, and the parameters would already be known from running gradient descent on the training set. We simply plug the value of x into equation (1) to get a prediction.

To sum up: we use the training set to fit an optimal curve, and then predict for unseen inputs by simply plugging their values into the “equation of the curve”.
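The fit-then-predict procedure above can be sketched in a few lines of plain Python. This is a minimal sketch, not code from the original notes: it assumes the usual squared-error cost J(\theta) = \frac{1}{2m}\sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2, which the post does not spell out, and the names `h` and `gradient_descent`, the learning rate `alpha` and the iteration count are illustrative choices. The toy data are made up (y = 2 + 3x, no noise), so the fit should land close to \theta_0 = 2, \theta_1 = 3.

```python
def h(theta, x):
    """Hypothesis of equation (1): theta[0] + theta[1]*x[0] + ... + theta[n]*x[n-1]."""
    return theta[0] + sum(t * xj for t, xj in zip(theta[1:], x))


def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent on J(theta) = (1/2m) * sum((h(x) - y)^2).

    X is a list of m feature vectors (each a list of n numbers), y the m targets.
    """
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(iterations):
        # Residuals h(x) - y on every training example for the current theta
        errors = [h(theta, x) - target for x, target in zip(X, y)]
        # Partial derivatives of J; the intercept behaves like a feature fixed at 1
        grad = [sum(errors) / m]
        grad += [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        # Update every parameter simultaneously
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Made-up training set generated from y = 2 + 3x
X_train = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_train = [2.0 + 3.0 * x[0] for x in X_train]

theta = gradient_descent(X_train, y_train)
print(theta)             # close to [2.0, 3.0]
print(h(theta, [5.0]))   # prediction for an unseen x = 5, close to 17.0
```

With real bicep measurements (tens of centimetres) the features would typically need rescaling, or a much smaller `alpha`, for these updates to converge.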

Now, we could fit either a “simple” model or a more “complex” model to the training set. A simple model would be linear, something like:

y=\theta_0 + \theta_1x

and a complex model could be something like this:

y=\theta_0 + \theta_1 x + \dotsb + \theta_5 x^5.

Note that the same feature x is used in different ways above: the second model uses x to create more features such as x^2, x^3 and so on. The second representation is clearly more complex than the first, as it has more parameters and can therefore capture more patterns in the data.
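Because the fifth-degree model is still linear in the parameters \theta, it can be fitted with the same least-squares machinery as the straight line; only the feature vector changes. The sketch below, in plain Python, assumes squared-error loss and solves the normal equations directly instead of running gradient descent (a swapped-in method, chosen only to keep the example short). The helper names `poly_features`, `solve`, `fit_polynomial`, `predict` and `training_mse`, and the made-up noise-free data y = 1 + 2x - 3x^2, are hypothetical.

```python
def poly_features(x, degree):
    """Map a single input x to the feature vector [x, x^2, ..., x^degree]."""
    return [x ** k for k in range(1, degree + 1)]


def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy of [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y = theta_0 + theta_1 x + ... + theta_d x^d.

    This is ordinary linear regression on the expanded features [1, x, ..., x^d],
    solved via the normal equations (X^T X) theta = X^T y.
    """
    rows = [[1.0] + poly_features(x, degree) for x in xs]
    p = degree + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(XtX, Xty)


def predict(theta, x):
    return sum(t * x ** k for k, t in enumerate(theta))


def training_mse(theta, xs, ys):
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [i / 10 for i in range(-10, 11)]      # 21 evenly spaced inputs in [-1, 1]
ys = [1 + 2 * x - 3 * x ** 2 for x in xs]  # made-up, noise-free quadratic data

for degree in (1, 2, 5):
    theta = fit_polynomial(xs, ys, degree)
    print(degree, training_mse(theta, xs, ys))
```

On this data the straight line leaves a clearly non-zero training error, while degrees 2 and 5 drive it to (numerically) zero. A lower training error on its own says nothing yet about generalization.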

However, this increase in complexity can lead to problems, just as a model that is too simple can. This is illustrated below:

[Fig 1 (Left) and Fig 2 (Right)]

The figure on the left has a “simple” model fit to the training set. Clearly there are patterns in the data that the model will never take into account, no matter how large the training set grows. Put more concretely, the relationship between x and y is clearly not linear, so if we fit a linear model to it, no matter how much we train it, there will always be some patterns in the data that the model fails to capture.

This means that what is learnt from the training set will not generalize well to unseen examples (an unseen example may come from the part of the distribution that the model fails to account for, in which case its prediction would be very inaccurate).

The figure on the right has a “complex” model fit to the same set, and clearly the model fits the data very well. But again it is not a good predictor, as it does not represent the general nature of the spread of the data but instead captures the idiosyncrasies of this particular training set. This model would make very good predictions on the training set itself, but it would not generalize well to unseen examples.

A more appropriate fit would be something like this:

[Figure: a more appropriate fit to the training set]

Now we can define the generalization error: the generalization error of a hypothesis is its expected error on examples that are not from the training set. For an example that helps in understanding generalization, refer to the part labeled “Van-Gogh Chagall and Pigeons” in this post.

The models shown in Figures 1 and 2 both have HIGH generalization errors. However, each suffers from an entirely different problem.
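Since the generalization error is an expectation over unseen examples, one common way to estimate it in practice is to measure the error on a large held-out sample that played no part in training. The sketch below assumes a hypothetical quadratic “true” relationship `true_f` with Gaussian noise, a deliberately small training set of 10 points, and squared error. It fits polynomials of degree 1, 2 and 7 (standing in for Fig. 1, an appropriate fit, and Fig. 2) and prints the training and held-out error of each. All names, sizes and the noise level are illustrative assumptions.

```python
import random


def solve(A, b):
    """Gaussian elimination with partial pivoting for the square system A w = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns [theta_0, ..., theta_d]."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    p = degree + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(XtX, Xty)


def mse(theta, xs, ys):
    """Mean squared error of the polynomial with coefficients theta on (xs, ys)."""
    return sum((sum(t * x ** k for k, t in enumerate(theta)) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def true_f(x):
    # Hypothetical underlying relationship between x and y (an assumption)
    return 1.0 + 2.0 * x - 3.0 * x ** 2


def sample(n, noise, rng):
    """Draw n examples with x uniform on [-1, 1] and Gaussian noise added to y."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = sample(10, 0.3, rng)   # small training set
test_x, test_y = sample(2000, 0.3, rng)   # large held-out set approximates the expectation

results = {}
for degree in (1, 2, 7):
    theta = fit_polynomial(train_x, train_y, degree)
    results[degree] = (mse(theta, train_x, train_y), mse(theta, test_x, test_y))
    print(f"degree {degree}: training MSE {results[degree][0]:.3f}, "
          f"held-out MSE {results[degree][1]:.3f}")
```

With a training set this small, the degree-7 fit typically has the lowest training error but a held-out error well above the degree-2 fit, while the straight line is poor on both. The exact numbers depend on the random seed.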

______

Bias: As already mentioned, in the model shown in Fig. 1, no matter how much the model is trained, there would always be some patterns in the data that it fails to capture. This is because the model has high BIAS. The bias of a model is its expected generalization error even if we were to fit it to a very large training set.

Thus the linear model shown in figure 1 suffers from high bias and will underfit the data.

Variance: Apart from bias, there is another component that has a bearing on the generalization error: the variance of the model fit to the training set.

This is shown in Fig. 2. Even though the model fits the training set very well, there is the risk that we are fitting patterns that are idiosyncratic to the training examples and may not represent the general relationship between x and y.

Since we might be fitting spurious patterns and exaggerating minor fluctuations in the data, such a model would still give a high generalization error and would over-fit the data. In such a case we say that the model has high variance.

The Trade-off: When deciding on a model to fit to the training set, there is a trade-off between bias and variance. If either is high, the generalizing ability of the model will be low (the generalization error will be high). In other words, if the model is too simple, i.e. if it has too few parameters, it will have high bias, and if the model is too complex it will have high variance. When deciding on a model, we have to strike a balance between the two.
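One standard way to see the trade-off numerically, for squared error, is to refit the same model class on many independently drawn training sets and look at its predictions at a single input x0: the squared gap between the average prediction and the true value plays the role of bias, and the spread of the predictions around that average is the variance. This is the textbook squared-error decomposition, offered as an illustration rather than the formal treatment promised for the later posts; the quadratic `true_f`, the noise level, the sample sizes and the function name `bias_variance_at` are all hypothetical.

```python
import random


def solve(A, b):
    """Gaussian elimination with partial pivoting for the square system A w = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    p = degree + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(XtX, Xty)


def predict(theta, x):
    return sum(t * x ** k for k, t in enumerate(theta))


def true_f(x):
    # Hypothetical underlying relationship between x and y (an assumption)
    return 1.0 + 2.0 * x - 3.0 * x ** 2


def bias_variance_at(x0, degree, n_train=20, noise=0.3, reps=500, seed=1):
    """Monte Carlo estimate of (bias^2, variance) of a degree-d least-squares fit at x0."""
    rng = random.Random(seed)
    preds = []
    for _ in range(reps):
        # A fresh training set each time, drawn from the same distribution
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_train)]
        ys = [true_f(x) + rng.gauss(0.0, noise) for x in xs]
        preds.append(predict(fit_polynomial(xs, ys, degree), x0))
    mean_pred = sum(preds) / reps
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / reps
    return bias_sq, variance


for degree in (1, 2, 5):
    b2, var = bias_variance_at(0.9, degree)
    print(f"degree {degree}: bias^2 {b2:.4f}, variance {var:.4f}")
```

Typically the straight line shows a large bias^2 and small variance, the quadratic has almost no bias, and the degree-5 model has little bias but the largest variance: too simple and too complex each fail in their own way.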

A very famous example that illustrates this trade-off goes like this:

Fall Tree

[Suppose there is an exacting biologist who studies and classifies green trees in detail. He would be an example of an over-trained or over-fit model, and on seeing a tree with non-green leaves like the one above, he would declare that it is not a tree at all.]

Cucumber

[An under-trained or under-fit model would be like the biologist's lazy brother, who, on seeing a green cucumber, declares that it is a tree.]

Both of the above have poor generalization. We wish to select a model that has an appropriate trade-off between the two.

______

So why do we need Learning Theory?

Learning theory is an interesting subject in its own right. However, it also hones our intuitions on how to apply learning algorithms properly, giving us a set of rules of thumb that guide us in applying them well.

Learning theory can answer quite a few questions:

1. In the previous section there was a short discussion of bias, variance and the trade-off between the two. The discussion sounds logical, but it remains an intuition until it is formalized. Learning theory can formalize the bias-variance trade-off, which then helps us choose a model with just the right bias and variance.

2. Learning Theory leads to model selection methods by which we can choose automatically what model would be appropriate for a certain training set.

3. In Machine Learning, models are fit on the training set. So what we essentially get is the training error. But what we really care about is the generalization ability of the model or the ability to give good predictions on unseen data.

Learning Theory relates the training error on the training set and the generalization error and it would tell us how doing well on the training set might help us get better generalization.

4. Learning theory proves conditions under which learning algorithms will work well. It proves bounds on the worst-case performance of models, giving us an idea of when an algorithm will work properly and when it won’t.

The next post will answer some of the above questions.

______

Onionesque Reality Home >>

I am in the process of wrapping up a basic course on Bio-Informatics that I have been teaching. It is offered as an elective for final-year undergraduate Information Technology students. I preferred teaching this course as visiting faculty on weekends, as managing time during the week is hard (though I did take some classes on weekdays).

______

[Gene Clustering: (a) shows clusters, (b) uses hierarchical clustering, (c) uses k-means, (d) an SOM finds clusters arranged in a grid. Source: Nature Biotechnology 23, 1499-1501 (2005), by Patrick D'haeseleer]

Why Bio-Informatics?

Of the courses offered in the Fall, the one I would most have preferred to teach was AI. There is no course on Machine Learning or Pattern Recognition at the UG level here, and the AI course comes closest, as it gives sufficient weight to Neural Nets and Bayesian Learning.

As AI was not available, the subject closest to my choice was Bio-Informatics, since about 60 percent of its syllabus was Machine Learning, Data Mining and Pattern Recognition. Being a basic course, it also gave me the liberty to cover these parts in much more detail than the others. That’s exactly why taking up Bio-Informatics, even though it’s not directly my area, was not a bad bargain!

______

The Joys of Teaching:

This is the first time I have formally taught a complete course. I have conducted workshops and given talks quite a few times before, but never a complete course.

I have always enjoyed teaching. When I say I enjoy teaching, I don’t necessarily mean something academic. I like discussing ideas in general.

If I try to put down why I enjoy teaching, a few reasons come to mind:

  • There is an obvious inherent joy in teaching that few activities have for me. When I say teaching here, as I said before, I don’t just mean formal teaching, but rather the more general meaning of the term.
  • It’s said that there is no better way to learn than to teach. Actually, that was the single largest motivation that prompted me to accept the offer.
  • Teaching gives me a high! When I get to discuss what I like (and teach), I forget things that might be pressing me at other times of the day. I tend to become a space-cadet when teaching. It’s such a wonderful experience!
  • Another reason I think I like teaching is this: I read widely (or am at least interested in a wide range of things), and I have noticed that my reading gets connected best, and in the most unexpected ways, in discussions. You don’t often find people interested in involved discussions, and being an introvert compounds the problem. Teaching gives me a platform for such discussions. Some of the best ideas I have had, borrowing from a number of almost unrelated areas, came while discussing or teaching. This course gave me a number of ideas that I would do something about if I get the chance and the resources.
  • Teaching also shows you the limits of your own reading and can inspire you to plug the gaps in your knowledge.
  • Other than that, I take teaching or explaining things as a challenge. I enjoy it when I find out that I can explain pion exchanges to friends who have not seen a science book after grade 10. Teaching is a challenge well worth taking for a number of reasons!

The most rewarding moment in this particular course was when a couple of groups approached me after classes had concluded, asking me to help them a little with their projects. Since their projects are of moderate difficulty and from pattern recognition, I certainly took that as a compliment! I would not say that I can “help” them, as I don’t like using that word (it sounds pretentious), but I would definitely like to work with them on their projects and hopefully learn something new about the area.

______

Course:

I won’t be putting up my notes for the course, but the topics I covered included:

1. Introduction to Bio-Informatics, Historical Overview, Applications, Major Databases, Data Management, Analysis and Molecular Biology.

2. Sequence Visualization, structure visualization, user interface, animation versus simulation, general purpose technologies, statistical concepts, microarrays, imperfect data, quantitative randomness, data analysis, tool selection, statistics of alignment, clustering and classification, regression analysis.

3. Data Mining Methods & Technology overview, infrastructure, pattern recognition & discovery, machine learning methods, text mining & tools, dot matrix analysis, substitution metrics, dynamic programming, word methods, Bayesian methods, multiple sequence alignment, tools for pattern matching.

4. Introduction, working with FASTA, working with BLAST, filtering and capped BLAST, FASTA & BLAST algorithms & comparison.

As I said earlier, my focus was on dynamic programming, clustering, regression (linear and locally weighted), logistic regression, support vector machines, neural nets and an overview of Bayesian learning. I then introduced all the other aspects as applications, covering the necessary theory along the way!

______

Resources:

All my notes for the course were handwritten rather than typeset in LaTeX, so it would be impossible to put them up now (they were mostly compiled from a number of books and MIT OCW).

However, I will update this space soon with links to all the resources I would recommend.

______

I am looking forward to teaching a course on Digital Image Processing and Labs next semester, which begins in December (again as a visiting instructor)! Since image processing is closer to the area I am deeply interested in (Applied Machine Learning - Computer Vision), I am already very excited about the possibility!

______

Sisters and Book

Painting is just another way of keeping a diary.

- Pablo Picasso.

One of the things I have done right from my childhood, though sadly not continuously, is painting and sketching. It is one of my oldest hobbies and something that gives me immense peace.

This post finds its way here because:

1. It reminds me that, when I need to “pass” time as a diversion, I should spend less of it on internet communities (whatever little I spend anyway) and paint instead whenever the usual workload is a little slack.

2. I come across scores of extremely wonderful things on the internet every week. Why share this one, then? Because it (the painting “Sisters and Book”) moves me in a way that I’d rather not try to describe on a blog, nor say why it does so, especially on a blog that’s not a personal one and is gradually shifting its focus towards Machine Learning. :)

_______

[Sisters and Book: by Imam Maleki]

Here is another beautiful painting, which almost looks like a photograph to me!

[Omens of Hafez - Imam Maleki]

Though I must admit I am not usually drawn to realism, I find Imam Maleki’s work extremely beautiful, especially the way he paints ladies.

And that’s why this finds its way here. Point noted again: give more time to my beloved hobby.

_______

Quick Links:

1. Imam Maleki’s Home Page

2. Imam Maleki’s Painting Collection

_______

Okay, a very quick post, so I’ll keep it to the point.

1. A lot of work is going on in the background, which has limited my access to the internet and, in turn, my writing. Much of that work should eventually appear on the blog, one piece at a time, though I can’t promise a time frame.

2. I had occasion to meet a couple of my blog’s readers last week, and it was an amazing time discussing CV topics (as they happened to have read some posts on CV). I will come back to this again.

______

Hate You:

EDIT: I know this saying is too clichéd, but I have an extremely silly reason for putting it up.

Just came across this Gaping Void cartoon and it’s just too true!

That “something” could actually be anything! I won’t write why I put it up. :)

______

Markov Random Fields:

I hate to use my blog for this, but is there anybody with some experience working with Markov random fields who would have the time to discuss a few things?

If so, I would be extremely grateful if you could email me at either:

onionesquereality[AT]yahoo[DOT]com

or

shubhendu_trivedi[AT]ieee[DOT]com

______

Coffee:

Over the past year I have noticed that a number of regular readers (who have endured some poor and eclectic posts and sparse posting ;-)) are from Leuven, Belgium.

I should be there for a fortnight pretty soon. Going by my experience in the past week I think it would be an extremely interesting experience to catch up over coffee!

Please send me an email if you’d be interested, and we could possibly work something out. Please write to:

onionesquereality[AT]yahoo[DOT]com

or

shubhendu_trivedi[AT]ieee[DOT]com

As a meaningless aside: I always stayed away from tea and coffee until some years ago. I rediscovered coffee thanks to a very dear friend, so now I can say I am qualified to say “Let’s meet up for coffee”. To be honest, I wasn’t earlier. :)

_____

I will try to be more systematic about my posts from now on: for every two non-technical posts I will write two technical ones.

This post is also the first in a series in which I intend to write only about visual illusions.

Before getting into the subject of this post, it would help to have a quick recap of the background.

_____

The Blind Spot:

Consider a horizontal cross section of the human eye as shown below.

[Figure: horizontal section of the right eye]

As seen above, the innermost membrane is the retina, which lines the walls of the posterior portion of the eye. When the eye is focused, light from the object is imaged onto the retina, which thus acts as a screen. Pattern vision is afforded by the distribution of discrete light receptors, called rods and cones, over the retinal surface.

Each eye has about 6-7 million cones, located primarily in the central portion of the retina, and they are highly sensitive to color. Humans can resolve fine detail with the cones largely because each cone is connected to its own nerve end. Vision due to cones is called photopic or bright-light vision.

There are about 75-150 million rods, distributed over the whole retinal surface. Rods resolve less detail than cones because several of them are connected to a single nerve end. Rod vision serves to give an overall picture of the field of view. Objects that appear brightly colored in daylight appear as colorless forms in moonlight, because only the rods are stimulated. This type of vision is called scotopic or dim-light vision.

As seen in the figure, there is a region of the retina that has no receptors (rods or cones) and therefore produces no sensation. This is called the blind spot.

Because of the blind spot, part of the field of vision is not perceived. We do not notice it, however, as the brain fills it in with details from the surroundings or with information from the other eye.

The blind spots of the two eyes are arranged symmetrically, so that the part of the field of vision lost by one eye is covered by the other. This is shown in the figure below.

[Figure: illustration of the blind spot] [Image Source]

If the brain did not fill the lost field of vision with surrounding details and information from the other eye, the blind spot would appear something like the black dot in the image below.

Blind Spot view

[Image Source]

_____

This means that if you close one eye, you can indeed detect the blind spot, as the brain no longer has the other eye’s information about the lost part of the field of vision (though what it fills in is normally good enough that we do not notice). The presence of the blind spot can be demonstrated with the simple figure below.

Demo of Blind Spot

Enlarge the above image, close your right eye and focus your left eye on the X only. Don’t try to look at the O on the left; you should just notice it in your periphery. Your attention should be on the X alone.

Now move towards the screen. At a certain point the O will disappear from your periphery. If you move closer than this point or further back, you’ll see the O again. This specific point (a range, actually) at which you cannot see the O reveals the presence of the blind spot.
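The distance at which the O vanishes follows from simple trigonometry: with your eye straight in front of the X, the O disappears when the angle between the two, seen from your eye, matches the angle between the point of fixation and the blind spot. The sketch below relies on textbook approximations that are not from this post: the blind spot is assumed to be centred about 15 degrees from fixation and roughly 5.5 degrees wide. The separation between X and O is whatever you measure on your own screen; 10 cm is used only as an example, and the function names are hypothetical.

```python
import math


def vanishing_distance(separation_cm, angle_deg=15.0):
    """Viewing distance at which a target `separation_cm` from the fixation mark
    appears `angle_deg` away from the line of sight (assumed blind-spot centre)."""
    return separation_cm / math.tan(math.radians(angle_deg))


def vanishing_range(separation_cm, centre_deg=15.0, width_deg=5.5):
    """Approximate near and far distances between which the target stays invisible,
    assuming the blind spot spans centre_deg +/- width_deg / 2 horizontally."""
    near = vanishing_distance(separation_cm, centre_deg + width_deg / 2)
    far = vanishing_distance(separation_cm, centre_deg - width_deg / 2)
    return near, far


print(round(vanishing_distance(10.0), 1))   # 37.3 (cm)
print(tuple(round(d, 1) for d in vanishing_range(10.0)))
```

The second line prints a near and a far distance, which is why the text speaks of a range rather than a single point.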

_____

The Vanishing Head Illusion:

This leads to some interesting illusions, one of the most interesting being the so-called vanishing head illusion.

As in the figure above, if the O is replaced by a head, the person appears headless when the head falls on the blind spot.

Check the video below in full screen for best results.

We notice that Richard Wiseman, on the left, indeed appears headless when his head falls on the blind spot, and that part of the field of view is filled in with the orange background. Then he does something even more interesting: he moves a black bar up and down in front of his face. Instead of seeing the bar as discontinuous, we see it as a continuous entity, because the brain fills in the missing part!

_____

A week ago I noticed a wonderful new documentary on YouTube, put up by none other than author and documentary film-maker Christopher J. Sykes. This is what this post is basically about, but I’ll digress for a moment and come back to it in a while.

With the exception of The Feynman Lectures on Physics, Volume III and Six Not-So-Easy Pieces (neither of which I intend to read in the foreseeable future), there is no book Feynman was involved with (he never actually wrote one himself) that I have not had the opportunity to read. The last one I read was “Don’t You Have Time to Think”, a collection of Feynman’s letters.

Don't You Have Time To Think

A number of people, including many of Feynman’s friends, were surprised to learn that Feynman wrote letters, so many of them, and especially the kind of letters he wrote. Letters give a very different picture of a man than a conventional biography does, and these reveal Feynman to be a genius with a human touch. I covered some of these points in an earlier post, which now seems to me to be overly enthusiastic. ;-)

Sean Carroll aptly writes that Feynman worship is often overdone, and I think he is right. Let me state my own opinion on the matter.

I don’t consider Feynman a god or anywhere close to that (though he is definitely one of my idols and a man I admire greatly). I actually consider him to be very human: someone who was unashamed to admit his weaknesses and who had a rare love for life. I am attracted to Feynman for one reason only: people like Feynman are a breath of fresh air among the supercilious pseudo-intellectual snobs who abound in academia and industry, a breath of fresh air especially for lesser mortals like me. That’s why I like the man. Why is he so famous? I have tried writing about that before, and I won’t do so anymore.

I’d like to cite two quotes that capture my point of view on the celebrity-fication of scientists, in this case Feynman. Dave Brooks writes in the Telegraph, in an article titled “Physicist still leaves some all shook up” (February 5, 2003):

Feynman is the person every geek would want to be: very smart, honored by the establishment even as he won’t play by his rules, admired by people of both sexes, arrogant without being envied and humble without being pitied. In other words, he’s young Elvis, with the Earth shaking talent transferred from larynx to brain cells and enough sense to have avoided the fat Las Vegas phase. Is such celebrity-fication of scientists good? I think so, even if people do have a tendency to go overboard. Anything that gets us thinking about science is something to be admired, whether it comes in the form of an algorithm or an anecdote.

I remember reading an essay by the legendary Freeman Dyson that said:

Science too needs its share of super heroes to bring in new talent.

These rest my case, I suppose.

_____

The only other Feynman book that I have not read, and that I have wanted to read for a LONG time, is Tuva or Bust! Richard Feynman’s Last Journey. Unfortunately I have never been able to find it.

Tuva or Bust! Richard Feynman's Last Journey

There was a BBC Horizon documentary on the same story, and thankfully Christopher J. Sykes has uploaded it to YouTube.

This is a rare documentary and the last in which Feynman appeared; it was in fact shot shortly before his death. It documents the obsession of Richard Feynman and his friend Ralph Leighton with visiting an obscure place in central Asia called Tannu Tuva. During a discussion on geography, in a teasing mood, Feynman was reminded of a long-forgotten memory and quipped to Leighton, “Whatever happened to Tannu Tuva?” Leighton thought it was a joke and confidently said that there was no such country at all. After some searching they found out that Tannu Tuva had once been a country and was now a Soviet satellite. Its capital was “Kyzyl”, a name Feynman found so interesting that he thought he just had to go there. The book and the documentary cover Feynman’s and Leighton’s adventures in scheming to get to Tannu Tuva and getting around Soviet bureaucracy. It is an extremely entertaining film, to say the least. The ending is a little sad, though: Feynman passed away three days before a letter arrived from the Soviets about permission to visit Tannu Tuva, and Leighton appears to be on the verge of tears.

The introduction to the documentary reads as:

The story of physicist Richard Feynman’s fascination with the remote Asian country of Tannu Tuva, and his efforts to go there with his great friend and drumming partner Ralph Leighton (co-author of the classic ‘Surely You’re Joking, Mr Feynman’). Feynman was dying of cancer when this was filmed, and died a few weeks after the filming. Originally shown in the BBC TV science series ‘Horizon’ in 1987, and also shown in the USA on PBS ‘Nova’ under the title ‘Last Journey of a Genius’

Find the five parts of the documentary below:

“I’m an explorer okay? I get curious about everything and I want to investigate all kinds of stuff”

Part 1

____

Part 2

____

Part 3

____

Part 4

____

Part 5

____

Only after I had finished the documentary did I realize that the PBS version of it had been available on Google Video for quite some time.

Find the video here.

_____

Michelle Feynman

As an aside: though Feynman never managed to go to Tuva in his lifetime, his daughter Michelle did visit Tuva last month!

_____

One of the things that has kept me in awe over the last week, since watching the documentary, is Tuvan throat singing. It is one of the most remarkable things I have seen in the past month or two. I am strongly attracted to Tibetan chants too, but these are very different and fascinating. The remarkable thing about throat singing is that a single singer can produce two pitches at once, as if they were being sung by two separate singers. Have a look!

_____

Project Tuva : Character of Physical Law Lectures

On the same day I came across seven lectures given by Feynman at Cornell in 1964, which were later put into a book called “The Character of Physical Law”. These have been made freely available by Microsoft Research. Some of these lectures have been on YouTube for a while, but the ones that were not were, needless to say, a joy to watch. I had previously linked to the lectures on Gravitation and the Arrow of Time.

Click on the above image to be directed to the lectures

I came to know of these lectures through Prof Terence Tao’s page; I find him very inspiring too!

_____

Quick Links:

1. Christopher J. Sykes’ YouTube channel.

2. Tuva or Bust

3. Project Tuva at Microsoft Research

_____

Onionesque Reality Home >>

I found these images on Wired two days ago but could not find the time to put them up earlier.

There is a remarkable quote attributed to Einstein:

“Two things are infinite: the universe and human stupidity; and I’m not sure about the universe.”

I am not sure whether I liked this quotation before. But I am NOW wholly convinced that I love it, and this new-found conviction is due to the following:

As a kid, I used to have a large collection of encyclopedias. I remember reading about the Aral Sea in a picture atlas, which mentioned that the salinity of the sea water was increasing and that the sea would disappear within a few decades.

Over the years, we were fed doomsday scenarios all the time: that all the coastal cities would soon be under the sea due to rising ocean levels, that the Himalayas would soon be ice-free, and so on. Over a period of time you get fed up with such idle talk, and since you don’t see anyone giving convincing answers, you tend to believe that none of it is true. Secondly, being the eternal optimist that I am, I probably just wished that what that encyclopedia said about the Aral was a “minor” problem.

I last read about the problem many years ago and never came across anything on it after that. Just a couple of days back, I was shocked by these images. I have only one word for them: Tragic!

The images are from 1973, 1987, 1999, 2006 and 2009. The two most recent images were released by the European Space Agency; the earlier ones were taken by the United States Geological Survey.

Aral Sea - 1973

Aral Sea - 1987

Aral Sea - 1999

Aral Sea - 2006

Aral Sea - 2009

[Image(s) Source : Wired Science]

The South Aral Sea, the remnant of the original lake that you can see on the left in the above image, is also expected to vanish by 2020. Thankfully, the North Aral Sea (the part on the right) has been saved by a World Bank-funded dam project.

The Aral Sea, once the world’s fourth-largest lake at roughly 68,000 sq km, is now only about one-tenth that size. The trouble started when the Soviets decided in 1918 that the two rivers that drained into the Aral, the Amu Darya and the Syr Darya, would be largely diverted to the deserts to turn them into cotton-growing lands. The Soviet plan worked, and cotton became one of the most important exports from that area. By the 1960s massive amounts of water were being diverted, and the sea began to shrink steadily. The pictures speak for themselves about how that happened.

The death of the Aral is extremely sad. It has destroyed the once-thriving fishing industry, and the diversion has reduced the two rivers mostly to shadows of their former selves. The Aral served as a climate moderator in the largely arid lands around it; its death might herald a major environmental catastrophe in the region.

This is a prime example of what human stupidity can lead to, and I am short of words to describe my anguish at it.

It teaches us a number of things:

Ignoring warnings backed by clear proof is just plain stupidity. There is ample proof, for example, of climate change and its bad impact. I have been visiting the Himalayas once every two years since 1991, and the change there is apparent: compared to the 80s, the glaciers that feed the Ganges have shrunk by several kilometers. I don’t know what the solutions are, nor am I equating the Aral problem with climate change. I understand that the Aral was a different kind of problem, different because the Soviets knew from the start that the lake would dry up. Climate change cannot be compared to it, as we do not yet fully understand a number of things about it, so how effective the correctives would be is debatable. It would be to our benefit if that debate were settled soon with good, incisive scientific evidence.

It is also a comment on how dangerous totalitarian regimes can be. In such regimes, since a decision cannot be opposed once it is taken, it can lead either to major dividends/progress, as it is implemented very rapidly, or to a major catastrophe, as in the above case. Soviet officials were aware that the Aral would sooner or later evaporate. In 1964 Aleksandr Asarin noted that:

“It was part of the five-year plans, approved by the council of ministers and the Politburo. Nobody on a lower level would dare to say a word contradicting those plans, even if it was the fate of the Aral Sea.”

Of course he was right: there is rarely any way to convince, reason with or oppose supercilious totalitarian regimes, even when their decisions are clearly suicidal. I am tempted to make a political comment on two present-day countries (one a totalitarian state and one a liberal democracy) here, but will resist the temptation.

Anyhow, the images above disturbed me enough to make me lose sleep.

_____


http://www.wired.com/wiredscience/2009/07/aralsea/

Here are a number of interesting courses, two of which I have been looking at for the past two weeks and hope to finish by the end of August or September.

Introduction to Neural Networks (MIT):

These days, amid the other things I have at hand, including a project on content-based image retrieval, I have been making it a point to look at an MIT course on Neural Networks. Needless to say, I am learning loads.


I would like to emphasize that though I have implemented a signature verification system using Neural Nets, I am by no means good with them; I would classify myself as a beginner. The tool I am more comfortable with is Support Vector Machines.

I have been wanting to know more about them for some years now, but never really got the time, or, you could say, the opportunity. Now that I can invest some time, I am glad I came across this course. So far I have been able to look at 7 lectures, and I should say that I am MORE than very happy with the course. I think it is very detailed and extremely well suited to the beginner as well as the expert.

The instructor is H. Sebastian Seung, who is Professor of Computational Neuroscience at MIT.

The course has 25 lectures, each packed with a great deal of information, so they might be slow going for those who are not very familiar with this material.

The video lectures can be accessed over here. I must admit that I am a little disappointed that these lectures are not available on YouTube, because the downloads are rather large. But I found them worth it anyway.

The lectures cover the following:

Lecture 1: Classical neurodynamics
Lecture 2: Linear threshold neuron
Lecture 3: Multilayer perceptrons
Lecture 4: Convolutional networks and vision
Lecture 5: Amplification and attenuation
Lecture 6: Lateral inhibition in the retina
Lecture 7: Linear recurrent networks
Lecture 8: Nonlinear global inhibition
Lecture 9: Permitted and forbidden sets
Lecture 10: Lateral excitation and inhibition
Lecture 11: Objectives and optimization
Lecture 12: Excitatory-inhibitory networks
Lecture 13: Associative memory I
Lecture 14: Associative memory II
Lecture 15: Vector quantization and competitive learning
Lecture 16: Principal component analysis
Lecture 17: Models of neural development
Lecture 18: Independent component analysis
Lecture 19: Nonnegative matrix factorization. Delta rule.
Lecture 20: Backpropagation I
Lecture 21: Backpropagation II
Lecture 22: Contrastive Hebbian learning
Lecture 23: Reinforcement Learning I
Lecture 24: Reinforcement Learning II
Lecture 25: Review session

The good thing is that I have formally studied most of the material covered after lecture 13, but going by the quality of the lectures so far (the first 7), I would not mind seeing them again.

Quick Links:

Course Home Page.

Course Video Lectures.

Prof H. Sebastian Seung’s Homepage.

_____

Visualization:

This is a Harvard course. I don’t know when I’ll get the time to look at it, but it sure looks extremely interesting, and I am sure a number of people would be interested in having a look at it. It actually looks like a course that could be covered pretty quickly.

[Image Source]

The course description says the following:

The amount and complexity of information produced in science, engineering, business, and everyday human activity is increasing at staggering rates. The goal of this course is to expose you to visual representation methods and techniques that increase the understanding of complex data. Good visualizations not only present a visual interpretation of data, but do so by improving comprehension, communication, and decision making.

In this course you will learn how the human visual system processes and perceives images, good design practices for visualization, tools for visualization of data from a variety of fields, collecting data from web sites with Python, and programming of interactive visualization applications using Processing.

The topics covered are:

  • Data and Image Models
  • Visual Perception & Cognitive Principles
  • Color Encoding
  • Design Principles of Effective Visualizations
  • Interaction
  • Graphs & Charts
  • Trees and Networks
  • Maps & Google Earth
  • Higher-dimensional Data
  • Unstructured Text and Document Collections
  • Images and Video
  • Scientific Visualization
  • Medical Visualization
  • Social Visualization
  • Visualization & The Arts

Quick Links:

Course Home Page.

Course Syllabus.

Lectures, Slides and other materials.

Video Lectures

_____

Advanced AI Techniques:

This is one course that I will be looking at parts of after I have covered the course on Neural Nets. I have yet to glance at the first lecture or the materials, so I cannot say what they will be like. But going by the topics they cover, I sure am expecting a lot from them.

The topics covered in a broad sense are:

  • Bayesian Networks
  • Statistical NLP
  • Reinforcement Learning
  • Bayes Filtering
  • Distributed AI and Multi-Agent systems
  • An Introduction to Game Theory

Quick Link:

Course Home.

_____

Astrophysical Chemistry:

I don’t know if I will be able to squeeze in time for these. But because of my amateurish interest in chemistry (if I were not an electrical engineer, I would have been into chemistry), and because I have very high regard for Dr Harry Kroto (who is delivering them), I will try to make it a point to have a look at them. I think I’ll skip the gym for some days to watch them. ;-)


[Nobel Laureate Harry Kroto with a Bucky-Ball model - Image Source : richarddawkins.net]

Quick Links:

Dr Harold Kroto’s Homepage.

Astrophysical Chemistry Lectures

_____


In the past month or so I have been looking at a series of lectures on Data Mining that I had long bookmarked. I have watched the lectures twice and found them extremely useful, so I thought it would not be a bad idea to share them here, though I am aware that they are fairly old and rather widely circulated.

These lectures, delivered by Professor David Mease as Google Tech Talks/Stanford Stat202 course lectures, work equally well for beginners and for experts who need to brush up on basic ideas. The course uses R extensively.

Statistical Aspects of Data Mining

Links:

Course Video Lectures.

Course website.

Lecture Slides.

_____

I’ll end with some Dilbert strips on data mining that I have liked in the past!

Data Mining

_____

DilbertMiningData2

_____

DilbertMiningData3

_____


I know this post is no rocket science and might just appear to be too silly! :)

Five years ago I joined my first social networking website, Orkut. It has a feature that is unique among such sites: it displays the number of visits to your profile each day. Some related features of the website are as follows:

1. When you sign into your account, you appear on your friends’ home pages (something like “recently online”). As other people sign in after you, you gradually stop appearing there. This is because the home page can show only 9 people at a time, which means that if 9 people sign in after you, you cease to be visible on the home page and are accessible only through the friends list, which people generally don’t look at unless they are looking for someone specifically.

2. If anyone visits your page or refreshes it, that is counted as one hit.
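To make the two features above concrete, here is a toy model in Python. It is a minimal sketch under my own simplifying assumptions, not how Orkut actually worked: a single shared “recently online” list (rather than one per friend) keeps the 9 most recent sign-ins using a `collections.deque` with `maxlen=9`, and a counter records one hit per visit or refresh. `ToyProfileSite` and its methods are hypothetical names.

```python
from collections import Counter, deque


class ToyProfileSite:
    """Hypothetical toy model of the 'recently online' list and the hit counter."""

    SLOTS = 9  # the home page shows only the 9 most recent sign-ins

    def __init__(self):
        # Newest sign-in sits at the right end; when full, the oldest drops off the left.
        self.recently_online = deque(maxlen=self.SLOTS)
        self.hits = Counter()

    def sign_in(self, user):
        # Signing in again moves a user back to the "most recent" end.
        if user in self.recently_online:
            self.recently_online.remove(user)
        self.recently_online.append(user)

    def is_visible(self, user):
        return user in self.recently_online

    def visit(self, profile):
        # Every visit or refresh counts as one hit.
        self.hits[profile] += 1


site = ToyProfileSite()
site.sign_in("me")
for i in range(9):
    site.sign_in(f"friend{i}")
print(site.is_visible("me"))  # False: 9 people signed in after me

site.visit("me")   # a visit...
site.visit("me")   # ...and a refresh
print(site.hits["me"])  # 2
```

The bounded deque captures the point of feature 1: your visibility depends only on how many people signed in after you, not on how long ago you signed in.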

Now, it is obvious that if you sign in very often, you are more likely to be seen on your friends’ home pages, and they are then more likely to click through to your profile. (I don’t have any research to support this, but it seems like common sense and matches my experience. If you see someone’s profile on your home page, you are more likely to visit it casually than to search for it once it is out of sight; the latter you would do only if you had to communicate with the person, had some work with them, OR wanted to spy on them ;-) ) Thus, in conclusion: when you sign in regularly, you are more likely to get more hits on your profile.

Now, there are tricks to avoid appearing online on such websites: even when you sign in, you do not appear as online on your friends’ home pages. Since you do not appear there, your profile is also less likely to be visited by those who would otherwise have visited it after seeing you online. In short, your profile would be visited only by the following people:

1. People who want to talk to you about something.

2. People who randomly search for your name (or for somebody else sharing your name) and land on your profile (this case does not arise if you don’t keep a name).

3. People who see your posts/messages in some community or group, get curious and visit your page.

4. Somebody searches for his/her favorite cult movie, you have that movie on your profile, so your profile shows up in the search results and that somebody then checks out your page (this extends to artists, music etc. too).

5. Somebody randomly remembers you and checks your page out to see what’s up with you.

Thus we can say that if certain conditions are satisfied, the number of hits to your page on a given day would be, at least approximately, a random number.

I generally satisfied the condition of not appearing online, and over the years I noticed that, other than the occasional spike, the number of hits was distributed around a central value. I never paid it any further attention, however.

Some weeks ago, while writing somewhere, I thought it was time to try to model the same for a blog or a website and see for myself whether that number could indeed be considered random. Please note that if that number is made up of many small, independent random contributions, then the distribution of daily page visits over a period of time should be approximately a Gaussian curve (I’ll come back to this at the end of the post for those who aren’t sure).
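Why a Gaussian? The daily count is a sum of many small, roughly independent decisions by individual visitors, and by the central limit theorem such a sum is approximately normal. Here is a minimal simulation sketch of that idea, assuming (purely for illustration, not measured from any site) 2,000 potential visitors who each drop by on a given day with probability 0.05. That makes the daily count binomial with mean np = 100 and standard deviation √(np(1 − p)) ≈ 9.75; with n this large, the simulated days pile up in a bell shape.

```python
import math
import random
import statistics


def simulate_daily_hits(days, visitors, p, seed=0):
    """Simulate daily hit counts: each potential visitor independently visits with probability p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(visitors)) for _ in range(days)]


# Hypothetical, illustrative parameters (not measured from any real site).
VISITORS, P = 2000, 0.05

hits = simulate_daily_hits(days=500, visitors=VISITORS, p=P)

mean = statistics.fmean(hits)
sd = statistics.pstdev(hits)
theory_mean = VISITORS * P                       # np = 100
theory_sd = math.sqrt(VISITORS * P * (1 - P))    # sqrt(np(1 - p)), about 9.75

# For a normal distribution roughly 68% of days fall within one standard deviation.
within_1sd = sum(abs(h - mean) <= sd for h in hits) / len(hits)

print(f"mean {mean:.1f} (theory {theory_mean:.1f}), sd {sd:.2f} (theory {theory_sd:.2f})")
print(f"fraction of days within 1 sd of the mean: {within_1sd:.2f}")
```

A real blog’s count is not literally binomial (visitors are not independent coin flips, and the pool of visitors changes over time), so treat this only as an illustration of why sums of many small random contributions tend to look normal.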

It is difficult, however, for a blog or website on the web to satisfy the conditions I mentioned for the social networking profile. I looked for the following and made the following assumptions (please question their wisdom if you don’t agree, and suggest new ones):

1. Suppose you start writing a blog and it starts off rather well. You are enthusiastic, advertise your page and ask people to pay a visit. Such hits can’t be considered random hits. The total number of visits to the page would then be the sum of non-random and random hits (from search engines, random visitors etc.).

2. You are active on your blog in a big way for, say, the first year, and your feed/email subscribers keep visiting your blog whenever they are notified of a new post. This number, too, wouldn’t be random, as the number of hits would basically be a function of the number and frequency of your posts as well. In short, the more you post, the more page visits you are likely to get.

3. After that year, you decide to quit your blog. For some time the subscribers would keep visiting your page. And since you have stopped blogging, you would stop advertising too: you would stop giving its link to people/friends and asking them to pay a visit.

4. After a sufficient period of time, say another year, the “excitement” about the blog has died down and there are no new posts at all. The hits that you obtain can then be:

(a) Random hits from people searching for some stuff on search engines.

(b) People (mostly friends, former stalkers etc ;-) who randomly think of paying your blog a visit, just hoping there might be something new.

(c) People looking for some tutorial (or similar post) on your page. I have noticed that no matter how old a tutorial on your page gets, the old crowd is mostly replaced by new people, and the overall number of visits to that page stays roughly around a mean value.

I believe that this sum total (barring bot attacks or similar events that cause a one-day spike in visits) would roughly be a random number, and that this scenario for a website is equivalent to that of the profile I mentioned earlier.

I actually thought it was pretty straightforward. I asked some people for their opinion on it, and it appeared to me that either I lacked the communication skills to convey what I meant, or it was not so straightforward after all.

I decided to take it up. I collected the webstats for two websites.

I would like to thank Dr Jonathan Yedidia of MERL for providing me with the stats of his website over the past three or four months. This website has been inactive for over a year and satisfies the other conditions I spoke about, so it was an ideal candidate.

One more observation: the number of hits on weekends is visibly lower than on weekdays, so it is not a good idea to use both together. It is better to use two classes:

1. Only weekdays

2. Only weekends or other holidays.

That is, it is a good idea to model only weekends or only weekdays. Both together would not be a good idea, as they seem to have different distributions.
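Splitting the raw stats into these two classes is easy to automate. A minimal sketch, assuming the stats are available as (date, hits) pairs; `split_by_day_type` is a hypothetical helper, the hit counts in the example are made up, and holidays have to be supplied by hand since I don’t model any holiday calendar.

```python
from datetime import date


def split_by_day_type(daily_hits, holidays=frozenset()):
    """Split (date, hits) pairs into a weekday class and a weekend/holiday class."""
    weekdays, weekends = [], []
    for day, hits in daily_hits:
        # date.weekday() returns 0 for Monday ... 6 for Sunday, so 5 and 6 are the weekend.
        if day.weekday() >= 5 or day in holidays:
            weekends.append(hits)
        else:
            weekdays.append(hits)
    return weekdays, weekends


# Made-up counts, for illustration only.
sample = [
    (date(2009, 6, 26), 101),  # Friday
    (date(2009, 6, 27), 60),   # Saturday
    (date(2009, 6, 28), 55),   # Sunday
    (date(2009, 6, 29), 94),   # Monday
]
weekday_hits, weekend_hits = split_by_day_type(sample)
print(weekday_hits, weekend_hits)  # [101, 94] [60, 55]
```

Each class can then be histogrammed and fitted on its own.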

_____

Let’s consider only the weekday class. As I mentioned, I collected data over some months for Dr Jonathan Yedidia’s website, and the data plots actually turned out to be Gaussian. That’s interesting.

The staircase plot of the number of visitors to the website is given below. The X axis represents the days and the Y axis the number of hits.

[Staircase plot of the number of visitors]

I plotted a simple histogram of this data with 40 bins and with 80 bins, and the plot is just what I expected it to be! Roughly normal. There is an outlier, though (a day when the number of hits was 265, which I believe was due to a bot attack or something similar).
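For anyone who wants to reproduce such a histogram without a plotting package, the binning itself is simple. Here is a minimal sketch assuming equal-width bins spanning the smallest to the largest observation, which is a common default; I am not claiming this is exactly how the plots here were produced, and the sample numbers are made up.

```python
def histogram(values, bins):
    """Count values into `bins` equal-width bins spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would otherwise land one past the last bin, so clamp it.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


def print_text_histogram(values, bins):
    """Crude text rendering: one row per bin, labelled by its left edge."""
    counts, edges = histogram(values, bins)
    for left, count in zip(edges, counts):
        print(f"{left:8.1f} | {'#' * count}")


# Made-up numbers with one bot-like spike, purely for illustration.
sample = [90, 95, 97, 100, 102, 105, 265]
counts, edges = histogram(sample, 5)
print(counts)  # [6, 0, 0, 0, 1]
print_text_histogram(sample, 5)
```

Note how the single spike stretches the range so that all the ordinary days are squeezed into the first bin. With 40 or 80 bins the effect is milder but of the same kind, which is one reason to keep an eye on such outliers.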

[Histogram of the data with 40 bins]

Just for the sake of visual convenience let’s consider the same data for 80 bins.

[Histogram of the data with 80 bins]

The normal fits for the above plots (for both 40 and 80 bins) are given below:

[Normal fit on the data with 40 bins]

[Normal fit on the data with 80 bins]

The estimated values for the mean and the standard deviation are as follows:

\mu=97.5079

\sigma=28.1917

Meaning: the average number of hits on a weekday is about 97.5, and on roughly two out of three days (about 68% of the time, for a normal distribution) the number of hits would lie in the bracket 97.5 ± 28.
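The interpretation above can be checked with a few lines using Python’s standard `statistics.NormalDist`, plugging in the mean of about 97.5 and the standard deviation of about 28.2. This sketch computes the probability mass within one standard deviation of the mean and the z-score of the 265-hit day; it says nothing about how good the fit itself is.

```python
from statistics import NormalDist

# Weekday fit reported above: mean of about 97.5 hits, standard deviation of about 28.2.
weekday = NormalDist(mu=97.5079, sigma=28.1917)

# Probability that a day's hits fall inside mu +/- sigma (the same for every normal curve).
lo, hi = weekday.mean - weekday.stdev, weekday.mean + weekday.stdev
within_one_sigma = weekday.cdf(hi) - weekday.cdf(lo)

# z-score of the 265-hit day: how many standard deviations above the mean it lies.
z_outlier = (265 - weekday.mean) / weekday.stdev

print(f"P({lo:.1f} <= hits <= {hi:.1f}) = {within_one_sigma:.3f}")  # about 0.683
print(f"z-score of the 265-hit day: {z_outlier:.2f}")              # about 5.94

# Given raw daily counts, NormalDist.from_samples fits the sample mean and
# sample standard deviation directly (toy numbers here).
toy_fit = NormalDist.from_samples([90, 100, 110])
print(toy_fit.mean, toy_fit.stdev)
```

A day nearly six standard deviations above the mean is essentially impossible under the fitted curve, which is consistent with my suspicion that the spike came from a bot rather than from ordinary visitors. If that day was included in the fit, it inflates σ somewhat.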

_____

I found the above confirmation of what I had thought somewhat interesting. It does support the idea that when the conditions I mentioned are taken care of, the number of hits on a particular day can be considered a random number. I now plan to collect data over a longer period of about one year and repeat the exercise for four websites (two that satisfy those assumptions and two that do not; the latter would be the control).

The normal distribution never ceases to amaze me. It is one of those things that signify a fundamental, underlying order in chaos. Take a set of random people and measure their foreheads, waists or heights, and the measurements will lie along a normal distribution. Take the various shots an archer fires at a target, and the arrows are distributed about the center along a normal distribution.


In fact, this can be used to detect irregularities in data. Such patterns of randomness are a pretty reliable way to guess whether there has been any foul play with the data. For example, suppose you were a recruiter in the military with data on the heights of thousands of men: if the data did not lie along a normal distribution, that would indicate some fudging or faking of heights by individuals. Such statistical analysis has given rise to a new area called Forensic Economics, which has some extremely enjoyable aspects to it, such as Benford’s law, which too deals with checking the statistical fingerprint that data leaves in order to detect foul play.
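Since Benford’s law came up: it says that in many naturally occurring data sets the leading digit d appears with probability log10(1 + 1/d), so about 30% of numbers start with a 1 and fewer than 5% start with a 9. Below is a minimal sketch of a first-digit check. The test data are my own choice: the powers of 2, which are known to follow Benford’s law closely, against the numbers 100 to 999, whose leading digits are uniform. The deviation score (a plain sum of absolute differences) is a simplification of my own, not a formal statistical test like chi-squared.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading non-zero digit of a positive number, read off its decimal representation."""
    return int(str(abs(x)).lstrip("0.")[0])


def first_digit_frequencies(values):
    digits = Counter(first_digit(v) for v in values if v != 0)
    total = sum(digits.values())
    return {d: digits[d] / total for d in range(1, 10)}


def benford_deviation(values):
    """Sum of absolute differences between observed and Benford frequencies (0 = perfect match)."""
    expected = benford_expected()
    observed = first_digit_frequencies(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10))


powers_of_two = [2 ** k for k in range(1, 1001)]
uniform_digits = list(range(100, 1000))  # every leading digit occurs equally often

print(f"Benford P(first digit = 1) = {benford_expected()[1]:.3f}")
print(f"deviation, powers of 2:  {benford_deviation(powers_of_two):.3f}")
print(f"deviation, 100..999:     {benford_deviation(uniform_digits):.3f}")
```

The powers of two come out very close to Benford (deviation near 0), while the uniform range deviates by roughly 0.54. In a forensic setting one would compare reported figures against the Benford frequencies in the same way, keeping in mind that the law is expected to hold mainly for data spanning several orders of magnitude.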

Such interesting patterns underlying the chaos of social data led to many interesting philosophical questions in the 19th century. Many of these were simply whimsical, but exploring them led to much progress in understanding randomness.
