I have been involved in a major project on contrast enhancement of Magnetic Resonance Images using Independent Component Analysis (ICA) and Support Vector Machines (SVM) for the past couple of months. It is an extremely exciting project and also something new for me, as I have worked on biomedical images just once before. In the past I have used ICA and SVM in face recognition/authentication; this application, however, is quite novel.

This post intends to introduce the problem, discuss a motivating example, some methods, expected work and some problems.

__________

A Simple Introduction and Motivating Example:

The simplest motivating example for this problem is the famous cocktail party problem:

You are at a cocktail party with about 12 people, all talking simultaneously. Add to that a music source, which makes 13 sources.

Suppose you want to be able to follow later what each person was saying, and to do so you place a number of tape recorders at different locations in the room (let’s not worry about the number of recorders right now). When you listen to the recordings later, the sounds would hardly be understandable, as they would all be mixed up.

Now you define an engineering problem: using these recordings (which are basically mixtures), separate out the different sources with as little distortion as possible. At a real cocktail party, the brain shows[1][2][3] a remarkable ability to follow one conversation. However, the same problem has proved to be quite difficult in signal processing. The cartoon below illustrates the cocktail party problem:

The Cocktail Party Problem

Please listen to a demo of the cocktail party problem at the HUT ICA project page.

__________

The Logic Behind Constructing MR Images in Simple Terms:

Now, keeping the previous brief discussion in mind, let’s introduce in simple words how MRI works. This is just a simplification to make the idea clearer, and not really how MRI works; discussing MRI in detail would divert the focus of the post. To look at how MRI actually works, follow these highly recommended tutorials[4][5][6].

Suppose your body is placed in a magnetic field (let’s not worry about specifics yet). Consider two contiguous tissues in your body, X and Y. When subjected to a magnetic field, the particles (protons) in the tissues align according to the field. The amount of magnetization depends on the tissue type. Now suppose we want to measure how much a tissue gets magnetized. One way to think about it is this: first apply the magnetic field, which excites the particles. Once the field is removed, these particles tend to relax to their ground state. By measuring the time it takes for the particles to return, we get some measure of the magnetization of the tissue(s). This is because, in this simplified picture, the greater the relaxation time, the greater the magnetization.

An image is basically a measure of the energy distribution. Now suppose we have the measurements for tissues X and Y. Since the tissues are of a different nature (composition, density of protons etc), their responses to the field are different. Thus we get some contrast between them, and hence an image.

In very simplistic terms, this is how MRI scans are obtained. As mentioned above, please follow [4][5][6] for detailed tutorials on MRI.

__________

MRI scans of the Brain and the Cocktail Party Problem :

Now consider the above discussion in the context of taking an MRI scan of the brain. The brain has a number of constituents, among them gray matter, white matter, cerebrospinal fluid (CSF), fat, muscle/skin, glial matter etc. Since each is unique, each exhibits unique characteristics under a magnetic field. However, while taking a scan, we get one MRI image of the entire brain.

These scans can be considered equivalent to the mixtures of the cocktail party example. If we apply blind source separation to them, we should be able to separate out the various constituents such as gray matter, white matter, CSF etc. These images of the independent sources can be used for better diagnosis. This would look something like this:

Suppose the simulated MR scans (from the McGill Simulated Brain Database) were as follows:

Simulated MR Scans

The “ground truth” images for these scans would be as follows :

Ground Truth Images of Different Brain Tissue Substances

__________

Restatement of the Broad Research Problem and Use of ICA and SVM:

Magnetic Resonance Imaging is superior to Computerised Tomography, at least for brain imaging, because it gives much better soft tissue contrast (even small changes in the proton density and composition of the tissue are well represented).

As with most techniques, improvements to the scans obtained by MRI are much desired to improve diagnosis. Blind source separation has been used to separate physiologically different components from EEG[7]/MEG[8] data (similar to the cocktail party problem), financial data[9] and even fMRI[10][11], but it has not received much attention for MRI. Nakai et al[15] used Independent Component Analysis to separate physiologically independent components from MRI scans. They took MR images of 10 normal subjects, 3 subjects with brain tumour and 1 subject with multiple sclerosis and performed ICA on the data. They reported success in improving contrast for gray and white matter, which was beneficial for the diagnosis of brain tumour. The demyelination in multiple sclerosis cases was also enhanced in the images. They suggested that ICA could potentially separate out all the tissues that had different relaxation characteristics (the different sources of the cocktail party example). This approach thus shows much promise.

In more technical terms: consider a set of MR frames as a single multispectral image, where each band is taken during a particular pulse sequence (discussed below). Then use ICA on the data to separate out the physiologically independent components. A classifier such as the SVM can further improve the contrast of the separated independent components.

However, using ICA for MRI has been tricky, something I will discuss towards the end of this post and also in future posts.

Before doing so, I intend to touch upon the basics for the sake of completeness.

__________

Magnetic Resonance Imaging:

I had been thinking of writing a detailed tutorial on MRI, mostly because it requires some basic physics. However I don’t think it is required. I would recommend [4][5][6] for a study of the same in sufficient depth. I have recently taken tutorials on MRI, and would be willing to write for the blog if there are requests.

__________

An Introduction to Independent Component Analysis:

Independent Component Analysis was developed initially to solve problems such as the cocktail party problem discussed above.

Let’s formalize a problem like the cocktail party example. For simplicity let us assume that there are only two sources and two mixtures (obtained by keeping two recorders at different locations in the party).

Let’s represent these two mixtures as x_1 and x_2, and let s_1 and s_2 be the two sources that were mixed. Since we are assuming that the two microphones were kept at different locations, the mixtures x_1 and x_2 would be different.

We could write this as:

x_1 = a_{11}s_1 + a_{12}s_2 \quad \cdots \quad (1)

x_2 = a_{21}s_1 + a_{22}s_2 \quad \cdots \quad (2)

The coefficients a_{11}, a_{12}, a_{21}, a_{22} are basically some parameters that depend on the distance of the respective source from the microphones.

Let’s define our problem as : Using only the mixtures x_i estimate the signal sources s_i. It is notable that you do not have any knowledge of the parameters a_{ij}.

This could be illustrated by this :

Consider three signals:

Suppose we have five mixtures obtained from these three signals.

Signals obtained by mixing source signals

Suppose you only have the mixed signals available and do not know how they were mixed (the parameters a_{ij} are not known), and from these mixed signals (x_{i}) you have to estimate the source signals (s_{i}). This is a problem of considerable difficulty.

One approach would be to use the statistical properties of the signals (s_i) to estimate the parameters (a_{ij}). Surprisingly, it is enough to assume that s_1 and s_2 are statistically independent. This assumption might not be strictly valid in many scenarios, but it works well in most situations.

We could write the above system of linear equations in matrix form as :

x=As

where, A represents the mixing matrix, x and s represent the mixtures and the sources respectively.

The problem is to estimate s from x without knowing A. The assumption made is that the sources s are statistically independent.
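To make the model x = As concrete, here is a minimal Python sketch. The source distribution, the number of samples and the entries of the mixing matrix are arbitrary values assumed for illustration; they do not come from any real recording. The sketch only shows that two independent sources become clearly correlated once they are mixed.

```python
import random

random.seed(0)
N = 5000  # number of samples (arbitrary)

# Two independent, zero-mean sources; hypothetical stand-ins for s_1 and s_2.
s1 = [random.uniform(-1, 1) for _ in range(N)]
s2 = [random.uniform(-1, 1) for _ in range(N)]

# Hypothetical mixing matrix A; in the real problem these entries are unknown.
A = [[0.8, 0.3],
     [0.4, 0.9]]

# x = As, applied sample by sample.
x1 = [A[0][0] * a + A[0][1] * b for a, b in zip(s1, s2)]
x2 = [A[1][0] * a + A[1][1] * b for a, b in zip(s1, s2)]

def correlation(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n
    var_u = sum((a - mu) ** 2 for a in u) / n
    var_v = sum((b - mv) ** 2 for b in v) / n
    return cov / (var_u * var_v) ** 0.5

corr_sources = correlation(s1, s2)   # close to 0: the sources are independent
corr_mixtures = correlation(x1, x2)  # clearly non-zero: the mixtures are not
```

Zero correlation is weaker than statistical independence, so this is only a quick sanity check of the mixing model, not a separation method.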

How we go about solving this problem is exciting and an area of active research.  ICA was originally developed for solving such problems. Please follow [12][13][14] for discussions on mutual information, measures of non-gaussianity such as Kurtosis and Negentropy and the fastICA algorithm.
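As a rough illustration of one of the non-Gaussianity measures mentioned above, the sketch below estimates the excess kurtosis of a uniform and of a Gaussian sample. The sample size and seed are arbitrary assumptions, and this is only the basic sample statistic, not the estimator used inside FastICA.

```python
import random

def excess_kurtosis(samples):
    """Sample excess kurtosis: E[(y - mean)^4] / var^2 - 3 (zero for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((y - mean) ** 2 for y in samples) / n
    fourth = sum((y - mean) ** 4 for y in samples) / n
    return fourth / var ** 2 - 3.0

random.seed(1)
N = 20000  # arbitrary sample size
uniform_samples = [random.uniform(-1, 1) for _ in range(N)]
gaussian_samples = [random.gauss(0, 1) for _ in range(N)]

k_uniform = excess_kurtosis(uniform_samples)    # theoretical value is -1.2
k_gaussian = excess_kurtosis(gaussian_samples)  # theoretical value is 0
```

A value far from zero signals a non-Gaussian variable, which is the property ICA methods exploit when searching for the independent components.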

__________

Why can ICA be used in MRI?

One limitation that ICA faces is that it cannot work if more than one of the signal sources has a Gaussian distribution. This can be illustrated as follows:

Again consider our equation for just two sources :

\displaystyle \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}

Our problem was : We have to estimate s from x without any knowledge of A. We would first need to estimate the parameters A from x, assuming statistical independence of s. And then we could find s as :

s = Wx, where W=A^{-1} , or the inverse of the estimated mixing matrix A.
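The last step, s = Wx, can be sketched in a few lines of Python. The sketch assumes the mixing matrix is already known (in ICA it has to be estimated first), and the numbers in A and s are hypothetical values chosen for illustration.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular, no unmixing matrix exists")
    return [[d / det, -b / det],
            [-c / det, a / det]]

def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

A = [[0.8, 0.3], [0.4, 0.9]]   # hypothetical mixing matrix (illustration only)
s = [0.5, -0.25]               # one hypothetical sample of the two sources
x = matvec(A, s)               # the observed mixtures, x = As
W = inverse_2x2(A)             # unmixing matrix, W = A^{-1}
s_recovered = matvec(W, x)     # s = Wx
```

The hard part of ICA is obtaining a good estimate of A (or directly of W) from the mixtures alone; once that is available, recovering the sources is just this matrix multiplication.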

To understand how a solution would become impossible if both the sources had a Gaussian distribution, consider this :

Consider two independent components having the following uniform distributions:

P(s_i) = \begin{cases} \frac{1}{2 \sqrt{3}} & \text{if} \quad |s_i| \leq \sqrt{3} \\ 0 & \text{otherwise} \end{cases}

The joint density of the two sources would then be uniform on a square. This follows from the fact that the joint density would be the product of the two marginal densities.

The joint distribution for Si

[ Image Source : Reference [12][13] ]

Now if s_1 and s_2 were mixed by a mixing matrix A

A = \begin{bmatrix} 2 & 3 \\ 2 & 1 \end{bmatrix}

The mixtures obtained are x_1 and x_2. Now since the original sources had a joint distribution on a square, and they were transformed by using a mixing matrix, the joint distribution of the mixtures x_1 and x_2 will be a parallelogram. These mixtures are no longer independent.
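The square-to-parallelogram mapping can be checked directly. The short Python sketch below pushes the four corners of the square through the mixing matrix given above and computes two adjacent edges of the resulting shape; the helper names are my own.

```python
import math

A = [[2, 3],
     [2, 1]]  # the mixing matrix from the text

r = math.sqrt(3)
# Corners of the square on which (s_1, s_2) is uniformly distributed, in order.
square = [(-r, -r), (r, -r), (r, r), (-r, r)]

def mix(s):
    """Apply x = As to a single point s = (s_1, s_2)."""
    return (A[0][0] * s[0] + A[0][1] * s[1],
            A[1][0] * s[0] + A[1][1] * s[1])

parallelogram = [mix(corner) for corner in square]

# Edge vectors between consecutive corners of the mixed shape.
edge_a = tuple(q - p for p, q in zip(parallelogram[0], parallelogram[1]))
edge_b = tuple(q - p for p, q in zip(parallelogram[1], parallelogram[2]))
```

Here edge_a is a multiple of (2, 2) and edge_b a multiple of (3, 1), which are exactly the two columns of A.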

Joint Distribution of the mixtures

[ Image Source : Reference [12][13] ]

Now consider the problem once again : We have to estimate the mixing matrix A from the mixtures x_i, and using this estimated A we have to estimate the sources s_i.

From the above joint distribution we have a way to estimate A: the edges of the parallelogram point in the directions given by the columns of A. This gives an intuitive way of estimating the mixing matrix: obtain the joint distribution of the mixtures, then estimate the columns of the mixing matrix by finding the directions of the edges of the parallelogram. This gives a good intuitive feel for an in-principle solution of the problem (however, it isn’t practical).

However, now instead of two independent sources having a uniform distribution consider two independent sources having a Gaussian distribution. The joint distribution would be :

Joint Distribution when both Independent sources are Gaussian

[ Image Source : Reference [12][13] ]

Now going by the above discussion, because of the nature of the above joint distribution, it is not possible to estimate the mixing matrix from it.
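A small numerical check of why the Gaussian case carries no directional information: the sketch below assumes two unit-variance Gaussian sources and an orthogonal (rotation) mixing matrix, both simplifying assumptions of mine, and evaluates the joint density at one arbitrary point after several rotations.

```python
import math

def gaussian_joint_density(x1, x2):
    """Joint density of two independent standard Gaussian variables."""
    return math.exp(-(x1 ** 2 + x2 ** 2) / 2) / (2 * math.pi)

def rotate(point, angle):
    """Apply an orthogonal (rotation) mixing matrix to a 2-D point."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * point[0] - s * point[1], s * point[0] + c * point[1])

point = (0.7, -1.1)  # arbitrary test point
densities = [gaussian_joint_density(*rotate(point, angle))
             for angle in (0.0, 0.5, 1.0, 2.0)]
```

All the values in densities coincide, because the density depends only on the distance from the origin. The distribution has no edges or preferred directions, so every rotation of the sources looks the same and the mixing matrix cannot be identified.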

Thus ICA fails when more than one independent component has a Gaussian distribution.

Noise in MRI is non-Gaussian[16], therefore ICA is suited for MRI.

__________

Problems in Using ICA for MRI Blind Source Separation:

The application of ICA to MRI faces a number of problems, which I will discuss in later blog posts. Here I will only discuss one major problem: over-complete ICA.

Over-Complete ICA in MRI:

The problem of over-complete ICA occurs when there are fewer sensors (the tape recorders from our discussion above) than sources. Suppose you have 3 mixtures x_1, x_2 and x_3 (imagine you have collected 3 tape recordings at a cocktail party of 6). You now have to estimate 6 sources from 3 mixtures.

Now the problem becomes something like this :

x_1 = a_{11}s_1 + a_{12}s_2 + a_{13}s_3 + a_{14}s_4 + a_{15}s_5 + a_{16}s_6

x_2 = a_{21}s_1 + a_{22}s_2 + a_{23}s_3 + a_{24}s_4 + a_{25}s_5 + a_{26}s_6

x_3 = a_{31}s_1 + a_{32}s_2 + a_{33}s_3 + a_{34}s_4 + a_{35}s_5 + a_{36}s_6

Assume for a second that we can still estimate a_{ij}. Even then we cannot find all the signal sources, as the number of linear equations is just three while the number of unknowns is six. This is a considerably harder problem and has been discussed by many groups, such as [19][20][21].

Dropping that assumption, the estimation of a_{ij} itself is also harder in such a case.
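The under-determination is easy to see with a toy example. In the Python sketch below the 3x6 mixing matrix and the source values are hypothetical, chosen only so that the arithmetic is obvious.

```python
# Hypothetical 3x6 mixing matrix: three recorders, six sources (illustration only).
A = [[1, 0, 0, 1, 0, 0],
     [0, 1, 0, 0, 1, 0],
     [0, 0, 1, 0, 0, 1]]

def mix(A, s):
    """Compute the mixtures x = As for one sample of the sources."""
    return [sum(a * v for a, v in zip(row, s)) for row in A]

# Two different source configurations...
s_first = [1, 2, 3, 0, 0, 0]
s_second = [0, 0, 0, 1, 2, 3]

# ...that produce exactly the same three recordings.
x_first = mix(A, s_first)
x_second = mix(A, s_second)
```

Both configurations give the mixtures [1, 2, 3], so even with A known exactly, the recordings alone cannot tell the two apart. Additional assumptions about the sources are needed to pick one solution.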

The Case in MRI:

The problem of over-complete ICA doesn’t arise when it comes to functional-MRI. However it is a problem when it comes to MRI[17].

In MRI, by varying the imaging parameters, three kinds of images can be obtained: T1-weighted, T2-weighted and proton density images. Going by our discussion in the section on MRI above, these three can be treated as mixtures.

Therefore, we have 3 mixtures at our disposal. However, as the ground truth images above show, the number of different tissues in the brain exceeds 9. This makes the problem considerably difficult: we have to estimate 9-10 independent components from just 3 mixtures.
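Treating the three scans as mixtures amounts to flattening each image and stacking the results as rows of a mixture matrix. The sketch below shows this with tiny 2x2 arrays of made-up intensities; real scans are much larger, and the variable names are hypothetical.

```python
def flatten(image):
    """Turn a 2-D image (list of rows) into a flat list of pixel intensities."""
    return [pixel for row in image for pixel in row]

# Tiny hypothetical 2x2 "scans"; real T1, T2 and PD images would be far larger.
t1_image = [[10, 20], [30, 40]]
t2_image = [[40, 30], [20, 10]]
pd_image = [[25, 25], [15, 35]]

# Each scan becomes one row of the mixture matrix X (3 mixtures x number of pixels).
X = [flatten(img) for img in (t1_image, t2_image, pd_image)]

# Column j of X is the three-band "multispectral" measurement at pixel j.
pixel_vectors = list(zip(*X))
```

With this layout each pixel plays the role of one time sample in the cocktail party example, and each scan plays the role of one tape recorder.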

I would discuss methods that can help do that in later blog posts.

If only three mixtures are used, only 3 ICs can be estimated. Since the actual number of ICs exceeds 9, each of the 3 estimated ICs must contain at least 2 of the actual ICs mixed together, which means that a certain tissue type is not enhanced as much as it could have been had there been a separate IC for it. This can be understood by looking at this example.

3 ICs obtained by Applying Fast-ICA on MR scans

[I used FastICA for obtaining these Independent Components ]

To get more ICs, in simple words, we need more mixtures. However, we can obtain more mixtures from the existing mixtures themselves by a process of band expansion[18].

I will discuss this problem of OC-ICA and its possible solutions in later posts.

__________

To Conclude:

A basic idea related to application of ICA to MR scans was discussed. It is clear that even with just three ICs significant tissue contrast enhancement is achieved. Problems related to OC-ICA would be discussed in later posts one by one. I would also discuss quantifying the results obtained using the Tanimoto/Jaccard coefficient of similarity.
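For reference, the Tanimoto/Jaccard coefficient mentioned above is simply the size of the overlap of two sets divided by the size of their union. A minimal Python sketch on two made-up binary masks (the mask values are hypothetical, not results from the scans above):

```python
def jaccard(mask_a, mask_b):
    """Tanimoto/Jaccard coefficient |A and B| / |A or B| of two binary masks."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return intersection / union if union else 1.0

# Hypothetical flattened masks: 1 marks a pixel assigned to the tissue.
ground_truth = [1, 1, 1, 0, 0, 0, 1, 0]
estimated    = [1, 1, 0, 0, 1, 0, 1, 0]

score = jaccard(ground_truth, estimated)  # 3 shared pixels out of 5 marked in either mask
```

A score of 1 means the estimated tissue mask matches the ground truth exactly, and a score of 0 means there is no overlap at all.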

__________

References and Resources:

Cocktail Party Problem

[1] “Some Experiments on the Recognition of Speech, with One and with Two Ears“; E. Colin Cherry; The Journal of the Acoustical Society of America; September 1953. (PDF)

[2] “The Attentive Brain“; Stephen Grossberg; Department of Cognitive and Neural Systems – Boston University; American Scientist, 1995. (PDF)

[3] “The Cocktail Party Problem : A Primer“; Josh H. McDermott; Current Biology Vol 19. No. 22. (PDF)

Magnetic Resonance Imaging

[4] “Magnetic Resonance Imaging Tutorial“; H Panepucci and A Tannus; Technical Report; USP, 1994. (PDF)

[5] “10 Video lessons on MRI by Paul Callaghan” (~ an hour in total). (Videos)

[6] “MRI Tutorial for Neuroscience Boot Camp” Melissa Saenz. (PDF)

Sample ICA Applications Similar to The Cocktail Party Problem

[7] “Independent Component Analysis of Electroencephalographic Data“; Makeig, Bell, Jung, Sejnowski; Advances in Neural Information Processing Systems, 1996. (PDF)

[8] “Application of ICA to MEG noise Reduction“; Masaki Kawakatsu; 4th International Symposium on Independent Component Analysis and Blind Source Separation; 2003. (PDF)

[9] “Independent Component Analysis in Financial Data” from the book Computational Finance; Yasser S. Abu-Mostafa; The MIT Press; 2000. (Book Link)

[10] “ICA of functional MRI data : An overview“; Calhoun, Adali, Hansen, Larsen, Pekar; 4th International Symposium on Independent Component Analysis and Blind Source Separation; 2003. (PDF)

[11] “Independent Component Analysis of fMRI Data – Examining the Assumptions“; McKeown, Sejnowski; Human Brain Mapping; 1998. (PDF)

Independent Component Analysis : Tutorials/Books

[12] “Independent Component Analysis : Algorithms and Applications“; Aapo Hyvärinen, Erkki Oja; Neural Networks; 2000. (PDF)

[13] “Independent Component Analysis“; Aapo Hyvärinen, Juha Karhunen, Erkki Oja; John Wiley Publications; 2001. (Book Link)

[14] ICA Tutorial at videolectures.net by Aapo Hyvärinen. (Videos)

Independent Component Analysis for Magnetic Resonance Imaging

[15] “Application of Independent Component Analysis to Magnetic Resonance Imaging for enhancing the Contrast of Gray and White Matter“; Nakai, Muraki, Bagarinao, Miki, Takehara, Matsuo, Kato, Sakahara, Isoda; NeuroImage; 2004. (Journal Link)

[16] “Noise in MRI“; Albert Macovski; Magnetic Resonance in Medicine; 1996. (PDF)

[17] “Independent Component Analysis in Magnetic Resonance Image Analysis“;  Ouyang, Chen, Chai, Clayton Chen, Poon, Yang, Lee; EURASIP journal on Advances in Signal Processing; 2008 (Journal Link)

[18] “Band Expansion Based Over-Complete Independent Component Analysis for Multispectral Processing of Magnetic Resonance Images “; Ouyang, Chen, Chai, Clayton Chen, Poon, Yang, Lee; IEEE Transactions on Biomedical Imaging; June 2008. (PDF)

Over-Complete ICA:

[19] “Blind Source Separation of More Sources Than Mixtures Using Over Complete Representations“; Lee, Lewicki, Girolami, Sejnowski; IEEE Signal Processing Letters; 1999. (PDF)

[20] “Learning Overcomplete Representations“; Lewicki, Sejnowski. (PDF)

[21] “A Fast Algorithm for estimating over-complete ICA bases for Image Windows “; Hyvarinen, Cristescu, Oja; International Joint Conference on Neural Networks; 1999. (IEEE Xplore link)

__________

Onionesque Reality Home >>

For the past couple of years, I have had a couple of questions about machine learning research that I have wanted to ask some experts, but never got the chance to do so. I did not even know if my questions made sense at all. I would probably write about them on a blog post soon enough.

It is however ironic that I came to know my questions were valid and well discussed only through the death of Ray Solomonoff (I never knew what to search for, as I used expressions not used by researchers; he was one researcher who worked on these questions, and an obituary highlighted that work, something I had missed). Solomonoff was one of the founding fathers of Artificial Intelligence as a field and of Machine Learning as a discipline within it. It must be noted that he was one of the few attendees at the 1956 Dartmouth Conference, basically an extended brainstorming session that formally started AI as a field. The other attendees were: Marvin Minsky, John McCarthy, Allen Newell, Herbert Simon, Arthur Samuel, Oliver Selfridge, Claude Shannon, Nathaniel Rochester and Trenchard More. His 1950-52 papers on networks are regarded as the first statistical analysis of the same. Solomonoff was thus a towering figure in AI and Machine Learning.

[Ray Solomonoff  : (25 July 1926- 7 December 2009)]

Solomonoff is widely considered the father of Machine Learning for circulating the first report on the same in 1956. His particular focus was on the use of probability and its relation to learning. He founded the idea of Algorithmic Probability (ALP) in a 1960 paper at Caltech, an idea that gives rise to Kolmogorov Complexity as a side product. A. N. Kolmogorov independently discovered similar results, and later came to know of and acknowledged Solomonoff’s earlier work on Algorithmic Information Theory. Solomonoff’s work, however, was less known in the West than in the Soviet Union, which is why Algorithmic Information Theory is mostly referred to in terms of Kolmogorov complexity rather than “Solomonoff Complexity”. Kolmogorov and Solomonoff approached the same framework from different directions. While Kolmogorov was concerned with randomness and Information Theory, Solomonoff was concerned with inductive reasoning, and in pursuing it he discovered ALP and Kolmogorov Complexity years before anyone else did. I will write below on only one aspect of his work that I have studied to some degree in the past year.

Solomonoff With G.J Chaitin, another pioneer in Algorithmic Information Theory

[Image Source]

The Universal Distribution:

His 1956 paper, “An inductive inference machine” was one of the seminal papers that used probability in Machine Learning. He outlined two main problems which he thought (correctly) were linked.

The Problem of Learning in Humans : How do you use all the information that you gather in life in making decisions?

The Problem of Probability : Given you have some data and some a-priori information, how can you make the best possible predictions for the future?

The problem of learning is more general and is related to the problem of probability. Solomonoff noted that machine learning was simply the process of approximating ideal probabilistic predictions for practical use.

Building on his 1956 paper, he discovered probabilistic languages for induction at a time when such work was considered out of fashion, and went on to discover the Universal Distribution.

All induction problems could basically be reduced to this form: given a sequence of binary symbols, how do you extrapolate it? The answer is that we could assign a probability to each sequence and then use Bayes’ Theorem to predict how likely each particular continuation of a string is. That gives rise to an even more difficult question, which was the basic question behind a lot of Solomonoff’s work on Algorithmic Probability/Algorithmic Information Theory: how do you assign probabilities to strings?

Solomonoff approached this problem using the idea of a Universal Turing Machine. Suppose this Turing Machine has three types of tapes: a unidirectional input tape, a unidirectional output tape and a bidirectional working tape. Suppose this machine takes some binary string as input and may give a binary string as output.

It could do any of the following :

1. Print out a string after a while and then come to a stop.

2. It could print an infinite output string.

3. It could go in an infinite loop for computing the string and not output anything at all (Halting Problem).

For a string x, the ALP would be as follows :

If we feed some bits at random to our Turing Machine, there will always be some probability that the output would start with a string x. This probability is the algorithmic or universal probability of the string x.

The ALP would be given as :

\displaystyle P_M(x) = \sum_{i=0}^{\infty}2^{-\lvert\ S_i(x)\rvert}

Where P_M(x) is the universal probability of string x with respect to the universal Turing machine M. To understand the placement of S_i(x) in the above expression, let’s discuss it a little.

There could be many random strings that, after being processed by the Turing Machine, give an output string that begins with string x, and S_i(x) is the i^{th} such string. Each such string carries a description of x. Since we want to consider all cases, we take the summation. In the expression above, \lvert\ S_i(x)\rvert gives the length of the string S_i(x), and 2^{-\lvert\ S_i(x)\rvert} is the probability that the random input turns out to be S_i(x), which outputs a string starting with x.
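To get a feel for the sum, here is a toy Python calculation. It is not an actual computation of ALP, which is impossible in general; it assumes a hypothetical, finite list of input strings S_i(x) that all make the machine print something starting with x.

```python
# Toy illustration only: the true sum runs over all suitable inputs of a
# universal machine and cannot be computed. The list below is a hypothetical,
# finite set of input strings S_i(x) whose output starts with x.
programs_for_x = ["0", "110", "11100"]

def toy_alp(programs):
    """Sum 2^(-|S_i|) over the given input strings."""
    return sum(2.0 ** -len(p) for p in programs)

contributions = [2.0 ** -len(p) for p in programs_for_x]
p_x = toy_alp(programs_for_x)  # 2^-1 + 2^-3 + 2^-5
```

The one-bit description contributes 0.5 of the total 0.65625, so the shortest descriptions dominate the sum. Each extra bit of description length halves the contribution.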

This definition of ALP has the following properties that have been stated and proved by Solomonoff in the 60s and the 70s.

1. It assigns higher probabilities to strings with shorter descriptions. This is the reverse of something like Huffman Coding.

2. The value of ALP is largely independent of the choice of universal machine (changing the machine changes it by at most a constant factor).

3. ALP is incomputable. This is the case because of the halting problem. In fact, it is for this reason that it has not received much attention: why get interested in a model that is incomputable? However, Solomonoff insisted that approximations to ALP would be much better than existing systems and that getting the exact ALP is not even needed.

4. P_M(x) is a complete description for x. That means any pattern in the data could be found by using P_M. This means that the universal distribution is the only inductive principle that is complete, and approximations to it would be much desirable.

__________

Solomonoff also worked on Grammar discovery and was very interested in Koza’s Genetic Programming system, which he believed could lead to efficient and much better machine learning methods. He published papers till the ripe old age of 83 and is definitely inspiring for the love of his work.  Paul Vitanyi notes that :

It is unusual to find a productive major scientist that is not regularly employed at all. But from all the elder people (not only scientists) I know, Ray Solomonoff was the happiest, the most inquisitive, and the most satisfied. He continued publishing papers right up to his death at 83.

Solomonoff’s ideas are still not exploited to their full potential and in my opinion would be necessary to explore to build the Machine Learning dream of never-ending learners and incremental + synergistic Machine Learning. I would write about this in a later post pretty soon. It was a life of great distinction and a life well lived. I also wish strength and peace to his wife Grace and his nephew Alex.

The five surviving (in 2006) founders of AI who met in 2006 to commemorate 50 years of the Dartmouth Conference. From left: Trenchard More, John McCarthy, Marvin Minsky, Oliver Selfridge and Ray Solomonoff

__________

References and Links:

1. Ray Solomonoff’s publications.

2. Obituary: Ray Solomonoff – The founding father of Algorithmic Information Theory by Paul Vitanyi

3. The Universal Distribution and Machine Learning (PDF).

4. Universal Artificial Intelligence by Marcus Hutter (videolectures.net)

5. Minimum Description Length by Peter Grünwald (videolectures.net)

6. Universal Learning Algorithms and Optimal Search (Neural Information Processing Systems 2002 workshop)

__________

Well again parts of my notes (modified suitably to be blog posts) for a discussion session!

This post would be the first in a series of four posts. The objective of each post would be as follows:

1. This post would introduce Learning Theory, the bias-variance trade-off and sum up the need of learning theory.

2. This would discuss two simple lemmas : The Union Bound and the Hoeffding inequality and then use them to get to some very deep results in learning theory. It would also introduce and discuss Empirical Risk Minimization.

3. Continuing from the previous discussion this post would derive results on uniform convergence, tie the discussions into a theorem. From this theorem we would have made formal the bias-variance trade-off discussed in the first post.

4. Will talk about VC Dimension and the VC bound.

Basically all the results are derived using two very simple lemmas, hence the name of these posts.

______

Introduction:

Learning theory gives a researcher applying machine learning algorithms some rules of thumb on how best to apply the algorithms that he/she has learnt.

Dr Andrew Ng likens knowing machine learning algorithms to a carpenter acquiring a set of tools. However, the difference between a good carpenter and a not so good one is the skill in using those tools: in choosing which one to use and how. In the same way, Learning Theory gives a “machine-learnist” some crude intuitions about how an ML algorithm would work and helps in applying it better.

A lot of people still think of learning theory as a method for getting papers published (I’d like to use that method, I need papers ;-), as it is considered abstruse by many and not of much practical value. A good refutation of this tendency can be seen here on John Langford’s fantastic web-log.

______

As put in a popular tutorial by Olivier Bousquet, the process of inductive learning can be summarized as:

1. Observe a phenomenon.

2. Construct a model of that phenomenon.

3. Make predictions using this model.

Dr Bousquet puts it very tersely: the above process can actually be said to be the aim of ALL natural sciences. Machine learning aims to automate the process and learning theory tries to formalize it. I think the above gives a reasonable idea about what learning theory deals with.

Learning theory formalizes terms like generalization, over-fitting and under-fitting. This series of posts (read notes) aims to introduce these terms and then jump to a recap of some important error bounds in learning theory.

______

Training Error, Generalization Error and The Bias-Variance Tradeoff:

For simplicity let’s take something as simple as linear regression. And since I want this piece to be accessible, I assume no knowledge of linear regression either.

Linear Regression essentially models the relationship between one variable X and another variable Y such that the model itself depends linearly on the unknown parameters to be estimated from the data. Let’s have a look at what this means:

Suppose you have a habit of collecting weird datasets and you end up with a dataset that gives the circumference of the biceps of many men and the distance each of them throws a javelin. You want to predict, for an unknown individual, how far he can throw the javelin given the circumference of his biceps.

Of course there are a number of factors that would affect the distance a javelin goes, such as skill (which is essentially non-quantitative?), height, the kind of footwear worn, run-up distance, state of health etc. These would be some of the many features that affect the end result (the distance a javelin is thrown). What I essentially mean is that the circumference of the biceps alone isn’t a realistic feature for predicting how far a javelin can be thrown. But let’s assume that there is only one feature and that it can make reasonable predictions. This over-simplification is made only so that the process can be visualized in a graph.

Suppose you collect about 80 such examples (which you call the training examples) and plot your data as such:

[Figure: plot of the training examples]

Now the problem given to you is: Given you have the bicep-circumference measurement of an unknown individual, predict how far he can throw the javelin.

How would one do it?

What we would do is fit some curve to the above training set (the above plot). When we have to make a prediction, we simply plug the new value into our curve and read off the corresponding value for the distance, as illustrated below.

[Figure: the training examples with a fitted curve]

The curve can be represented in a number of ways. However, if the curve was to be represented linearly (that’s why it’s called linear regression) it could be written as :

h(x) = \theta_0 + \theta_1 x

Where h(x) is the hypothesis, \theta_0 and  \theta_1 are unknown parameters which are to be learnt from the data and x is the input feature. It is noteworthy that this is like the slope intercept form of the line.

In the above, for simplicity I considered only one feature, there could be many more. In the more general case:

h_\theta(x) = \theta_0 + \theta_1 x_1 + \dotsb + \theta_n x_n \quad \cdots \quad (1)

The \theta’s are called the parameters (to be learnt from the data) that will decide the nature of the curve.

We see that the equation involves the features of the training examples (the x’s), so the task of the learning algorithm is to find the optimal values of \theta_i using the training set. This can be done by something like Gradient Descent.

For any new example, we’d have the features x, and the parameters would already be known from running gradient descent on the training set. We simply have to plug the value of x into equation (1) to get a prediction.

To sum up : Like I mentioned, we use the training set to fit an optimal curve and then predict for unseen inputs by simply plugging their values into the “equation of the curve”.
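As a rough illustration of the above, here is a minimal sketch of batch gradient descent for the one-feature model h(x) = \theta_0 + \theta_1 x. The function names, the learning rate, the number of iterations and the made-up biceps/javelin numbers are all assumptions made for the sake of the example. The feature is also standardised before descending, which is only a convenience and not something the discussion above requires.

```python
import numpy as np

def fit_line_gradient_descent(x, y, learning_rate=0.1, n_iters=5000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent on squared error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardise the feature so that one learning rate works whatever the units.
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma
    t0, t1 = 0.0, 0.0
    for _ in range(n_iters):
        error = (t0 + t1 * z) - y
        # Gradients of (1 / 2m) * sum(error ** 2) with respect to t0 and t1.
        t0 -= learning_rate * error.mean()
        t1 -= learning_rate * (error * z).mean()
    # Convert back to parameters for the unscaled feature.
    theta1 = t1 / sigma
    theta0 = t0 - theta1 * mu
    return theta0, theta1

def predict(theta0, theta1, x):
    """Plug a new input into the learnt line."""
    return theta0 + theta1 * x

# Hypothetical training set: biceps circumference (cm) and javelin distance (m).
# The distances are generated from an exact line so the result is easy to check.
biceps = np.array([28.0, 30.0, 32.0, 34.0, 36.0, 38.0])
distance = 5.0 + 1.5 * biceps

theta0, theta1 = fit_line_gradient_descent(biceps, distance)
print(theta0, theta1)
print(predict(theta0, theta1, 33.0))
```

Since the toy data was generated from the line 5 + 1.5x, the learnt parameters should come out close to 5 and 1.5. With real, noisy measurements they would only be the best-fitting values, not the "true" ones.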

Now, it goes without saying that we could fit either a “simple” model or a more “complex” model to the training set. A simple model would be linear, say something like:

y=\theta_0 + \theta_1x

and a complex model could be something like this:

y=\theta_0 + \theta_1 x + \dotsb + \theta_5 x^5.

Note that in the above the same feature x is used in different ways: the second model uses x to create more features such as x^2, x^3, and so on. Clearly the second representation is more complex than the first, as it has more parameters and can therefore exploit more patterns in the data.
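To make the idea of creating new features from the same x concrete, here is a small sketch. The helper names and the toy data are hypothetical, and for brevity the parameters are found by ordinary least squares (NumPy’s lstsq) rather than by gradient descent.

```python
import numpy as np

def polynomial_features(x, degree):
    """Return the columns x^0, x^1, ..., x^degree built from a single feature x."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, degree + 1, increasing=True)

def fit_least_squares(features, y):
    """Find the thetas minimising the squared error for the given feature matrix."""
    theta, *_ = np.linalg.lstsq(features, y, rcond=None)
    return theta

# Toy data generated from a known quadratic, purely for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x - 0.5 * x**2

simple = polynomial_features(x, 1)     # two parameters: theta_0, theta_1
complex_ = polynomial_features(x, 5)   # six parameters: theta_0 ... theta_5

theta_quadratic = fit_least_squares(polynomial_features(x, 2), y)
print(simple.shape, complex_.shape)
print(theta_quadratic)
```

The model is still linear in the \theta’s; only the features have changed. That is why the same fitting machinery works for both the simple and the complex model.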

However, this increase in complexity can lead to problems, just as a model that is too simple can. This is illustrated below:

[Fig 1 (Left) and Fig 2 (Right)]

The figure on the left has a “simple” model fit to the training set. Clearly there are patterns in the data that the model would never take into account, no matter how big the training set grows. Putting this in more concrete terms, it’s clear that the relationship between x and y is not linear. So if we try to fit a linear model to it, no matter how much we train it, there would always be some patterns in the data that the model would fail to capture.

What this means is that what is learnt from the training set will not generalise well to unknown examples: an unknown example may come from a part of the distribution that the model fails to account for, and the prediction for it would then be very inaccurate.

The figure on the right has a “complex” model fit to the same set, and clearly the model fits the data very well. But again it is not a good predictor, as it does not represent the general nature of the spread of the data but rather takes into account its idiosyncrasies. This model would make very good predictions on the data from the training set itself, but it would not generalize well to unknown examples.

A more appropriate fit would be something like this :

[Figure: a more appropriate fit]

Now we can move to a definition of the generalization error: the generalization error of a hypothesis is its expected error on examples that are not from the training set. For an example on understanding generalization, refer to the part labeled “Van-Gogh Chagall and Pigeons” in this post.
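One common way of writing this definition down is sketched below. The notation is an assumption made here for illustration and is not used elsewhere in this post: \mathcal{D} stands for the distribution the examples come from, m for the number of training examples, (x^{(j)}, y^{(j)}) for the j-th training example, and squared error is taken as the measure of error.

```latex
% Generalization error: expected error on a fresh example drawn from \mathcal{D}
\varepsilon(h) = \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \left( h(x) - y \right)^2 \right]

% Training error: average error over the m training examples only
\hat{\varepsilon}(h) = \frac{1}{m} \sum_{j=1}^{m} \left( h(x^{(j)}) - y^{(j)} \right)^2
```

The learning algorithm only ever gets to see the second quantity, while it is the first one that we actually care about.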

The models shown in figures 1 and 2 have HIGH generalization errors. However, each suffers from an entirely different problem.

______

Bias : As already mentioned, in the model shown in fig. 1, no matter how much the model is trained, there would always be some patterns in the data that the model would fail to capture. This is because the model has a high BIAS. The bias of a model is the expected generalization error even if we were to fit it to a very large training set.

Thus the linear model shown in figure 1 suffers from high bias and will underfit the data.

Variance : Apart from bias there is another component that has a bearing on the generalization error. That is the variance of the model fit to the training set.

This is shown in fig. 2. We see that even though the model fits the training set very well, there is the risk that we are fitting patterns that are idiosyncratic to the training examples and may not represent the general pattern between x and y.

Since we might be fitting spurious patterns and exaggerating minor fluctuations in the data, such a model would still give a high generalization error and will over-fit the data. In such a case we say that the model has a high variance.

The Trade-off : When deciding on a model to fit to the training set, there is a trade-off between the bias and the variance. If either is high, the generalizing ability of the model would be low (the generalization error would be high). In other words, if the model is too simple, i.e. if it has too few parameters, it would have a high bias, and if the model is too complex it would have a high variance. While deciding on a model we have to strike a balance between the two.
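The trade-off can be seen in a small experiment, sketched below under some loud assumptions: the “true” relation between x and y is taken to be a sine curve with noise, the training set has 20 examples, and polynomials of degree 1, 3 and 9 stand in for the too-simple, the reasonable and the too-complex model. None of these choices come from the figures above; they are only there to have something to run.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def sample(n):
    """Draw n examples from an assumed non-linear relation with noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

def mean_squared_error(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

x_train, y_train = sample(20)     # small training set
x_test, y_test = sample(2000)     # fresh examples, never used for fitting

errors = {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit of the given degree to the training set.
    model = Polynomial.fit(x_train, y_train, degree)
    errors[degree] = (
        mean_squared_error(model, x_train, y_train),   # training error
        mean_squared_error(model, x_test, y_test),     # estimate of generalization error
    )
    print(degree, errors[degree])
```

The training error can only go down (or stay the same) as the degree goes up, since each model contains the previous one. What is worth looking at is whether the error on the fresh examples follows it down or not.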

A very famous example that illustrates this trade-off goes like this:

Fall Tree

[Suppose there is an exacting biologist who studies and classifies green trees in detail. He would be the example of an over-trained or over-fit model and would declare if he sees a tree with non-green leaves like above that it is not a tree at all]

Cucumber

[An under-trained or under-fit model would be like the above biologist's lazy brother, who on seeing a cucumber which is green declares that it is a tree]

Both of the above have poor generalization. We wish to select a model that has an appropriate trade-off between the two.

______

So why do we need Learning Theory?

Learning theory is an interesting subject in its own right. It also, however, hones our intuitions on how to apply learning algorithms properly, giving us a set of rules of thumb that guide us.

Learning theory can answer quite a few questions :

1. In the previous section there was a small discussion on bias and variance and the trade-off between the two. The discussion sounds logical, but it has little meaning unless it is formalized. Learning theory can formalize the bias-variance trade-off. This helps, as we can then choose the model with just the right bias and variance.

2. Learning Theory leads to model selection methods by which we can choose automatically what model would be appropriate for a certain training set.

3. In Machine Learning, models are fit on the training set. So what we essentially get is the training error. But what we really care about is the generalization ability of the model or the ability to give good predictions on unseen data.

Learning Theory relates the training error on the training set and the generalization error and it would tell us how doing well on the training set might help us get better generalization.

4. Learning Theory proves conditions under which learning algorithms will actually work well. It proves bounds on the worst-case performance of models, giving us an idea of when an algorithm would work properly and when it won’t.

The next post would answer some of the above questions.

______

Onionesque Reality Home >>

I am in the process of winding up teaching a basic course on Bio-Informatics, which is offered as an elective subject for final-year undergraduate Information Technology students. I preferred teaching this course as visiting faculty on weekends, as managing time during the week is hard (though I did take some classes on weekdays).

______

[Gene Clustering : (a) shows clusters (b) uses hierarchical clustering (c) uses k-means (d) SOM finds clusters which are arranged in grids. Source : Nature Biotechnology 23, 1499 - 1501 (2005) by Patrick D'haeseleer]

Why Bio-Informatics?

The course (out of the ones offered in Fall) I would have preferred taking the most would have been a course on AI. There is no course on Machine Learning or Pattern Recognition at the UG level here, and the course on AI comes closest as it has sufficient weight given to Neural Nets and Bayesian Learning.

As AI was not available, the subject that came nearest to my choice was Bio-Informatics, as about 60 percent of the syllabus was Machine Learning, Data Mining and Pattern Recognition. Its being a basic course gave me the liberty to cover these parts in much more detail than the other parts. And that’s exactly why taking up Bio-Informatics, even though it’s not directly my area, was not a bad bargain!

______

The Joys of Teaching:

This is the first time that I have formally taught a complete course. I have conducted workshops and given talks quite a few times before, but never taught a complete course.

I have always enjoyed teaching. When I say I enjoy teaching, I don’t necessarily mean something academic. I like discussing ideas in general.

If I try to put down why I enjoy teaching, there might be some reasons:

  • There is an obvious inherent joy in teaching that few activities have for me. When I say teaching here, like I said before, I don’t just mean formal teaching, but rather the more general meaning of the term.
  • It’s said that there is no better way to learn than to teach. Actually that was the single largest motivation that prompted me to take that offer.
  • Teaching gives me a high! The time I get to discuss what I like (and teach), I forget things that might be pressing me at other times of the day. I tend to become a space-cadet when into teaching. It’s such a wonderful experience!
  • One more reason that I think I like teaching is this : I read widely (or at least am interested in a wide range of things) and I have noticed that the best way it all gets connected, and in the most unexpected ways, is in discussions. You don’t come across people who are interested in involved discussions very often, and being an introvert compounds the problem further. Teaching gives me a platform to engage in such discussions. Some of the best ideas that I have got, borrowing from a number of almost unrelated areas, came while discussing/teaching. And this course gave me a number of ideas that I would do something about if I get the chance and the resources.
  • Teaching also shows you the limits of your own reading and can inspire you to plug the deficiencies in your knowledge.
  • Other than that, I take teaching or explaining things as a challenge. I enjoy it when I find out that I can explain pion exchanges to friends who have not seen a science book after grade 10. Teaching is a challenge well worth taking for a number of reasons!

From this specific course the most rewarding moment was when a couple of groups approached me after the conclusion of classes to help them a little with their projects. Since their projects are of moderate difficulty and from pattern recognition, I did take that as a compliment for sure! I would not say I can “help” them (I don’t like using that word, it sounds pretentious), but I would definitely like to work with them on their projects and would hopefully learn something new about the area.

______

Course:

I wouldn’t be putting up my notes for the course, but the topics I covered included:

1. Introduction to Bio-Informatics, Historical Overview, Applications, Major Databases, Data Management, Analysis and Molecular Biology.

2. Sequence Visualization, structure visualization, user interface, animation versus simulation, general purpose technologies, statistical concepts, microarrays, imperfect data, quantitative randomness, data analysis, tool selection, statistics of alignment, clustering and classification, regression analysis.

3. Data Mining Methods & Technology overview, infrastructure, pattern recognition & discovery, machine learning methods, text mining & tools, dot matrix analysis, substitution metrics, dynamic programming, word methods, Bayesian methods, multiple sequence alignment, tools for pattern matching.

4. Introduction, working with FASTA, working with BLAST, filtering and capped BLAST, FASTA & BLAST algorithms & comparison.

Like I said earlier, my focus was on dynamic programming, clustering, regression (linear, locally weighted), Logistic regression, support vector machines, Neural Nets, and an overview of Bayesian Learning. I then introduced all the other aspects as applications and covered the necessary theory at that point!

______

Resources:

All my notes for the course were hand-made and not on \LaTeX, so it would be impossible to put them up now (they were basically made from a number of books and the MIT-OCW).

However I would update this space soon enough linking to all the resources I would recommend.

______

I am looking forward to taking a course on Digital Image Processing and Labs the next semester, which begins December onwards (again as a visiting instructor)! Since Image Processing is closer to the area I am interested in deeply (Applied Machine Learning – Computer Vision), I am already very excited about the possibility!

______

Sisters and Book

Painting is just another way of keeping a diary.

- Pablo Picasso.

One of the things that I have done right from my childhood, sadly discontinuously, is painting and sketching. It is one of my oldest hobbies and something that gives me immense peace.

This post finds its way here as :

1. It reminds me that I should waste less time on communities on the internet when I need to “pass” time as a diversion or something (whatever little I spend anyway) and use it to paint instead whenever the usual workload is a little slack.

2. I come across scores of extremely wonderful things on the internet every week. Why share this then? Oh, because it (the painting “Sisters and Book”) moves me in a way, and for reasons, that I’d rather not try to describe on a blog. Especially a blog that’s not a personal one and is rather shifting focus towards Machine Learning gradually. :)

_______

[Sisters and Book :  By Iman Maleki]

Another beautiful painting, which almost looks like a photograph to me is this!

[Omens of Hafez - Iman Maleki]

Though I must admit I do not get attracted to realism much, I find Iman Maleki’s work extremely beautiful! Especially the way he paints ladies.

And that’s why this finds its way here. And point noted again to give more time to my beloved hobby.

_______

Quick Links:

1. Iman Maleki’s Home Page

2. Iman Maleki’s Painting Collection

_______

Okay, a very quick post and so  I’d make it pointed.

1. A lot of work is on in the background and that’s making access to the internet limited, thus limiting writing. A lot of that work should sometime be up on the blog one by one. Though I can’t promise a time frame.

2. I had an occasion to meet a couple of my “blog’s readers” in the last week. And it was an amazing time discussing CV topics (as they happened to read some posts on CV). Will come back to this again.

______

Hate You:

EDIT: I know this saying is too cliched. But I have an extremely silly reason to put it up.

Just came across this Gaping Void cartoon and it’s just too true!

That “something” could actually be anything! Wouldn’t write why I put it. :)

______

Markov Random Fields:

I hate to use my blog for this, but is there anybody who has some experience working with Markov random fields and would have the time to discuss some things?

If yes, I would be extremely grateful if you could write me an email at either :

onionesquereality[AT]yahoo[DOT]com

or

shubhendu_trivedi[AT]ieee[DOT]com

______

Coffee:

I have noticed over the past year that there are a number of regular readers (who have endured some poor and eclectic posts and sparse posting ;-) from Leuven, Belgium.

I should be there for a fortnight pretty soon. Going by my experience in the past week I think it would be an extremely interesting experience to catch up over coffee!

Please send me an email if you’d be interested, we could possibly work out something. Please write to:

onionesquereality[AT]yahoo[DOT]com

or

shubhendu_trivedi[AT]ieee[DOT]com

As a meaningless aside: I always stayed away from Tea and coffee till some years ago. I rediscovered coffee thanks to a very dear friend. So now I can say I am qualified to say “Let’s meet up for coffee”. I wasn’t earlier to be honest.  :)

_____

I would try to get more systematic about my posts from now on. For every two non-technical posts I would keep two technical posts.

This post would also be the first in a series of posts in which I intend to write about Visual Illusions only.

Before getting into the subject of this post, it would be helpful to have a quick recap of the background.

_____

The Blind Spot:

Consider a horizontal cross section of the human eye as shown below.

[Figure: horizontal section of the right eye]

As seen in the above, the innermost membrane is the Retina, and it lines the walls of the posterior portion of the eye. When the eye is focused, light from the focused object is imaged onto the Retina. It thus acts as a screen. Pattern vision is caused by the distribution of discrete light receptors called rods and cones over the retinal surface.

Each eye has about 6-7 million cones, located primarily in the central portion of the Retina and they are highly sensitive to color. Humans can resolve fine details with cones as each cone is connected to its own nerve end. The vision due to cones is called Photopic or bright-light vision.

There are about 75-150 million rods, and they are distributed throughout the retina. The amount of detail that can be resolved by rods is less, as several of them are connected to the same nerve end, unlike the cones. Vision due to rods simply gives an overall picture of the field of view. Objects that appear brightly colored in daylight appear as colorless forms in moonlight, as only the rods are stimulated. This type of vision is called Scotopic or dim-light vision.

As seen in the figure there is a portion on the retina which has no receptors (rods or cones), thus will not cause any sensation. This is called the blind spot.

Now because of the blind spot a certain field of vision is not perceived. We however do not notice it as the brain fills it with details from the surroundings or using information from the other eye.

The blind spots in the two eyes are arranged symmetrically, so that each eye covers the part of the field of vision that the other loses. This is shown by the figure below.

[Image Source]

If the brain did not fill the lost field of vision with surrounding details and information from the other eye, then the blind spot would appear something like the black dot on the image below.

Blind Spot view

[Image Source]

_____

Now that means, if you close one eye then you can indeed detect the presence of the blind spot as the brain would not have sufficient information about the lost field of vision (though it would be good enough for us to not notice it normally). The presence of the blind spot can be demonstrated by the simple figure below.

Demo of Blind Spot

Click on the above image to enlarge

Now enlarge the above image and close your right eye and focus your left eye on the X only. Don’t try to look at the O on the left. You’d just notice it at the periphery. The object of interest should only be X.

Now move towards the screen. At a certain point you will not see the O in the periphery. If you go ahead of this point or behind it you’ll see the O again. This specific point (a range actually) where you can not see the O indicates the presence of the blind spot.

_____

The Vanishing Head Illusion:

This leads to some interesting illusions, one of the most interesting being the so called vanishing head illusion.

As in the above figure, if the O is replaced by a head, the person would appear headless if the head falls on the blind spot.

Check the video below in full screen for best results.

We notice that Richard Wiseman on the left indeed appears headless, and that field of view is filled up by the orange background when his head falls on the blind spot. Then he does something even more interesting. He uses a black bar and moves it up and down in front of his face. Now instead of seeing the bar as discontinuous, the brain manages to show the bar as a continuous entity!

_____

A week ago I observed that there was a wonderful new documentary on you-tube, put-up by none other than author and documentary film-maker Christopher J. Sykes. This is what this post is basically about. I’ll digress for a moment and come back to it in a while.

With the exception of the Feynman Lectures on Physics Volume III and Six Not-So-Easy Pieces (both of which I don’t intend to read in the conceivable future), there is no book with which Feynman was involved (he never wrote himself) that I have not had the opportunity to read. The last that I read was “Don’t You Have Time to Think”, a collection of letters by Feynman.

Don't You Have Time To Think

A number of people, including many of Feynman’s friends, were surprised to learn that Feynman wrote letters, and so many of them, and especially by the type of letters he wrote. Letters give a very different picture of a man than a conventional biography does, and these reveal Feynman to be a genius with a human touch. I have covered points in an earlier post which now seems to me to be overly enthusiastic. ;-)

Sean Carroll aptly writes that Feynman worship is often overdone, and I think he is right. Let me give my own opinion on the matter.

I don’t consider Feynman a god or anywhere close to that (but definitely one of my idols and a man I admire greatly). I actually consider him to be very human, someone who was unashamed of admitting to his weaknesses and who had a certain love for life that’s rare. I am attracted to Feynman for only one reason : People like Feynman are a breath of fresh air among the supercilious pseudo-intellectual snobs that abound in academia and industry. A breath of fresh air especially for lesser mortals like me. That’s why I like that man. Why is he so famous? I have tried writing on it before. And I won’t do so anymore.

I’d like to cite two quotes that would give my point of view on the celebrity-fication of scientists, in this case Feynman. Dave Brooks writes in the Telegraph in an article titled “Physicist still leaves some all shook up” February 5, 2003:

Feynman is the person every geek would want to be: very smart, honored  by the establishment even as he won’t play by his rules, admired by people of both sexes, arrogant without being envied and humble without being pitied. In other words, he’s young Elvis, with the Earth  shaking talent transferred from larynx to brain cells and enough sense to have avoided the fat Las Vegas phase. Is such celebrity-fication of scientists good? I think so, even if people do have a tendency to go overboard. Anything that gets us thinking about science is something to be admired, whether it comes in the form of an algorithm or an anecdote.

I remember reading an essay by the legendary Freeman Dyson that said:

Science too needs its share of super heroes to bring in new talent.

These rest my case I suppose.

_____

The only other book of Feynman that I have not read and that I have wanted to read for a LONG time is Tuva or Bust! Richard Feynman’s Last Journey. Unfortunately I have never been able to find it.

Tuva or Bust! Richard Feynman's Last Journey

There was a BBC Horizon documentary on the same. And thankfully Christopher J. Sykes has uploaded that documentary on you-tube.

This is a rare documentary and was the last in which Feynman appeared. It was in fact shot just some days before his death. It documents the obsession of Richard Feynman and his friend Ralph Leighton with visiting an obscure place in central Asia called Tannu Tuva. During a discussion on geography, and in a teasing mood, Feynman was reminded of a long-forgotten memory and quipped at Leighton, “Whatever happened to Tannu Tuva”. Leighton thought it was a joke and confidently said that there was no such country at all. After some searching they found out that Tannu Tuva was once a country and was now a Soviet satellite. Its capital was “Kyzyl”, a name so interesting to Feynman that he thought he just had to go to this place. The book and the documentary cover Feynman’s and Leighton’s adventures in scheming to get to Tannu Tuva and to get around Soviet bureaucracy. It is an extremely entertaining film, to say the least. The ending is a little sad, though. Feynman passed away three days before the letter from the Soviets about permission to visit Tannu Tuva arrived, and Leighton appears to be on the verge of tears.

The introduction to the documentary reads as:

The story of physicist Richard Feynman’s fascination with the remote Asian country of Tannu Tuva, and his efforts to go there with his great friend and drumming partner Ralph Leighton (co-author of the classic ‘Surely You’re Joking, Mr Feynman’). Feynman was dying of cancer when this was filmed, and died a few weeks after the filming. Originally shown in the BBC TV science series ‘Horizon’ in 1987, and also shown in the USA on PBS ‘Nova’ under the title ‘Last Journey of a Genius’

Find the five parts to the documentary below:

“I’m an explorer okay? I get curious about everything and I want to investigate all kinds of stuff”

Part 1

Click on the above image to watch

____

Part 2

Click on the above image to watch

____

Part 3

Click on the above image to watch

____

Part 4

Click on the above image to watch

____

Part 5

Click on the above image to watch

____

Only after I got done with the documentary did I realize that the PBS version of the above documentary had been available on Google Video for quite some time.

Find the video here.

_____

Michelle Feynman

As an aside : though Feynman could not manage to go to Tuva in his lifetime, his daughter Michelle did visit Tuva last month!

_____

One of the things that has had me in awe over the last week, after the documentary, is Tuvan throat singing. It is one of the most remarkable things that I have seen in the past month or two. I am strongly attracted to Tibetan chants too, but these are very different and fascinating. The remarkable thing about them is that the singer can produce two pitches at once, as if they were being sung by two separate singers. Have a look!

_____

Project Tuva : Character of Physical Law Lectures

On the same day I came across 7 lectures which were given by Feynman at Cornell in 1964 and were later put into a book by the name “The Character of Physical Law”. These have been made freely available by Microsoft Research. Though some of these lectures have already been on YouTube for a while, the ones that were not were, needless to say, a joy to watch. I had linked to the lectures on Gravitation and Arrow of Time previously.

Project Tuva

Click on the above image to be directed to the lectures

I came to know of these lectures on the page of Prof Terence Tao, whom I find very inspiring too!

_____

Quick Links:

1. Christopher J. Sykes’ Youtube channel.

2. Tuva or Bust

3. Project Tuva at Microsoft Research

_____

I picked up these images at Wired two days ago and just could not fit in the time to put them up earlier.

There is a remarkable quote by Einstein -

“Two things are infinite: the universe and human stupidity; and I’m not sure about the universe.”

It isn’t definite to me if I liked this quotation earlier. But I am NOW wholly convinced that I love it. And this new found unequivocalness for it is due to the following:

As a kid, I used to have a large collection of encyclopedias. I remember reading about the Aral Sea in the picture atlas, and that it mentioned that there was increasing salination of the sea water and that it would disappear in some decades.

Over the years at that time, we were fed doomsday scenarios all the while: that all the coastal cities would soon be under the sea due to rising ocean levels, that the Himalayas would soon be ice-free, etc. Over a period of time you get fed up with such idle talk, and since you don’t see anyone giving convincing answers, you tend to believe that nothing like that is true. Secondly, eternal optimist that I am, I probably just wished that what that encyclopedia said about the Aral was some “minor” problem.

I last read about the problem many years ago and after that never came across anything on it. And just a couple of days back was shocked by these images. I have only one word for them : Tragic!

The images are from 1973, 1987, 1999, 2006 and 2009. The two recent images were released by the European Space Agency, the earlier ones were taken by the United States Geological Survey.

Aral Sea - 1973

Aral Sea - 1987

Aral Sea - 1999

Aral Sea - 2006

Aral Sea - 2009

[Image(s) Source : Wired Science]

The South Aral Sea, the remnant of the original lake that you can see to your left on the above image, is also expected to vanish by 2020. Thankfully the North Aral Sea (the part on the right) has been saved due to a World Bank funded dam project.

The Aral Sea, once the world’s fourth largest lake at roughly 68,000 sq km, is now just about one-tenth that size. The trouble started when it was decided by the Soviets in 1918 that the two rivers that drained into the Aral, the Amu Darya and the Syr Darya, would be largely diverted to the deserts to develop them into cotton-growing lands. The Soviet plan worked and cotton became one of the most important exports from that area. By the 1960s massive amounts of water were being diverted and the sea began to shrink steadily. How that happened is spoken out loud by the pictures.

The death of the Aral is extremely sad. Its death has left its once thriving fishing industry destroyed, and the diversion has reduced the two rivers to a shadow of their former selves. The Aral served as a climate moderator in the largely arid lands there, so its death might herald a major environmental catastrophe in the region.

This is a prime example of what human stupidity could lead to and leaves me short of words to describe my anguish at the same.

It has a number of things to say:

Ignoring warnings that are backed by clear proof is just plain stupidity. There is ample proof, for example, of climate change and its bad impact. I have been visiting the Himalayas once every two years since 1991, and the change there is apparent: compared to the 80s, the glaciers that feed the Ganges have shrunk by several kilometers. I don’t know what the solutions are, nor am I comparing the Aral problem with it. I understand that the Aral was a different kind of problem. Different because it was known to the Soviets from the start that the lake would dry up. Climate change can not be compared to it, as we do not yet fully understand a number of things about it, so how effective the correctives would be is debatable. It would be for our good if that debate is settled soon with good and incisive scientific evidence.

It also is a comment on how totalitarian regimes can be dangerous. In such regimes, since a decision taken can not be opposed, such a decision could either lead to major dividends/progress as it would be implemented very rapidly or major catastrophe as was in the above case.  Soviet officials were aware that the Aral would sooner or later evaporate. In 1964 Aleksandr Asarin noted that :

“It was part of the five-year plans, approved by the council of ministers and the Politburo. Nobody on a lower level would dare to say a word contradicting those plans, even if it was the fate of the Aral Sea.”

Of course he was right, there is rarely any way to convince or reason with or oppose supercilious totalitarian regimes even if their decisions are clearly suicidal. I am tempted to make a political comment on two present day countries (one a totalitarian state and one a liberal democracy) here, but would avoid the temptation.

Anyhow, the images above disturbed me enough to lose sleep.

_____

Onionesque Reality Home >>

http://www.wired.com/wiredscience/2009/07/aralsea/

Here are a number of interesting courses, two of which I have been looking at for the past two weeks and which I hope to finish by the end of August-September.

Introduction to Neural Networks (MIT):

These days, among the other things that I have at hand, including a project on content-based image retrieval, I have been making it a point to look at an MIT course on Neural Networks. And needless to say, I am getting to learn loads.

I would like to emphasize that though I have implemented a signature verification system using Neural Nets, I am by no means good with them. I can be classified as a beginner. The tool that I am more comfortable with is Support Vector Machines.

I have been wanting to know more about Neural Nets for some years now, but I never really got the time, or you could say the opportunity. Now that I can invest some time, I am glad I came across this course. So far I have been able to look at 7 lectures and I should say that I am MORE than very happy with the course. I think it is very detailed and extremely well suited for the beginner as well as the expert.

The instructor is H. Sebastian Seung, who is a professor of computational neuroscience at MIT.

The course has 25 lectures, each one packed with a great amount of information. This means the lectures might be slow going for those who are not very familiar with this stuff.

The video lectures can be accessed over here. I must admit that I am a little disappointed that these lectures are not available on YouTube, because the downloads are rather large in size. But I found them worth it anyway.

The lectures cover the following:

Lecture 1: Classical neurodynamics
Lecture 2: Linear threshold neuron
Lecture 3: Multilayer perceptrons
Lecture 4: Convolutional networks and vision
Lecture 5: Amplification and attenuation
Lecture 6: Lateral inhibition in the retina
Lecture 7: Linear recurrent networks
Lecture 8: Nonlinear global inhibition
Lecture 9: Permitted and forbidden sets
Lecture 10: Lateral excitation and inhibition
Lecture 11: Objectives and optimization
Lecture 12: Excitatory-inhibitory networks
Lecture 13: Associative memory I
Lecture 14: Associative memory II
Lecture 15: Vector quantization and competitive learning
Lecture 16: Principal component analysis
Lecture 17: Models of neural development
Lecture 18: Independent component analysis
Lecture 19: Nonnegative matrix factorization. Delta rule.
Lecture 20: Backpropagation I
Lecture 21: Backpropagation II
Lecture 22: Contrastive Hebbian learning
Lecture 23: Reinforcement Learning I
Lecture 24: Reinforcement Learning II
Lecture 25: Review session

The good thing is that I have formally studied most of the stuff after lecture 13, but going by the quality of the lectures so far (the first 7), I would not mind seeing them again.

Quick Links:

Course Home Page.

Course Video Lectures.

Prof H. Sebastian Seung’s Homepage.

_____

Visualization:

This is a Harvard course. I don’t know when I’ll get the time to have a look at this course, but it sure looks extremely interesting. And I am sure a number of people would be interested in having a look at it. It looks like a course that can be covered pretty quickly, actually.

[Image Source]

The course description says the following:

The amount and complexity of information produced in science, engineering, business, and everyday human activity is increasing at staggering rates. The goal of this course is to expose you to visual representation methods and techniques that increase the understanding of complex data. Good visualizations not only present a visual interpretation of data, but do so by improving comprehension, communication, and decision making.

In this course you will learn how the human visual system processes and perceives images, good design practices for visualization, tools for visualization of data from a variety of fields, collecting data from web sites with Python, and programming of interactive visualization applications using Processing.

The topics covered are:

  • Data and Image Models
  • Visual Perception & Cognitive Principles
  • Color Encoding
  • Design Principles of Effective Visualizations
  • Interaction
  • Graphs & Charts
  • Trees and Networks
  • Maps & Google Earth
  • Higher-dimensional Data
  • Unstructured Text and Document Collections
  • Images and Video
  • Scientific Visualization
  • Medical Visualization
  • Social Visualization
  • Visualization & The Arts

Quick Links:

Course Home Page.

Course Syllabus.

Lectures, Slides and other materials.

Video Lectures

_____

Advanced AI Techniques:

This is one course that I will be looking at some parts of after I have covered the course on Neural Nets. I am yet to glance at the first lecture or the materials, so I cannot say what they are like. But I sure am expecting a lot from them, going by the topics they cover.

The topics covered in a broad sense are:

  • Bayesian Networks
  • Statistical NLP
  • Reinforcement Learning
  • Bayes Filtering
  • Distributed AI and Multi-Agent systems
  • An Introduction to Game Theory

Quick Link:

Course Home.

_____

Astrophysical Chemistry:

I don’t know if I would be able to squeeze in time for these. But because of my amateurish interest in chemistry (If I were not an electrical engineer, I would have been into Chemistry), and because I have very high regard for Dr Harry Kroto (who is delivering them) I would try and make it a point to have a look at them. I think I’ll skip gym for some days to have a look at them. ;-)

[Nobel Laureate Harry Kroto with a Bucky-Ball model - Image Source : richarddawkins.net]

Quick Links:

Dr Harold Kroto’s Homepage.

Astrophysical Chemistry Lectures

_____