
Archive for the ‘Artificial Intelligence’ Category

On June 23 last year, ACM organized a special event to celebrate the birth centenary of Alan Turing, with 33 Turing Award winners in attendance. Needless to say, most of the presentations, lectures and discussions were quite fantastic. They had been placed as webcasts on this website; rather unfortunately, however, they were not individually linkable and were difficult to share. I noticed the day before yesterday that they are finally available on YouTube!

One particular video that I liked a lot was “An Algorithmic View of the Universe”, featuring some great discussion between a panel comprising Robert Tarjan, Richard Karp, Don Knuth, Leslie Valiant and Leonard Adleman, chaired by Christos Papadimitriou. You might want to watch it if you have about 80 minutes to spare and haven’t seen it already!



For the past couple of years, I have had some questions about machine learning research that I wanted to ask experts, but I never got the chance to do so. I did not even know if my questions made sense at all. I will probably write about them in a blog post soon enough.

It is ironic, however, that I came to know that my questions were valid and well discussed (I never knew what to search for; I used expressions not used by researchers) only through the death of Ray Solomonoff (he was one researcher who had worked on them, and an obituary on him highlighted this work, something I had missed). Solomonoff was one of the founding fathers of Artificial Intelligence as a field and of Machine Learning as a discipline within it. It must be noted that he was one of the few attendees at the 1956 Dartmouth Conference, essentially an extended brainstorming session that started AI formally as a field. The other attendees were: Marvin Minsky, John McCarthy, Allen Newell, Herbert Simon, Arthur Samuel, Oliver Selfridge, Claude Shannon, Nathaniel Rochester and Trenchard More. His 1950-52 papers on networks are regarded as the first statistical analysis of such networks. Solomonoff was thus a towering figure in AI and Machine Learning.

[Ray Solomonoff: (25 July 1926 – 7 December 2009)]

Solomonoff is widely considered the father of Machine Learning for circulating the first report on the subject in 1956. His particular focus was on the use of probability and its relation to learning. He founded the idea of Algorithmic Probability (ALP) in a 1960 paper at Caltech, an idea that gives rise to Kolmogorov Complexity as a byproduct. A. N. Kolmogorov independently discovered similar results and came to know of and acknowledged Solomonoff’s earlier work on Algorithmic Information Theory. Solomonoff’s work, however, was less known in the West than in the Soviet Union, which is why Algorithmic Information Theory is mostly referred to as Kolmogorov Complexity rather than “Solomonoff Complexity”. Kolmogorov and Solomonoff approached the same framework from different directions: while Kolmogorov was concerned with randomness and Information Theory, Solomonoff was concerned with inductive reasoning. And in doing so he discovered ALP and Kolmogorov Complexity years before anyone else. Below I will write on only one aspect of his work, one that I have studied to some degree in the past year.

[Solomonoff with G. J. Chaitin, another pioneer in Algorithmic Information Theory]

[Image Source]

The Universal Distribution:

His 1956 paper, “An Inductive Inference Machine”, was one of the seminal papers that used probability in Machine Learning. He outlined two main problems, which he thought (correctly) were linked:

The Problem of Learning in Humans: How do you use all the information that you gather in life to make decisions?

The Problem of Probability: Given that you have some data and some a priori information, how can you make the best possible predictions for the future?

The problem of learning is the more general one, and is related to the problem of probability. Solomonoff noted that machine learning was simply the process of approximating ideal probabilistic predictions for practical use.

Building on his 1956 paper, he developed probabilistic languages for induction at a time when probability was considered out of fashion, and in doing so discovered the Universal Distribution.

All induction problems can basically be reduced to this form: given a sequence of binary symbols, how do you extrapolate it? The answer is that we can assign a probability to a sequence and then use Bayes’ Theorem to decide how likely each particular continuation of the string is. That raises an even harder question, the basic question underlying much of Solomonoff’s work on Algorithmic Probability/Algorithmic Information Theory: how do you assign probabilities to strings?
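To make the Bayesian step concrete, here is the standard form of Solomonoff’s predictor (stated here for completeness; the notation P_M is defined below): once a probability P_M(x) has been assigned to every finite string x, the probability that x continues with a symbol a is

\displaystyle P(a \mid x) = \frac{P_M(xa)}{P_M(x)}

so the whole extrapolation problem reduces to the problem of assigning the probabilities P_M.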

Solomonoff approached this problem using the idea of a Universal Turing Machine. Suppose this Turing Machine has three types of tapes: a unidirectional input tape, a unidirectional output tape and a bidirectional working tape. The machine takes some binary string as input and may give a binary string as output.

It could do any of the following:

1. Print out a string after a while and then come to a stop.

2. Print an infinite output string.

3. Go into an infinite loop computing the string and never output anything at all (the Halting Problem).

For a string x, the ALP is arrived at as follows:

If we feed some bits at random to our Turing Machine, there will always be some probability that the output begins with the string x. This probability is the algorithmic or universal probability of the string x.

The ALP is given as:

\displaystyle P_M(x) = \sum_{i=0}^{\infty}2^{-\lvert S_i(x)\rvert}

where P_M(x) is the universal probability of the string x with respect to the universal Turing machine M. To understand the role of S_i(x) in the above expression, let’s discuss it a little.

There can be many random input strings that, after being processed by the Turing Machine, give an output beginning with the string x, and S_i(x) is the i^{th} such string. Each such string carries a description of x, and since we want to consider all cases, we take the summation. In the expression above, \lvert S_i(x)\rvert is the length of such a string and 2^{-\lvert S_i(x)\rvert} the probability that the random input begins with S_i and hence produces an output starting with x.
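A tiny worked example might help (the numbers are invented purely for illustration): if exactly three input strings, of lengths 3, 5 and 8 bits, cause the machine to print an output beginning with x, then

\displaystyle P_M(x) = 2^{-3} + 2^{-5} + 2^{-8} = 0.125 + 0.03125 + 0.00390625 \approx 0.16

with the shortest description dominating the sum.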

This definition of ALP has the following properties that have been stated and proved by Solomonoff in the 60s and the 70s.

1. It assigns higher probabilities to strings with shorter descriptions. This mirrors, in the reverse direction, something like Huffman Coding, where more probable symbols get shorter codes.

2. The value of ALP is independent of the choice of universal machine, up to a multiplicative constant (the invariance theorem).

3. ALP is incomputable, a consequence of the Halting Problem. In fact it is for this reason that it has not received much attention: why get interested in a model that is incomputable? Solomonoff, however, insisted that approximations to ALP would do much better than existing systems, and that the exact ALP is not even needed.

4. P_M(x) is a complete description for x: any pattern in the data can be found by using P_M. This means the universal distribution is the only inductive principle that is complete, and approximations to it are highly desirable.
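As context for properties 1 and 2 (a standard result, not stated in the post itself), Levin’s Coding Theorem makes the link between ALP and Kolmogorov Complexity precise:

\displaystyle -\log_2 P_M(x) = K(x) + O(1)

That is, the universal probability of a string is governed, up to a constant, by the length of its shortest description.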

__________

Solomonoff also worked on grammar discovery and was very interested in Koza’s Genetic Programming system, which he believed could lead to efficient and much better machine learning methods. He published papers until the ripe old age of 83, and his love for his work is definitely inspiring. Paul Vitanyi notes that:

It is unusual to find a productive major scientist that is not regularly employed at all. But from all the elder people (not only scientists) I know, Ray Solomonoff was the happiest, the most inquisitive, and the most satisfied. He continued publishing papers right up to his death at 83.

Solomonoff’s ideas are still not exploited to their full potential and, in my opinion, will have to be explored to realize the Machine Learning dream of never-ending, incremental and synergistic learners. I will write about this in a later post soon. It was a life of great distinction and a life well lived. I also wish strength and peace to his wife Grace and his nephew Alex.

[The five surviving founders of AI, who met in 2006 to commemorate 50 years of the Dartmouth Conference. From left: Trenchard More, John McCarthy, Marvin Minsky, Oliver Selfridge and Ray Solomonoff]

__________

References and Links:

1. Ray Solomonoff’s publications.

2. Obituary: Ray Solomonoff – The founding father of Algorithmic Information Theory by Paul Vitanyi

3. The Universal Distribution and Machine Learning (PDF).

4. Universal Artificial Intelligence by Marcus Hutter (videolectures.net)

5. Minimum Description Length by Peter Grünwald (videolectures.net)

6. Universal Learning Algorithms and Optimal Search (Neural Information Processing Systems 2002 workshop)

__________



Here are a number of interesting courses, two of which I have been looking at for the past two weeks and hope to finish by the end of August or September.

Introduction to Neural Networks (MIT):

These days, amongst the other things that I have at hand (including a project on content-based image retrieval), I have been making it a point to look at an MIT course on Neural Networks. And needless to say, I am getting to learn loads.


I would like to emphasize that though I have implemented a signature verification system using Neural Nets, I am by no means good with them; I can be classified a beginner. The tool that I am more comfortable with is Support Vector Machines.

I have been wanting to know more about them for some years now, but never really got the time, or you could say the opportunity. Now that I can invest some time, I am glad I came across this course. So far I have been able to watch 7 lectures, and I should say that I am MORE than happy with the course. It is very detailed and extremely well suited for the beginner as well as the expert.

The instructor is H. Sebastian Seung, professor of computational neuroscience at MIT.

The course has 25 lectures, each packed with a great amount of information; the lectures might therefore go slowly for those not very familiar with this material.

The video lectures can be accessed over here. I must admit I am a little disappointed that these lectures are not available on YouTube, because the downloads are rather large in size. But I found them worth it anyway.

The lectures cover the following:

Lecture 1: Classical neurodynamics
Lecture 2: Linear threshold neuron
Lecture 3: Multilayer perceptrons
Lecture 4: Convolutional networks and vision
Lecture 5: Amplification and attenuation
Lecture 6: Lateral inhibition in the retina
Lecture 7: Linear recurrent networks
Lecture 8: Nonlinear global inhibition
Lecture 9: Permitted and forbidden sets
Lecture 10: Lateral excitation and inhibition
Lecture 11: Objectives and optimization
Lecture 12: Excitatory-inhibitory networks
Lecture 13: Associative memory I
Lecture 14: Associative memory II
Lecture 15: Vector quantization and competitive learning
Lecture 16: Principal component analysis
Lecture 17: Models of neural development
Lecture 18: Independent component analysis
Lecture 19: Nonnegative matrix factorization. Delta rule.
Lecture 20: Backpropagation I
Lecture 21: Backpropagation II
Lecture 22: Contrastive Hebbian learning
Lecture 23: Reinforcement Learning I
Lecture 24: Reinforcement Learning II
Lecture 25: Review session

The good thing is that I have formally studied most of the material after lecture 13, but going by the quality of the lectures so far (the first 7), I would not mind seeing it again.

Quick Links:

Course Home Page.

Course Video Lectures.

Prof H. Sebastian Seung’s Homepage.

_____

Visualization:

This is a Harvard course. I don’t know when I’ll get the time to have a look at it, but it sure looks extremely interesting, and I am sure a number of people would be interested in having a look. It also looks like a course that can be covered pretty quickly.


The course description says the following:

The amount and complexity of information produced in science, engineering, business, and everyday human activity is increasing at staggering rates. The goal of this course is to expose you to visual representation methods and techniques that increase the understanding of complex data. Good visualizations not only present a visual interpretation of data, but do so by improving comprehension, communication, and decision making.

In this course you will learn how the human visual system processes and perceives images, good design practices for visualization, tools for visualization of data from a variety of fields, collecting data from web sites with Python, and programming of interactive visualization applications using Processing.

The topics covered are:

  • Data and Image Models
  • Visual Perception & Cognitive Principles
  • Color Encoding
  • Design Principles of Effective Visualizations
  • Interaction
  • Graphs & Charts
  • Trees and Networks
  • Maps & Google Earth
  • Higher-dimensional Data
  • Unstructured Text and Document Collections
  • Images and Video
  • Scientific Visualization
  • Medical Visualization
  • Social Visualization
  • Visualization & The Arts

Quick Links:

Course Home Page.

Course Syllabus.

Lectures, Slides and other materials.

Video Lectures

_____

Advanced AI Techniques:

This is one course that I will be looking at in parts after I have covered the course on Neural Nets. I am yet to glance at the first lecture or the materials, so I cannot say what they will be like, but I sure am expecting a lot, going by the topics covered.

The topics covered in a broad sense are:

  • Bayesian Networks
  • Statistical NLP
  • Reinforcement Learning
  • Bayes Filtering
  • Distributed AI and Multi-Agent systems
  • An Introduction to Game Theory

Quick Link:

Course Home.

_____

Astrophysical Chemistry:

I don’t know if I will be able to squeeze in time for these. But because of my amateurish interest in chemistry (if I were not an electrical engineer, I would have been into Chemistry), and because I have very high regard for Dr Harry Kroto (who is delivering them), I will try to make it a point to have a look. I think I’ll skip the gym for some days to do so. ;-)


[Nobel Laureate Harry Kroto with a Bucky-Ball model – Image Source : richarddawkins.net]

Quick Links:

Dr Harold Kroto’s Homepage.

Astrophysical Chemistry Lectures

_____



I have been following Dr Adrian Dyer’s work for about a year. I discovered him and his work while searching the Monash University website for research on vision, ML and related fields, when a friend of mine was thinking of applying for a PhD there. Dr Dyer is a vision scientist and a top expert on bees. His present research investigates aspects of insect vision and how insects make decisions in complex colored natural environments.

His current findings (published just about a month or so back) could offer important cues for improving computer-based facial recognition systems.


[Thermographic image of a bumblebee on a flower, showing the temperature difference. Image Source]

_____

When I read about Dr Dyer’s most recent work, I was instantly reminded of a rather similar experiment on pigeons done almost a decade and a half back by Watanabe et al [1].

Digression: Van Gogh, Chagall and Pigeons

Allow me to take a brief digression and look at that classic experiment before returning to bees. The experiment by Watanabe et al is my favorite when I have to initiate a general talk on WHAT pattern recognition and generalization are.

In the experiment, pigeons were placed in a Skinner box and trained to recognize paintings by two different artists (e.g. Van Gogh and Chagall). A pigeon was trained to recognize paintings by a particular artist by being rewarded for pecking when presented with paintings by that artist.


_____

The pigeons were trained on a marked set of paintings by Van Gogh and Chagall. When tested on the SAME set, they showed a discriminative ability of 95%. They were then presented with paintings by the same artists that they had not previously seen. Their ability to discriminate was still about 85%, which is a pretty high recognition rate. Interestingly, this performance wasn’t much off human performance.

What this means is that pigeons do not memorize pictures; they extract patterns (the “style” of painting, etc.) and can generalize from already seen data to make predictions on data never seen before. Isn’t this really what Artificial Neural Networks and Support Vector Machines do? And a lot of research is going on to improve the generalization of such systems.

_____

Coming Back to Bees:

Dr Adrian Dyer’s group has been working on visual information processing in miniature brains, brains that are much smaller than a primate brain. Honeybees provide a good model for such work, one reason being that a lot is known about their behavior.

Their work has mostly been on color information processing in bees and how it could lend cues for improving vision in robotic systems. Investigations have shown that the very small bee brain can be taught to recognize extremely difficult/complex objects.

_____

Color Vision in Bees [2]:

Research by Dr Dyer’s group on color vision in bees has provided key insights into the relationship between the perceptive abilities of bees and the flowers they frequent. For example, it was known that flowers had evolved particular colors to suit the visual system of bees, but it was not known why certain flower colors were extremely rare in nature. The color discrimination abilities of the bees were tested using a virtual meadow. There was high variability in the strategies bees used to solve problems, but their discriminative abilities were somewhat comparable to human sensory abilities. This research has shown not only how plants evolve flower colors to attract pollinators, but also what color vision systems in nature are like and what their limitations are.

_____

Recognition of Rotated Objects (Faces) [3]:

The ability to recognize 3-dimensional objects in natural environments is a challenging problem for visual systems, especially under rotation of the object, variation in lighting conditions, occlusion, etc. Rotation may sometimes cause an object to look more dissimilar to its own stored view than a different, non-rotated object does. Human and primate brains are known to recognize novel orientations of rotated objects by interpolating between stored views: primate brains perform better on novel views that can be interpolated from stored views than on novel views that require extrapolation beyond the stored views. Some animals, for example pigeons, perform equally well on interpolated and extrapolated views.

To test how miniature brains such as the bee’s deal with the problem posed by rotation of objects, Dr Dyer’s group presented bees with a face recognition task. The bees were trained on different views (0, 30 and 60 degrees) of two similar faces, S1 and S2; these are enough to give an insight into how the brain solves the problem of a view with which it has no prior experience. The bees were trained with face images from a standard test used for studying vision in human infants.

[Figure: the training face views at 0, 30 and 60 degrees]

[Image Source – 3]

Group 1 was trained on the 0 degree orientation and then given a non-reward test on the same view and on a novel 30 degree view.

Group 2 was trained on the 60 degree orientation and then tested on the same view (non-reward) and a novel 30 degree view.

Group 3 was trained on both the 0 and 60 degree orientations and then tested on the same views and on the novel 30 degree (interpolation) stimulus.

Group 4 was trained on both the 0 and 30 degree orientations and then tested on the same views and also on the novel 60 degree (extrapolation) view.

Bees in all four groups were able to recognize the trained target stimuli well above chance. When presented with novel views, however, Groups 1 and 2 failed, while Group 3 succeeded by interpolating between the 0 and 60 degree views. Group 4 also failed on its novel view, indicating that bees could not extrapolate from the 0 and 30 degree training to recognize faces oriented at 60 degrees.

The experiment suggests that a miniature brain can learn to recognize complex visual stimuli through experience, by a method of image interpolation and averaging. Performance is poor, however, when the stimulus requires extrapolation.

_____

Bees performed poorly on images that were inverted, but in this regard they are in good company: even humans do not perform well when images are inverted. As an illustration, consider these two images:

[Figure: two inverted images of faces]

How similar do these two inverted images look at first glance? Very similar? Let’s now have a look at the upright images. The one on the left below corresponds to the image on the left above.

[Figure: the same two faces, upright]

[Image Source]

The upright images look very different, even though the inverted images looked very similar.

I would recommend rotating the image on the right to its upright position and back; please follow this link to do so. This is a good illustration of the fact that the human visual system does not perform AS well with inverted images.

_____

The findings show that despite the constrained neural resources of the bee brain (about 0.01 percent of a human’s), bees can perform remarkably well at complex visual recognition tasks. Dr Dyer says this could lead to improved models in artificial systems, as it is evidence that face recognition does not require an extremely advanced nervous system.

The conclusion is that the bee brain, which is small (about 850,000 neurons), can be simulated with relative ease (compared to what was previously thought necessary for complex pattern analysis and recognition) to perform rather complex pattern recognition tasks.

On a lighter note, Dr Dyer also says that his findings don’t justify the adage that you should not kill a bee because its nest mates will remember you and take revenge. ;-)

_____

References and Recommended Papers:

[1] Van Gogh, Chagall and Pigeons: Picture Discrimination in Pigeons and Humans, Shigeru Watanabe, Animal Cognition, Springer-Verlag.

[2] The Evolution of Flower Signals to Attract Pollinators – Adrian Dyer.

[3] Insect Brains Use Image Interpolation Mechanisms to Recognise Rotated Objects, Adrian Dyer, Quoc Vuong, PLoS ONE.

[4] Honeybees can recognize complex images of Natural scenes for use as Potential Landmarks – Dyer, Rosa, Reser.

_____



Why?

Two Reasons:

1. Eigenfaces is probably one of the simplest face recognition methods and also rather old, so why worry about it at all? Because, while it is simple, it works quite well. And its simplicity also makes it a good way to understand how face recognition/dimensionality reduction works.

2. I was thinking of writing a post on face recognition in bees next, so this should serve as a basis for that post too. The idea of this post is to give a simple introduction to the topic with an emphasis on building intuition. For more rigorous treatments, look at the references.

_____

Introduction

Like almost everything associated with the human body (the brain, perceptive abilities, cognition, consciousness), face recognition in humans is a wonder. We are not yet even close to an understanding of how we manage to do it. What is known is that the temporal lobe in the brain is partly responsible for this ability. Damage to the temporal lobe can result in a condition in which the concerned person loses the ability to recognize faces. This specific condition, in which an otherwise normal person who suffered damage to a specific region of the temporal lobe loses the ability to recognize faces, is called prosopagnosia. It is a very interesting condition, as the perception of faces remains normal (vision pathways and perception are fine) and the person can recognize people by their voice but not by their face. In one of my previous posts, which had links to a series of lectures by Dr Vilayanur Ramachandran, I linked to one lecture in which he talks in detail about this condition. All this aside, not much is known about how the perceptual information for a face is coded in the brain either.


_____

A Motivating Parallel

Eigenfaces has a parallel to one of the most fundamental ideas in mathematics and signal processing: the Fourier Series. This parallel is very helpful for building intuition about what Eigenfaces (or PCA) does, and hence is worth exploiting. So let us review the Fourier Series in a few sentences.

Fourier series are named in honor of Jean Baptiste Joseph Fourier (generally pronounced “fore-yay”, though the correct French pronunciation is “foor-yay”), who made important contributions to their development. The representation of a signal as a linear combination of complex sinusoids is called the Fourier Series. What this means is that you can not only split a periodic signal into simple sines and cosines, but can also approximately reconstruct that signal, given the information about how those sines and cosines are stacked.

More formally: suppose f(x) is a periodic function with period 2\pi defined in the interval c\leq x \leq c+2\pi that satisfies the set of conditions called the Dirichlet conditions:

1. f(x) is finite, single valued and its integral exists in the interval.

2. f(x) has a finite number of discontinuities in the interval.

3. f(x) has a finite number of extrema in that interval.

then f(x) can be represented by the trigonometric series

f(x) = \displaystyle\frac{a_0}{2} + \displaystyle \sum_{n=1}^\infty [a_n cos(nx) + b_n sin(nx) ]\qquad(1)

The above representation of f(x) is called the Fourier series, and the coefficients a_0, a_n and b_n are called the Fourier coefficients; they are determined from f(x) by Euler’s formulae:

a_0 = \displaystyle \frac{1}{\pi} \int_{c}^{c+2\pi} f(x)dx

a_n = \displaystyle\frac{1}{\pi}\int_{c}^{c+2\pi} f(x)cos(nx)dx

\displaystyle b_n = \frac{1}{\pi}\int_{c}^{c+2\pi} f(x)sin(nx)dx

Note: It is common to define the above using c = -\pi

An example that illustrates (1), i.e. the Fourier series:

[Image Source]

A square wave (shown in black) can be approximated by a series of sines and cosines (the result of this summation is shown in blue). Clearly, in the limiting case, we could reconstruct the square wave exactly using only sines and cosines.
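To make the limiting case concrete, this is the textbook series for a unit-amplitude square wave of period 2\pi (the a_n all vanish by symmetry):

f(x) = \displaystyle\frac{4}{\pi}\left(\sin(x) + \frac{\sin(3x)}{3} + \frac{\sin(5x)}{5} + \ldots\right)

The blue curve in the figure is just this sum truncated after a few terms; keeping more terms sharpens the corners.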

_____

Though not exactly the same, the idea behind Eigenfaces is similar. The aim is to represent a face as a linear combination of a set of basis images (in the Fourier Series the basis was simply sines and cosines). That is:

\displaystyle\Phi_i = \displaystyle\sum_{j=1}^{K}w_j u_j

where \displaystyle \Phi_i represents the i^{th} face with the mean subtracted from it, w_j are weights and u_j the Eigenvectors. If this makes only sketchy sense, don’t worry; this was just stating at the outset what we are going to do.

The big idea is that you want to find a set of images (called Eigenfaces, which are nothing but Eigenvectors of the training data) that, weighted and added together, give you back the image you are interested in (adding images together should give you back an image, right?). The way you weight these basis images (i.e. the weight vector) can be used as a sort of code for that image of interest, and hence as features for recognition.

This can be represented aptly in a figure as:

[Figure: a face reconstructed as the mean face plus a weighted sum of basis faces]

In the above figure, a face that was in the training database is reconstructed by taking a weighted summation of all the basis faces and then adding the mean face. Please note that in the figure the ghost-like basis images (also called Eigenfaces; we’ll see why) are not in order of their importance; they were picked randomly from a pool of 70. These Eigenfaces were prepared using images from the MIT-CBCL database (I have also adjusted the brightness of the Eigenfaces to make them clearer, which is why the brightness of the reconstructed image looks different from that of the basis images).

_____

An Information Theory Approach:

First of all, Eigenfaces treats face recognition as a 2-D problem, based on the assumption that at recognition time faces will be mostly upright and frontal. Detailed 3-D information about the face is then not needed, which reduces complexity significantly.

Before the method for face recognition using Eigenfaces was introduced, most of the face recognition literature dealt with local and intuitive features, such as the distance between the eyes, the ears and similar features. This wasn’t very effective. Eigenfaces, inspired by a method used in an earlier paper, was a significant departure from the idea of using only intuitive features. It uses an information theory approach wherein the most relevant face information is encoded in a group of faces that best distinguishes the faces. It transforms the face images into a set of basis faces, which essentially are the principal components of the face images.

The Principal Components (or Eigenvectors) basically seek the directions in which it is most efficient to represent the data. This is particularly useful for reducing the computational effort. To understand this, suppose we get 60 such directions; about 40 of these might be insignificant and only 20 might represent the variation in the data significantly, so for calculations it would work quite well to use only those 20 and leave out the rest. This is illustrated by the following figure:

[Figure: principal component directions of a data set]

Such an information theory approach encodes not only the local features but also the global features, and such features may or may not be intuitively understandable. When we find the principal components, or Eigenvectors, of the image set, each Eigenvector has some contribution from EACH face used in the training set, so the Eigenvectors also have a face-like appearance. They look ghost-like, and are called ghost images or Eigenfaces. Every image in the training set can be represented as a weighted linear combination of these basis faces.

The number of Eigenfaces we obtain would therefore equal the number of images in the training set; let us take this number to be M. As mentioned a paragraph above, some of these Eigenfaces are more important than others in encoding the variation in face images, so we can also approximate faces using only the K most significant Eigenfaces.

_____

Assumptions:

1. There are M images in the training set.

2. There are K most significant Eigenfaces using which we can satisfactorily approximate a face. Needless to say K < M.

3. All images are N \times N matrices, which can be represented as N^2 \times 1 dimensional vectors. The same logic applies to images that are not square. To take an example: an image of size 112 x 112 can be represented as a vector of dimension 12544, or simply as a point in a 12544-dimensional space.

_____

Algorithm for Finding Eigenfaces:

1. Obtain M training images I_1, I_2, \ldots I_M; it is very important that the images are centered.


2. Represent each image I_i as a vector \Gamma_i, as discussed above:

I_i = \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1N} \\ a_{21} & a_{22} & \ldots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \ldots & a_{NN}\end{bmatrix}_{N\times N} \xrightarrow{\rm concatenation} \begin{bmatrix} a_{11} \\ \vdots \\ a_{1N} \\ a_{21} \\ \vdots \\ a_{2N} \\ \vdots \\ a_{NN} \end{bmatrix}_{N^2\times 1} = \Gamma_i
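As a minimal MATLAB sketch of this step (assuming a grayscale image held in an N x N matrix I; the variable names here are mine, not from the starter code at the end of the post):

% Concatenate the rows of an N x N image into an N^2 x 1 vector.
% MATLAB's reshape reads column-wise, so transpose first to match
% the row-wise concatenation shown above.
Gamma_i = reshape(I', [], 1);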


3. Find the average face vector \Psi:

\Psi = \displaystyle\frac{1}{M}\sum_{i=1}^M\Gamma_i

4. Subtract the mean face from each face vector \Gamma_i to get a set of vectors \Phi_i. The purpose of subtracting the mean image from each image vector is to be left with only the distinguishing features of each face, “removing” in a way the information that is common.

\Phi_i = \Gamma_i - \Psi

5. Find the Covariance matrix C:

C = AA^T, where A=[\Phi_1, \Phi_2 \ldots \Phi_M]

Note that A is simply formed by placing the modified image vectors \Phi_i one per column.

Also note that C is an N^2 \times N^2 matrix while A is an N^2\times M matrix.

6. We now need to calculate the Eigenvectors u_i of C. However, note that C is an N^2 \times N^2 matrix; it would give N^2 Eigenvectors, each of dimension N^2. For an image this number is HUGE, and the required computations would easily make your system run out of memory. How do we get around this problem?

7. Instead of the matrix AA^T, consider the matrix A^TA. Remember that A is an N^2\times M matrix, so A^TA is an M\times M matrix. If we find the Eigenvectors of this matrix, it returns M Eigenvectors, each of dimension M \times 1; let’s call these Eigenvectors v_i.

Now, from a property of matrices, it follows that u_i = Av_i (a short derivation is given below). Since we have found the v_i, we can use them to calculate the M largest Eigenvectors of AA^T. Remember that M\ll N^2, as M is simply the number of training images.
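The matrix property being used deserves a line (a standard argument, spelled out here for completeness): if v_i is an Eigenvector of A^TA with Eigenvalue \lambda_i, i.e. A^TAv_i = \lambda_i v_i, then pre-multiplying both sides by A gives

AA^T(Av_i) = \lambda_i (Av_i)

so u_i = Av_i is an Eigenvector of C = AA^T with the same Eigenvalue.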

8. Find the best M Eigenvectors of C=AA^T by using the relation discussed above. That is: u_i = Av_i. Also keep in mind that \begin{Vmatrix}u_i\end{Vmatrix}=1.


[6 Eigenfaces for the training set chosen from the MIT-CBCL database, these are not in any order]

9. Select the best K Eigenvectors; the selection is done heuristically, as sketched below.
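One common heuristic (a sketch of mine, not from the original post) is to keep the smallest K whose Eigenvalues capture a fixed fraction, say 95%, of the total variance. Here sorted_eig_vals refers to the eigenvalues sorted in descending order, as computed in the starter code at the end of this post:

% Fraction of the total variance captured by the first k eigenfaces
energy = cumsum(sorted_eig_vals) / sum(sorted_eig_vals);
% Smallest K capturing at least 95% of the variance
K = find(energy >= 0.95, 1);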

_____

Finding Weights:

The Eigenvectors u_i found at the end of the previous section, when converted back to matrices by a process that is the reverse of STEP 2, have a face-like appearance. Since they are Eigenvectors and look like faces, they are called Eigenfaces. Sometimes they are also called ghost images because of their weird appearance.

Now each face in the training set (minus the mean), \Phi_i can be represented as a linear combination of these Eigenvectors u_i:

\Phi_i = \displaystyle\sum_{j=1}^{K}w_ju_j, where the u_j‘s are Eigenfaces.

These weights can be calculated as :

w_j = u_j^T\Phi_i.

Each normalized training image is represented in this basis as a vector.

\Omega_i = \begin{bmatrix}w_1 \\ w_2 \\ \vdots \\ w_K\end{bmatrix}

where i = 1, 2, \ldots M. This means we have to calculate such a vector corresponding to every image in the training set and store them as templates.

_____

Recognition Task:

Now suppose we have found the Eigenfaces for the training images, selected the set of most relevant Eigenfaces, computed the associated weights, and stored the weight vector corresponding to each training image.

If an unknown probe face \Gamma is to be recognized then:

1. We normalize the incoming probe \Gamma as \Phi = \Gamma - \Psi.

2. We then project this normalized probe onto the Eigenspace (the collection of Eigenvectors/faces) and find out the weights.

w_i = u_i^T\Phi.

3. The normalized probe \Phi can then simply be represented as:

\Omega = \begin{bmatrix}w_1 \\ w_2 \\ \vdots \\ w_K\end{bmatrix}


After the feature vector (weight vector) for the probe has been found, we simply need to classify it. For the classification task we could use a simple distance measure or a classifier such as Support Vector Machines (something I will cover in an upcoming post). In case we use distance measures, classification is done as follows:

Find e_r = \min_i \begin{Vmatrix}\Omega - \Omega_i\end{Vmatrix}. That is, we take the weight vector of the probe we have just found and compute its distance to the weight vector associated with each of the training images.

If e_r < \Theta, where \Theta is a threshold chosen heuristically, then we can say that the probe is recognized as the training image with which it gives the lowest score.

If however e_r > \Theta, then the probe does not belong to the database. I will return below to how the threshold should be chosen.

For distance measures, the most commonly used is the Euclidean distance; another is the Mahalanobis distance, which generally gives superior performance. Let’s take a brief digression to look at these two simple distance measures, then return to the task of choosing a threshold.

_____

Distance Measures:

Euclidean Distance: The Euclidean Distance is probably the most widely used distance metric. It is a special case of a general class of norms and is given as:

\displaystyle\begin{Vmatrix}x-y\end{Vmatrix}_e = \displaystyle\sqrt{\sum_i\begin{vmatrix}x_i-y_i\end{vmatrix}^2}

The Mahalanobis Distance: The Mahalanobis distance is a better distance measure when it comes to pattern recognition problems. It takes into account the covariance between the variables, and hence removes the problems of scale and correlation inherent in the Euclidean distance. It is given as:

d(x,y) =\sqrt{ (x-y)^TC^{-1}(x-y)}

where C is the covariance matrix of the variables involved.
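Both measures are one-liners in MATLAB (a sketch: x and y stand for two K x 1 weight vectors and Cw for the covariance matrix of the training weight vectors, estimated beforehand; all three names are mine):

% Euclidean distance between feature vectors x and y
d_euc = norm(x - y);
% Mahalanobis distance; Cw \ (x - y) solves Cw*z = x - y without
% forming an explicit inverse
d_mah = sqrt((x - y)' * (Cw \ (x - y)));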

_____

Deciding on the Threshold:

Why is the threshold \Theta important?

Suppose for simplicity that we have ONLY 5 images in the training set, and a probe that is not in the training set comes up for recognition. The probe will be scored against each of the 5 images, and even though the probe is not in the database, the system will still say it is recognized as the training image with which its score is lowest. Clearly this is an anomaly we need to handle, and it is for this purpose that we decide on a threshold. The threshold \Theta is decided heuristically.

[Figure: a non-face probe scored against each training image]

To illustrate what I just said, consider a Simpsons image as a non-face probe; this image will be scored against each of the training images. Say S_4 is the lowest score of all, yet the probe clearly does not belong to the database. To choose the threshold, we take a large set of random images (both face and non-face), calculate scores for images of people in the database and for this random set, and set the threshold \Theta (mentioned in the “recognition” part above) accordingly.

_____

More on the Face Space:

To conclude this post, here is a brief discussion on the face space.

[Figure: a simplified representation of the face space. Image Source – [1]]

Consider the simplified representation of the face space shown in the figure above. The images of faces, and in particular the faces in the training set, should lie near the face space, which in general describes images that are face-like.

The projection distance e_r should be under a threshold \Theta, as already seen. The images of known individuals fall near some face class in the face space.

There are four possible combinations of where an input image can lie:

1. Near a face class and near the face space: this is the case when the probe is nothing but the facial image of a known individual (known = an image of this person is already in the database).

2. Near the face space but away from any face class: this is the case when the probe image is of a face, but does not belong to anybody in the database, i.e. it is away from any face class.

3. Distant from the face space and near a face class: this happens when the probe image is not of a face, but still resembles a particular face class stored in the database.

4. Distant from both the face space and any face class: when the probe is not a face image, i.e. it is away from the face space, and is nothing like any stored face class.

Out of the four, type 3 is responsible for most false positives. To avoid them, face detection is recommended to be a part of such a system.

_____

References and Important Papers

1. Face Recognition Using Eigenfaces, Matthew A. Turk and Alex P. Pentland, MIT Vision and Modeling Lab, CVPR ’91.

2. Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, Belhumeur, Hespanha, Kriegman, PAMI ’97.

3. Eigenfaces for Recognition, Matthew A. Turk and Alex P. Pentland, Journal of Cognitive Neuroscience ’91.

_____

Related Posts:

1. Face Recognition/Authentication using Support Vector Machines

2. Why are Support Vector Machines called so

_____

MATLAB Codes:

I suppose the MATLAB code for the above is available at at least 2-3 locations around the internet.

However, it might be useful to put out some starter code. Note that the variables have the same names as those in the description above. You may use this mat file for testing. These are the labels for that mat file.

% This is the starter code for only the part that finds the eigenfaces.
load('features_faces.mat'); % load the file that has the images.
% In this case load the .mat file attached above.
% Each image is stored as a row; use the reshape command to view one as an image.

% The code is only skeletal.
M = 548;             % total number of training images
X = features_faces'; % one image per column, giving an N^2 x M matrix

% Find Psi - the mean image (averaged over the images, for every pixel)
Psi_train = mean(X, 2);

% Find Phi - the mean-subtracted representation of the training images
Phi = zeros(size(X));
for i = 1:M
    Phi(:,i) = X(:,i) - Psi_train;
end

% Create a matrix A from all the modified image vectors
A = Phi;

% Find the surrogate of the covariance matrix using the trick above (M x M)
C = A'*A;
[eig_mat, eig_vals] = eig(C);

% Sort the eigenvalues in descending order and reorder the eigenvectors
eig_vals_vect = diag(eig_vals);
[sorted_eig_vals, eig_indices] = sort(eig_vals_vect, 'descend');
sorted_eig_mat = eig_mat(:, eig_indices);

% Find the Eigenfaces: u_i = A*v_i, then normalize so that ||u_i|| = 1
Eig_faces = A*sorted_eig_mat;
Eig_faces = Eig_faces ./ repmat(sqrt(sum(Eig_faces.^2)), size(Eig_faces, 1), 1);
% Display an eigenface using the reshape command.

% Find the weights for all eigenfaces
% Each column contains the weight vector for the corresponding image
W_train = Eig_faces'*Phi;
face_fts = W_train(1:250,:)'; % features using the 250 most significant eigenfaces
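The starter code stops at feature extraction. A minimal sketch of the recognition step described earlier could look like the following (my addition, not part of the original code: probe is an N^2 x 1 test image vector and Theta a heuristically chosen threshold; the remaining variables come from the starter code above):

% Project the probe onto the eigenspace and find its weights
Phi_probe = probe - Psi_train;
Omega = Eig_faces(:,1:250)'*Phi_probe; % weights w.r.t. the 250 eigenfaces used

% Euclidean distance to every stored training template
dists = sqrt(sum((face_fts' - repmat(Omega, 1, 548)).^2));
[e_r, best_match] = min(dists);

if e_r < Theta
    fprintf('Recognized as training image %d\n', best_match);
else
    fprintf('Probe does not belong to the database\n');
end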

_____



In a number of seminars at universities and in industry interactions, one of the hot topics these days is efficient wireless power transfer, and the pressing need for and desirability of it. It is even more interesting given that wireless power is nothing new at all: one of the earliest patents in this area was granted in 1900 to the legendary Nikola Tesla (patent number 649621), and it has been discussed ever since. Probably now is the time to really realize Tesla’s vision, with the number of devices in daily use growing rapidly.

[Left: Nikola Tesla | Right: Alanson Sample, an Intel engineer demonstrating WREL (Image Source)]

Intel has been working on what they call the Wireless Resonant Energy Link (WREL), based on the work of some MIT physicists. In the image above, an Intel engineer is demonstrating wirelessly powering a 60 W bulb, which requires more power than charging a laptop. The implications of this technology could be immense. The adverse effects of such technology on humans remain to be seen, but are not viewed as a major impediment to its development.

Another area being discussed extensively these days is claytronics, catoms, or simply programmable matter. Let’s take a brief digression into this before coming back to the original topic.

Claytronics: Claytronics seems to be one of the most futuristic and promising application areas at the intersection of Robotics, Swarm Intelligence and Computer Science, among others. It concerns reconfigurable nanoscale robots (called Claytronic Atoms, or Catoms) which can operate as a swarm and can be designed to form much more complex elements and perform complex tasks. These sub-millimeter computers would eventually have the ability to move around, communicate with other computers, and even electrostatically attach to each other, allowing the swarm to take different shapes.

Catoms, also referred to as programmable matter, could reconfigure to form almost any shape and take on any color or texture. Some interesting speculations include that catoms could be morphed to form replicas of humans (for virtual meetings) as well. For a brief initiating idea, have a look at the video below:

Work on this has been done by Prof Seth Goldstein and his group at Carnegie Mellon, and is still ongoing under the name of the Claytronics Project. The work has been expanded upon by Intel researchers.

Jason Campbell, a senior researcher at Intel, has the following to say on just SOME of the possibilities programmable matter could offer in the future:

Think of a mobile device, My cell phone is too big to fit comfortably in my pocket and too small for my fingers. It’s worse if I try to watch movies or do my e-mail. But if I had 200 to 300 milliliters of catoms, I could have it take on the shape of the device that I need at that moment. For example, the catoms could be manipulated to create a larger keypad for text messaging. And when the device wasn’t being used, I could command it to form its smallest shape or even be a little squishy, so I can just drop it in my pocket.

Battery Powered Robots, An impediment to Research in SI based Robotic Systems:

There has been a lot of research going on in swarm robotics. To take just two examples, consider the work of James McLurkin at CSAIL, MIT and the work at the Ecole Polytechnique Federale de Lausanne (EPFL) in Lausanne, Switzerland. A lot of James’ work can be seen here, with a number of videos and papers available for download.

In the video below, a swarm of 278 miniature e-puck robots moves around, all of them battery powered. Battery powered robots are not only a headache but a severe research impediment as the size of the swarm increases.

It thus would be very desirable that the swarm be wirelessly powered.

So, in short, a lot of work is being done in the above two fields, but what is also needed is their intersection, and this is exactly what Travis Deyle of Georgia Tech and Dr Matt Reynolds of Duke have done. Their work, “Surface based wireless power transmission and bidirectional communication for autonomous robot swarms”, presented at IEEE ICRA this year, details the construction of a 60cm x 60cm surface that provides wireless power and bidirectional communication to an initial swarm of 5 line-following robots, each with a power consumption of about 200 mW.

[Image Courtesy : Travis Deyle]

An actual robot looks like the following in close up.

Wirelessly Powered Robot Swarm from Travis on Vimeo.

For more extensive details about the setup and circuitry, have a look at their paper and the presentation slides.

Related Posts:

Morphogenesis and Swarm Robotics



Just a fortnight or so back, I discovered that Dr Radford Neal, one of the top researchers in Statistics and Machine Learning, was blogging. And this morning I discovered that Dr Vitorino Ramos has been blogging for over a week now too!

This comes as a surprise, but a very pleasant one. I am very glad to have found his page; it promises to be a very different web-log and could indeed grow into one of the top blogs on Swarming, Self-Organization, Complexity and Distributed Systems, as it is written by a leading expert in the field. It will be great to catch up on his work. In the past I have tried to write about some of his interesting work on my own page; those posts can be found here.

[Vitorino Ramos: Image Source]

Dr Ramos’ research areas are chiefly Artificial Life, Artificial Intelligence, Bio-Inspired Computing, Collective Intelligence and Complex Systems. He obtained his PhD in 2004 and has published about 70 papers in the above fields and their broad application areas. Put simply, the IQ of the “blogosphere” has gone up a little with this addition.

For starters, I would recommend his article on financial markets (given the situation today), which talks about herd mentality, the resulting amplification of dumb investors, and what it could lead to. Most investors do not understand much of the market mechanism, a bare fact put most aptly in a cartoon I found on his blog, and his post goes much beyond that.

[Image Source]


And going by the website and blog name, it seems that Dr Ramos is now interested, in some sense, in Tibor Ganti’s Chemoton Theory.

Quick Links:

1. Vitorino Ramos’ Homepage.

2. Dr Ramos’ Publications. (PDFs available online)


