Not a technical post. Though certainly an apparition of one.
Here are a few books that I have read/am currently reading/plan to read over this summer with a short blurb (a review would take too long) about each of them. I decided to share as I thought some of them might be of more general interest.
1. The Minimum Description Length Principle – Peter Grünwald (Currently Reading – About 40% completed)
Plurality should not be posited without necessity
So goes the principle of parsimony of William of Occam, more commonly called the Occam’s Razor. Friar Occam’s maxim is certainly an observation and not a law, however its universality is unquestioned. The most striking examples being the quest by mathematicians for the most elegant and simple proofs and by physicists for aesthetics and simplicity in theories of nature given multiple competing possibilities.
Looked at more deeply, Occam’s Razor is perhaps the cornerstone of an intricate relationship between Machine Learning and Information Theory, which are essentially two sides of the same coin. At first this is a little difficult to understand. However a closer look at the maxim makes things clearer. What it states essentially is: Given a set of hypothesis that explain some “data” equally well, pick up the hypothesis that has the smallest description (is the simplest). In other words, pick up the hypothesis that achieves maximum compression of the data.
Inductive inference is essentially the task of learning patterns (think formulating laws about your observations) given some observed data and then using what has been learned to make predictions about the future (or unseen data). The more patterns we are able to find in some observations, the more we are able to compress it. And thus the more we are able to compress the data, the more we have learned about it.
The Minimum Description Length principle is a formalization of the Occam’s Razor and is one of the most beautiful and powerful methods for inductive inference. I have written a little about this in a previous post on Ray Solomonoff (the section on the universal distribution). Solomonoff introduced the idea of Kolmogorov Complexity that essentially is this notion.
Half way into it, I can say that Prof. Grünwald’s book is perhaps the best treatise on the topic. It starts off with a gentle introduction to the idea of MDL and then takes a lot of space to describe preliminaries from Information Theory and Probability. The book becomes a slower read as one moves farther into it which is understandable. It actually is one of the best books that I have come across that takes the basics and builds very powerful theorems from them without appearing like doing black magic. One must commend Dr. Grünwald on this feat.
I have not been completely alien to the idea of a MDL and even the introductory chapters had a lot to give to me. The book covers in depth the idea of crude MDL, Kolmogorov Complexity, refined MDL and using these ideas in inference with a special focus on model selection.
Objective: A couple of months ago, one of my papers got rejected at a top data-mining conference. The reviews however were pretty encouraging, with one of the reviewers stating that the technique so discussed had the potential to be influential amongst the data mining community at large. However, it seems that the paper lost out due a lack of theoretical justification for why and when it worked, and when it wouldn’t. The experimental evidence was substantial but was clearly not enough to convince the reviewers otherwise.
I have been trying to outline the contours for a proof over the past month though that is not officially what I am doing. I conceive that the proof would need a use of the Minimum Description Length principle. This comes as an opportunity as I had always wanted to know more of the MDL principle ever since I got introduced to Machine Learning.
My present work seems to have settled around trying to know more about Incrementral Spectral Clustering  (and more efficient mapping of new test points to existing clusters). It is noteworthy that most spectral clustering methods are offline methods and addition of new points is rather difficult. Some weeks ago I was trying to understand the use of the Nyström method for this problem.
I will write a post on spectral clustering in the future. But perhaps there is some merit in discussing it a little.
The basic idea of clustering is the following:
Suppose you have a set of distributions , and that each distribution has an associated weight such that . Then any dataset could be assumed to have been generated by sampling these distributions, each with a probability equal to their associated weight. The task of clustering is then to identify these distributions. Methods such as Expectation Maximization work with trying to estimate a mixture model. k-means clustering further simplifies the case by assuming that these distributions are simply a mixture of spherical Gaussians.
Trying to model such explicit models of the data is the cause of failure of these methods on many real world datasets that might not have been generated by such distributions (which is most likely the case). Spectral Clustering on the other hand is a metric modification method (essentially a manifold method) that changes the data representation and on this new representation finding natural clusters is much easier.
To do so, the data points are connected as a graph, and we work with the spectra (Eigenvectors) of the graph Laplacian matrix (the laplacian measures the “flow” thus giving a deeper understanding of the data). This is a very beautiful idea (more on a detailed post). Why spectral clustering has become so popular in a short time is not just because it works well, but also because there is a solid theoretical basis for it.
Objective: The above lines sums up the need to read this book. This book is more on Spectral Graph Theory and has a small section on clustering. However it covers many basics that are necessary if one is to aspire to make a contribution to the area. Such as: While working with the Nystrom Method I found myself at a loss to understand the basis for using the Frobenius Norm amongst many others and had to take them on faith. Many other aspects that appeared like black magic earlier just seem like beautiful ideas after reading the book.
The book covers the basics of Graphs to some extent and also of the needed algebra. It then describes the various ideas associated with the spectra of graphs and some applications (such as Google’s PageRank). The second half of the book is beyond my scope at the moment, but it is certainly something that would be of great interest to the more advanced reader.
3. The Symmetries of Things – John Conway (Finished Reading)
I must confess that I was initially disappointed with this book. I had picked this up after weighing it with Prof. David Mumford’s book — Indra’s Pearls: The Vision of Felix Klein . The disappointment perhaps was more given that this book was expensive ($57) and was written by John Conway! Luckily (I suppose) a two day flight delay facilitated a second reading of the book and I changed my mind. And now think this is a brilliant book (not to mention very beautiful with the hardcover and over 1000 illustrations in color).
Given the number of ideas in the book, I will write about this it (and some of the ideas) in a different blog article (which has been in the drafts for over two weeks already!).
4. The Beginning of Infinity- David Deutsch (To Read)
The Beginning of Infinity is Quantum Computing pioneer David Deutsch’s most recent book. Highly recommended to me by many researchers I have a lot of respect for, I am yet to begin reading it. Here is the official product description for the book:
“This is a bold and all-embracing exploration of the nature and progress of knowledge from one of today’s great thinkers. Throughout history, mankind has struggled to understand life’s mysteries, from the mundane to the seemingly miraculous. In this important new book, David Deutsch, an award-winning pioneer in the field of quantum computation, argues that explanations have a fundamental place in the universe. They have unlimited scope and power to cause change, and the quest to improve them is the basic regulating principle not only of science but of all successful human endeavor. This stream of ever improving explanations has infinite reach, according to Deutsch: we are subject only to the laws of physics, and they impose no upper boundary to what we can eventually understand, control, and achieve. In his previous book, “The Fabric of Reality”, Deutsch describes the four deepest strands of existing knowledge – the theories of evolution, quantum physics, knowledge, and computation-arguing jointly they reveal a unified fabric of reality. In this new book, he applies that worldview to a wide range of issues and unsolved problems, from creativity and free will to the origin and future of the human species. Filled with startling new conclusions about human choice, optimism, scientific explanation, and the evolution of culture, “The Beginning of Infinity” is a groundbreaking book that will become a classic of its kind.”
Here are some reviews of the book.
5. The Original of Laura – Vladimir Nabokov (Finished Reading)
“Oh you must! said Winny. It is of course fictionalized and all that, but you’ll come face to face with yourself at every corner. And there’s your wonderful death. Let me show you your wonderful death”
This piece is not for you if you are not already Nabokovian. For this novel is truly unfinished. Being a perfectionist, Nabokov had requested it to be destroyed had he not been able to finish it. And on reading a few cards, one realizes immediately why. It is quite unlike many of his other works, unpolished and raw. The broad contours of the storyline are certainly discernible, however reading it is like wandering inside a mobius labyrinth . I had been waiting with bated breath to get the time to read his final work and would confess I was mildly disappointed. I wondered if his son did the right thing by going against his father’s will by getting it published. The package (hardcover, print, cards of the novel in Nabokov’s writing) is exquisite and is definitely a collectors item. However there is nothing more to it. If you’ve never read anything by Nabokov and want to, then I’d point you to Pale Fire. However, if you ARE Nabokovian, and are willing to spend a little, then this book should certainly find a place in your book shelf.
6. Immortality – Milan Kundera (Finished Reading)
There is a certain part of all of us that lives outside of time. Perhaps we become aware of our age only at exceptional moments and most of the time we are ageless.
I have always thought the the true age of an individual was a mental thing and that the actual age did not matter. I have had the good fortune to meet an Indian mathematician who had the chance to work with Israel Gelfand. It is quite well known that some of the deepest contributions in mathematics and science have been made by young men and women. After a certain age the accumulated prejudices make them too conservative to make a revolutionary contribution. Quoting Freeman Dyson:
… the history of mathematics is a history of horrendously difficult problems being solved by young people too ignorant to know that they were impossible. — Freeman Dyson, “Birds and Frogs”
Israel Gelfand was a prime example of showing us otherwise. He kept making significant contributions till a ripe old age. In that sense Gelfand was ageless. Ofcourse I am not even trying to talk about the name of Gelfand that has already become immortal.
Milan Kundera’s Immortality further refined my thinking of what was meant by Immortal. It starts off with the narrator describing an old woman learning how to swim. A gesture sprung up by the woman after the lesson was more that of a young girl and takes him by surprise. “At the time, that gesture aroused in me immense, inexplicable nostalgia, and this nostalgia gave birth to the woman I call Agnes.” However the author reasons rather paradoxically that the reason might be something else. “there are fewer gestures in the world than there are individuals,” therefore “a gesture is more individual than an individual.” Hence when Agnes dies it does not disturb the author greatly. This reminded me of the last lines from the critically acclaimed Hindi Movie Saaransh about the continuity of life. In any case, from the observation in the start, Kundera weaves out a number of stories and explores a number of themes very beautifully.
This novel is quite unlike The Unbearable Lightness of Being that I had the chance to read a few years ago. That novel explored the life of intellectuals and artists after the Prague spring (and the consequent soviet invasion) of 1968. It had a definite plot and explored a definite notion: The notion of Eternal Recurrence of Nietzsche. There is no definiteness to Immortality however, and it almost redefines what is a novel, almost bordering on being avant-garde towards the end. This is the book I would suggest at the moment to anyone if I were asked. Five stars!
7. The Unfolding of Language: An Evolutionary Tour of Mankind’s Greatest Invention — Guy Deutscher (To Read)
Again a book that I have been meaning to read for a while but get distracted to other books. Click on the image above to learn more about the book!
Here a couple of technical books that I want to attempt reading after the summer.
1. Pattern Theory by David Mumford and Agnès Desolneux.
2. Algebraic Geometry and Statistical Learning Theory by Sumio Watanabe.
1. Occam’s Razor – Blumer, Ehrenfeucht, Haussler, Warmuth, Information Processing Letters 24 (1987) 377-380.
2. A Tutorial on Spectral Clustering – Ulrike von Luxberg, Statistics and Computing, 17 (4) 2007.