Feeds:
Posts

## A Note on the Graph Laplacian

A short note on how the Graph Laplacian is a natural analogue of the Laplacian in vector analysis.

For a twice differentiable function $f$ on the euclidean space, the Laplacian of $f$, $\Delta f$ is the div(grad($f$)) or the divergence of the gradient of $f$. Let’s consider the laplacian in two dimensions:

$\displaystyle \Delta f(x,y) = \nabla \cdot \nabla f(x,y) = \frac{d^2 f(x,y)}{dx^2} + \frac{d^2 f(x,y)}{dy^2}$

To get an intuition about what it does, it is useful to consider a finite difference approximation to the above:

$\displaystyle \Delta f(x,y) \approx \frac{f(x+h,y) + f(x-h,y) + f(x,y+h) + f(x,y-h) - 4f(x,y)}{h^2}$

The above makes it clear that the Laplacian behaves like a local averaging operator. Infact the above finite difference approximation is used in image processing (say, when h is set to one pixel) to detect sharp changes such as edges, as the above is close to zero in smooth image regions.

____________________________

The finite difference approximation intuitively suggests that a discrete laplace operator should be natural. At the same time, the Graph Theory literature defines the Graph Laplacian as the matrix $L = D - A$ (where $D$ is the degree matrix and $A$ the adjacency matrix of the graph $G$, both are defined below). It is not obvious that this matrix $L$ follows from the div(grad(f)) definition mentioned above. The purpose of this note is to explicate in short on how this is natural, once the notion of gradient and divergence operators have been defined over a graph.

First of all the two most natural matrices that can be associated with a graph are defined below:

____________________________

There are two natural matrices that are used to specify a simple graph $G = (V,E)$ with vertex set $V =\{1, \dots, n \}$ and edge set $E = \{1, \dots, m \}$: The Adjacency matrix $A$ and the Incidence matrix $\nabla$.

The Adjacency matrix is defined as a $n \times n$ matrix such that:

$\displaystyle A_{u,v} = \begin{cases} 1 & \mbox{if } uv \in E \\ 0 & \mbox{otherwise} \end{cases}$

To define the $m \times n$ incidence matrix $\nabla$, we need to fix an orientation of the edges (for our purpose an arbitrary orientation suffices), then:

$\displaystyle \nabla_{e,v} = \begin{cases} -1 & \text{if } v \text{ is initial vertex of edge } e \\ +1 & \text{if } v \text{ is terminal vertex of edge } e \\ 0 & \text{if } e \text{ is not incident on vertex } v \end{cases}$

Now suppose we have a real valued function $f$ over $V$, $f: V \to \mathbb{R}$. For $G$, we view this as a column vector indexed by the vertices.

Similarly let $g$ be a real valued function over the edges $E$, $g: E \to \mathbb{R}$ (again, we view this as a row vector indexed by the edges).

Now, consider the following map: $f \mapsto \nabla f$ (which is now a vector indexed by $E$). The value of this map at any edge $(\nabla f) (e)$ is simply the difference of the values of $f$ at the two end points of the edge $e$ i.e. $f_u - f_v$. This makes it somewhat intuitive that the matrix $\nabla$ is some sort of a difference operator on $G$. Given this “difference operator” or “discrete differential” view, $(\nabla f)$ can then be viewed as the gradient that measures the change of the function $f$ along the edges of $G$. Thus the map $f \mapsto \nabla f$ is just the gradient operator.

Like the above, now we consider the following map: $g \mapsto \nabla^{T} g^T$ (which is now a vector indexed by $V$). Recall that $g$ was defined on the edges. The value of this map at any vertex $(g \nabla)(v)$ is $\displaystyle \sum_{e \text{ exits } v} g_e - \sum_{e \text{ enters } v} g_e$. If we were to view $g$ as describing some kind of a flow on the edges, then $(g \nabla)(v)$ is the net outbound flow on the vertex $v$, which is just the divergence. Thus the map $f \mapsto g \nabla$ is just the divergence operator.

Now consider $f \mapsto \nabla^{T} \nabla f$ (note that this makes sense since $\nabla f$ is a column vector), going by the above, this should be the divergence of the gradient. Thus, the analogy with the “real” Laplacian makes sense and the matrix $L_G = \nabla^{T} \nabla$ is appropriately called the Graph Laplacian.

Using the definition of $\nabla$, a simple computation yields the following for $L_G$:

$\displaystyle L_{G{u,v}} = \begin{cases} -1 & \mbox{if } uv \in E \\ deg(u) & \mbox{if } u =v \end{cases}$

Looking at the above and recalling the definition for the adjacency matrix $A$, it is easy to see that $L_G = D - A$, where $D$ is the diagonal matrix with $D_{u,u} = deg(u)$. This is just the familiar definition that I mentioned at the start.

PS: From the above it is also clear that the Laplacian is positive semidefinite. Indeed, consider $f ^TL f$, which is written as $f^T \nabla^T \nabla f =\|\nabla f\|^2 = \sum_{(u,v) \in E} (f_u - f_v)^2$

## Implementation and Abstraction in Mathematics

I recently noticed on arXiv that the following manuscript “Implementation and Abstraction in Mathematics” by David McAllester. A couple of years ago, I had taken a graduate course taught by David that had a similar flavour (the material in the manuscript is more advanced, in particular the main results, not to mention it is better organized and the presentation more polished), presenting a type theoretic foundation of mathematics. Although I can’t say I did very well in the course, I certainly enjoyed the ideas in it very much, and thus the above manuscript might be worth a look. Perhaps it might be a good idea to audit that course again, just to make sure I understand the main ideas better this time. :)

## Some Proofs of the Cauchy-Schwarz Inequality

Over the past 4-5 months whenever there is some time to spare, I have been working through The Cauchy-Schwarz Master Class by J. Michael Steele. And, although I am still left with the last two chapters, I have been reflecting on the material already covered in order to get a better perspective on what I have been slowly learning over the months. This blog post is a small exercise in this direction.

Ofcourse, there is nothing mysterious about proving the Cauchy-Schwarz inequality; it is fairly obvious and basic. But I thought it still might be instructive (mostly to myself) to reproduce some proofs that I know out of memory (following a maxim of my supervisor on a white/blackboard blogpost). Although, why Cauchy-Schwarz keeps appearing all the time and what makes it so useful and fundamental is indeed quite interesting and non-obvious. And like Gil Kalai notes, it is also unclear why is it that it is Cauchy-Schwarz which is mainly useful. I feel that Steele’s book has made me appreciate this importance somewhat more (compared to 4-5 months ago) by drawing to many concepts that link back to Cauchy-Schwarz.

Before getting to the post, a word on the book: This book is perhaps amongst the best mathematics book that I have seen in many years. True to its name, it is indeed a Master Class and also truly addictive. I could not put aside the book completely once I had picked it up and eventually decided to go all the way. Like most great books, the way it is organized makes it “very natural” to rediscover many susbtantial results (some of them named) appearing much later by yourself, provided you happen to just ask the right questions. The emphasis on problem solving makes sure you make very good friends with some of the most interesting inequalities. The number of inequalities featured is also extensive. It starts off with the inequalities dealing with “natural” notions such as monotonicity and positivity and later moves onto somewhat less natural notions such as convexity. I can’t recommend this book enough!

Now getting to the proofs: Some of these proofs appear in Steele’s book, mostly as either challenge problems or as exercises. All of them were solvable after some hints.

________________

Proof 1: A Self-Generalizing proof

This proof is essentially due to Titu Andreescu and Bogdan Enescu and has now grown to be my favourite Cauchy-Schwarz proof.

We start with the trivial identity (for $a, b \in \mathbb{R}, x >0, y>0$):

Identity 1: $(ay - bx)^2 \geq 0$

Expanding we have

$a^2y^2 + b^2x^2 - 2abxy \geq 0$

Rearranging this we get:

$\displaystyle \frac{a^2y}{x} + \frac{b^2x}{y} \geq 2ab$

Further: $\displaystyle a^2 + b^2 + \frac{a^2y}{x} + \frac{b^2x}{y} \geq (a+b)^2$;

Rearranging this we get the following trivial Lemma:

Lemma 1: $\displaystyle \frac{(a+b)^2}{(x+y)} \leq \frac{a^2}{x} + \frac{b^2}{y}$

Notice that this Lemma is self generalizing in the following sense. Suppose we replace $b$ with $b + c$ and $y$ with $y + z$, then we have:

$\displaystyle \frac{(a+b+c)^2}{(x+y+z)} \leq \frac{a^2}{x} + \frac{(b+c)^2}{y+z}$

But we can apply Lemma 1 to the second term of the right hand side one more time. So we would get the following inequality:

$\displaystyle \frac{(a+b+c)^2}{(x+y+z)} \leq \frac{a^2}{x} + \frac{b^2}{y} + \frac{c^2}{z}$

Using the same principle $n$ times we get the following:

$\displaystyle \frac{(a_1+a_2+ \dots a_n)^2}{(x_1+x_2+ \dots + x_n)} \leq \frac{a_1^2}{x_1} + \frac{a_2^2}{x_2} + \dots + \frac{a_n^2}{x_n}$

Now substitute $a_i = \alpha_i \beta_i$ and $x_i = \beta_i^2$ to get:

$\displaystyle ( \sum_{i=1}^n \alpha_i \beta_i )^2 \leq \sum_{i=1}^n ( \alpha_i )^2 \sum_{i=1}^n ( \beta_i )^2$

This is just the Cauchy-Schwarz Inequality, thus completing the proof.

________________

Proof 2: By Induction

Again, the Cauchy-Schwarz Inequality is the following: for $a, b \in \mathbb{R}$

$\displaystyle ( \sum_{i=1}^n a_i b_i )^2 \leq \sum_{i=1}^n ( a_i )^2 \sum_{i=1}^n ( b_i )^2$

For proof of the inequality by induction, the most important thing is starting with the right base case. Clearly $n = 1$ is trivially true, suggesting that it is perhaps not of much import. So we consider the case for $n = 2$. Which is:

$\displaystyle ( a_1 b_1 + a_2 b_2 )^2 \leq ( a_1^2 + a_2^2 ) (b_1^2 + b_2^2)$

To prove the base case, we simply expand the expressions. To get:

$\displaystyle a_1^2 b_1^2 + a_2^2 b_2^2 + 2 a_1 b_1 a_2 b_2 \leq a_1^2 b_1^2 + a_1^2 b_2^2 + a_2^2 b_1^2 + a_2^2 b_2^2$

Which is just:

$\displaystyle a_1^2 b_2^2 + a_2^2 b_1^2 - 2 a_1 b_1 a_2 b_2 \geq 0$

Or:

$\displaystyle (a_1 b_2 - a_2 b_1 )^2 \geq 0$

Which proves the base case.

Moving ahead, we assume the following inequality to be true:

$\displaystyle ( \sum_{i=1}^k a_i b_i )^2 \leq \sum_{i=1}^k ( a_i )^2 \sum_{i=1}^k ( b_i )^2$

To establish Cauchy-Schwarz, we have to demonstrate, assuming the above, that

$\displaystyle ( \sum_{i=1}^{k+1} a_i b_i )^2 \leq \sum_{i=1}^{k+1} ( a_i )^2 \sum_{i=1}^{k+1} ( b_i )^2$

So, we start from $H(k)$:

$\displaystyle ( \sum_{i=1}^k a_i b_i )^2 \leq \sum_{i=1}^k ( a_i )^2 \sum_{i=1}^k ( b_i )^2$

we further have,

$\displaystyle \Big(\sum_{i=1}^k a_i b_i\Big) + a_{k+1}b_{k+1} \leq \Big(\sum_{i=1}^k ( a_i )^2\Big)^{1/2} \Big(\sum_{i=1}^k ( b_i )^2\Big)^{1/2} + a_{k+1}b_{k+1} \ \ \ \ (1)$

Now, we can apply the case for $n = 2$. Recall that: $\displaystyle a_1 b_1 + a_2 b_2 \leq (a_1^2 + a_2^2)^{1/2} (b_1^2 + b_2^2)^{1/2}$

Thus, using this in the R. H. S of $(1)$, we would now have:

$\displaystyle \Big(\sum_{i=1}^{k+1} a_i b_i\Big) \leq \Big(\sum_{i=1}^k ( a_i )^2 + a_{k+1}^2\Big)^{1/2} \Big(\sum_{i=1}^k ( b_i )^2 + b_{k+1}^2\Big)^{1/2}$

Or,

$\displaystyle \Big(\sum_{i=1}^{k+1} a_i b_i\Big) \leq \Big(\sum_{i=1}^{k+1} ( a_i )^2 \Big)^{1/2} \Big(\sum_{i=1}^{k+1} ( b_i )^2 \Big)^{1/2}$

This proves the case $H(k+1)$ on assuming $H(k)$. Thus also proving the Cauchy-Schwarz inequality by the principle of mathematical induction.

________________

Proof 3: For Infinite Sequences using the Normalization Trick:

Problem: For $a, b \in \mathbb{R}$ If $\displaystyle \Big(\sum_{i=1}^{\infty} a_i^2 \Big) < \infty$ and $\displaystyle \Big(\sum_{i=1}^{\infty} b_i^2 \Big) < \infty$ then is $\displaystyle \Big(\sum_{i=1}^{\infty} |a_i||b_i|\Big) < \infty$ ?

Note that this is easy to establish. We simply start with the trivial identity $(x-y)^2 \geq 0$ which in turn gives us $\displaystyle xy \leq \frac{x^2}{2} + \frac{y^2}{2}$

Next, take $x = |a_i|$ and $y = |b_i|$ on summing up to infinity on both sides, we get the following:

$\displaystyle \Big( \sum_{i=1}^{\infty} |a_i||b_i| \Big)^2 \leq \frac{1}{2}\Big(\sum_{i=1}^{\infty} a_i^2\Big) + \frac{1}{2} \Big(\sum_{i=1}^{\infty} b_i^2\Big)\ \ \ \ \ \ \ \ (2)$

From this it immediately follows that

$\displaystyle \Big(\sum_{i=1}^{\infty} |a_i||b_i|\Big) < \infty$

Now let

$\displaystyle \hat{a}_i = \frac{a_i}{\Big(\sum_j a_i^2\Big)^{1/2}}$ and

$\displaystyle \hat{b}_i = \frac{b_i}{\Big(\sum_j b_i^2\Big)^{1/2}}$; substituting in $(2)$, we get:

$\displaystyle \Big( \sum_{i=1}^{\infty} |\hat{a}_i||\hat{b}_i| \Big)^2 \leq \frac{1}{2}\Big(\sum_{i=1}^{\infty} \hat{a}_i^2\Big) + \frac{1}{2} \Big(\sum_{i=1}^{\infty} \hat{b}_i^2\Big)$ or,

$\displaystyle \Bigg(\sum_{i = 1}^{\infty}\frac{a_i}{\Big(\sum_j a_j^2\Big)^{1/2}}\frac{b_i}{\Big(\sum_j b_j^2\Big)^{1/2}}\Bigg)^2 \leq \frac{1}{2} + \frac{1}{2}$

Which simply gives back Cauchy’s inequality for infinite sequences thus completing the proof:

$\displaystyle \Big(\sum_{i=1}^{\infty} a_i b_i\Big)^2 \leq \Big(\sum_{i=1}^{\infty}a_i^2\Big) \Big(\sum_{i=1}^{\infty}b_i^2\Big)$

________________

Proof 4: Using Lagrange’s Identity

We first start with a polynomial which we denote by $\mathbf{Q}_n$:

$\mathbf{Q}_n = \big(a_1^2 + a_2^2 + \dots + a_n^2 \big) \big(b_1^2 + b_2^2 + \dots + b_n^2 \big) - \big(a_1b_1 + a_2b_2 + \dots + a_nb_n\big)^2$

The question to now ask, is $\mathbf{Q}_n \geq 0$? To answer this question, we start of by re-writing $\mathbf{Q}_n$ in a “better” form.

$\displaystyle \mathbf{Q}_n = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i^2 b_j^2 - \sum_{i=1}^{n}\sum_{j=1}^{n} a_ib_i a_jb_j$

Next, as J. Michael Steele puts, we pursue symmetry and rewrite the above so as to make it apparent.

$\displaystyle \mathbf{Q}_n = \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \big(a_i^2 b_j^2 + a_j^2 b_i^2\big) - \frac{2}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_ib_i a_jb_j$

Thus, we now have:

$\displaystyle \mathbf{Q}_n = \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \big(a_i b_j - a_j b_i\big)^2$

This makes it clear that $\mathbf{Q}_n$ can be written as a sum of squares and hence is always postive. Let us write out the above completely:

$\displaystyle \sum_{i=1}^{n}\sum_{j=1}^{n} a_i^2 b_j^2 - \sum_{i=1}^{n}\sum_{j=1}^{n} a_ib_ib_jb_j = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \big(a_ib_j -a_jb_i\big)^2$

Now, reversing the step we took at the onset to write the L.H.S better, we simply have:

$\displaystyle \sum_{i}^{n} a_i^2 \sum_{}^{n} b_i^2 - \big(\sum_{i=1}^n a_ib_i\big)^2 = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \big(a_ib_j -a_jb_i\big)^2$

This is called Lagrange’s Identity. Now since the R.H.S. is always greater than or equal to zero. We get the following inequality as a corrollary:

$\displaystyle \big(\sum_{i=1}^n a_ib_i\big)^2 \leq \sum_{i}^{n} a_i^2 \sum_{}^{n} b_i^2$

This is just the Cauchy-Schwarz inequality, completing the proof.

________________

Proof 5: Gram-Schmidt Process gives an automatic proof of Cauchy-Schwarz

First we quickly review the Gram-Schmidt Process: Given a set of linearly independent elements of a real or complex inner product space $\big(V,\langle\cdot, \cdot\rangle\big)$, $\mathbf{x_1}, \mathbf{x_2}, \dots, \mathbf{x_n}$. We can get an orthonormal set of $n$ elemets $\mathbf{e_1}, \mathbf{e_2}, \dots, \mathbf{e_n}$ by the simple recursion (after setting $\displaystyle \mathbf{e_1 = \frac{\mathbf{x_1}}{\|x_1 \|}}$).

$\displaystyle \mathbf{z_k} = \mathbf{x_k} - \sum_{j=1}^{k-1} \langle\mathbf{x_k},\mathbf{e_j}\rangle\mathbf{e_j}$ and then

$\displaystyle \mathbf{e_k} = \frac{\mathbf{z_k}}{\|\mathbf{z_k}\|}$

for $k = 2, 3, \dots, n$.

Keeping the above in mind, assume that $\| x\| = 1$. Now let $x = e_1$. Thus, we have:

$\mathbf{z} = \mathbf{y} - \langle \mathbf{y}, \mathbf{e_1}\rangle\mathbf{e_1}$

Giving: $\displaystyle \mathbf{e_2} = \frac{\mathbf{z}}{\|\mathbf{z}\|}$. Rearranging we have:

$\displaystyle \|\mathbf{z}\|\mathbf{e_2} = \mathbf{y} - \langle \mathbf{y}, \mathbf{e_1}\rangle\mathbf{e_1}$ or

$\displaystyle \mathbf{y} = \langle \mathbf{y},\mathbf{e_1}\rangle \mathbf{e_1} + \|\mathbf{z}\|\mathbf{e_2}$ or

$\displaystyle \mathbf{y} = \mathbf{c_1} \mathbf{e_1} + \mathbf{c_2}\mathbf{e_2}$ where $c_1, c_2$ are constants.

Now note that: $\displaystyle \langle\mathbf{x},\mathbf{y}\rangle = c_1$ and

$\displaystyle \langle \mathbf{y},\mathbf{y}\rangle = |c_1|^2 + |c_2|^2$. The following bound is trivial:

$\displaystyle |c_1| \leq (|c_1|^2 + |c_2|^2)^{1/2}$. But note that this is simply $\langle x,y \rangle \leq \langle y,y \rangle^{1/2}$

Which is just the Cauchy-Schwarz inequality when $\|x\| = 1$.

________________

Proof 6: Proof of the Continuous version for d =2; Schwarz’s Proof

For this case, the inequality may be stated as:

Suppose we have $S \subset \mathbb{R}^2$ and that $f: S \to \mathbb{R}$ and $g: S \to \mathbb{R}$. Then consider the double integrals:

$\displaystyle A = \iint_S f^2 dx dy$, $\displaystyle B = \iint_S fg dx dy$ and $\displaystyle C = \iint_S g^2 dx dy$. These double integrals must satisfy the following inequality:

$|B| \leq \sqrt{A} . \sqrt{C}$.

The proof given by Schwarz as is reported in Steele’s book (and indeed in standard textbooks) is based on the following observation:

The real polynomial below is always non-negative:

$\displaystyle p(t) = \iint_S \Big( t f(x,y) + g(x,y) \Big)^2 dx dy = At^2 + 2Bt + C$

$p(t) > 0$ unless $f$ and $g$ are proportional. Thus from the binomial formula we have that $B^2 \leq AC$, moreover the inequality is strict unless $f$ and $g$ are proportional.

________________

Proof 7: Proof using the Projection formula

Problem: Consider any point $x \neq 0$ in $\mathbb{R}^d$. Now consider the line that passes through this point and origin. Let us call this line $\mathcal{L} = \{ tx: t \in \mathbb{R}\}$. Find the point on the line closest to any point $v \in \mathbb{R}^d$.

If $P(v)$ is the point on the line that is closest to $v$, then it is given by the projection formula: $\displaystyle P(v) = x \frac{\langle x, v \rangle }{\langle x, x \rangle}$

This is fairly elementary to establish. To find the value of $t$, such that distance $\rho(v,tx)$ is minimized, we can simply consider the squared distance $\rho^2(v,tx)$ since it is easier to work with. Which by definition is:

$\displaystyle \rho^2(v,tx) = \langle v - x, v - tx \rangle$

which is simply:

$\displaystyle \rho^2(v,tx) = \langle v, v \rangle - 2t \langle v, x \rangle + t^2 \langle x, x \rangle$

$\displaystyle = \langle x, x \rangle \bigg( t^2 -2t \frac{\langle v, x \rangle}{\langle x, x \rangle} + \frac{\langle v, v \rangle}{\langle x, x \rangle}\bigg)$

$\displaystyle = \langle x, x \rangle \bigg\{ \bigg(t - \frac{\langle v,x \rangle}{\langle x,x \rangle}\bigg)^2 - \frac{\langle v,x \rangle^2}{\langle x,x \rangle^2}\bigg\} + \frac{\langle v, v \rangle}{\langle x, x \rangle}\bigg)$

$\displaystyle = \langle x, x \rangle \bigg\{ \bigg(t - \frac{\langle v,x \rangle}{\langle x,x \rangle}\bigg)^2 - \frac{\langle v,x \rangle^2}{\langle x,x \rangle^2} + \frac{\langle v, v \rangle}{\langle x, x \rangle} \bigg\}$

$\displaystyle = \langle x, x \rangle \bigg\{ \bigg(t - \frac{\langle v,x \rangle}{\langle x,x \rangle}\bigg)^2 - \frac{\langle v,x \rangle^2}{\langle x,x \rangle^2} + \frac{\langle v, v \rangle \langle x, x \rangle}{\langle x, x \rangle^2} \bigg\}$

So, the value of $t$ for which the above is minimized is $\displaystyle \frac{\langle v,x \rangle}{\langle x,x \rangle}$. Note that this simply reproduces the projection formula.

Therefore, the minimum squared distance is given by the expression below:

$\displaystyle \min_{t \in \mathbb{R}} \rho^2(v, tx) = \frac{\langle v,v\rangle \langle x,x \rangle - \langle v,x \rangle^2}{\langle x,x \rangle}$

Note that the L. H. S is always positive. Therefore we have:

$\displaystyle \frac{\langle v,v\rangle \langle x,x \rangle - \langle v,x \rangle^2}{\langle x,x \rangle} \geq 0$

Rearranging, we have:

$\displaystyle \langle v,x \rangle^2 \leq \langle v,v\rangle \langle x,x \rangle$

Which is just Cauchy-Schwarz, thus proving the inequality.

________________

Proof 8: Proof using an identity

A variant of this proof is amongst the most common Cauchy-Schwarz proofs that are given in textbooks. Also, this is related to proof (6) above. However, it still has some value in its own right. While also giving an useful expression for the “defect” for Cauchy-Schwarz like the Lagrange Identity above.

$P(t) = \langle v - tw, v - tw \rangle$. Clearly $P(t) \geq 0$.

To find the minimum of this polynomial we find its derivative w.r.t $t$ and setting to zero:

$P'(t) = 2t \langle w, w \rangle - 2 \langle v, w \rangle = 0$ giving:

$\displaystyle t_0 = \frac{\langle v, w \rangle}{\langle w, w \rangle}$

Clearly we have $P(t) \geq P(t_0) \geq 0$. We consider:

$P(t_0) \geq 0$, substituting $\displaystyle t_0 = \frac{\langle v, w \rangle}{\langle w, w \rangle}$ we have:

$\displaystyle \langle v,v \rangle - \frac{\langle v, w \rangle}{\langle w, w \rangle} \langle v,w \rangle - \frac{\langle v, w \rangle}{\langle w, w \rangle} \langle w,v \rangle + \frac{\langle v, w \rangle^2}{\langle w, w \rangle^2}\langle w,w \rangle \geq 0 \ \ \ \ \ \ \ \ (A)$

Just rearrangine and simplifying:

$\displaystyle \langle v,v \rangle \langle w,w \rangle - \langle v, w \rangle^2 \geq 0$

This proves Cauchy-Schwarz inequality.

Now suppose we are interested in an expression for the defect in Cauchy-Schwarz i.e. the difference $\displaystyle \langle v,v \rangle \langle w,w \rangle - \langle v, w \rangle^2$. For this we can just consider the L.H.S of equation $(A)$ since it is just $\displaystyle \langle w,w \rangle \Big(\langle v,v \rangle \langle w,w \rangle - \langle v, w \rangle^2\Big)$.

i.e. Defect =

$\displaystyle \langle w,w \rangle \bigg(\langle v,v \rangle - 2 \frac{\langle v,w \rangle^2}{\langle w,w \rangle} + \frac{\langle v,w \rangle^2}{\langle w,w \rangle}\bigg)$

Which is just:

$\displaystyle \langle w,w \rangle\bigg(\Big\langle v - \frac{\langle w,v \rangle}{\langle w,w \rangle}w,v - \frac{\langle w,v \rangle}{\langle w,w \rangle}w \Big\rangle\bigg)$

This defect term is much in the spirit of the defect term that we saw in Lagrange’s identity above, and it is instructive to compare them.

________________

Proof 9: Proof using the AM-GM inequality

Let us first recall the AM-GM inequality:

For non-negative reals $x_1, x_2, \dots x_n$ we have the following basic inequality:

$\displaystyle \sqrt[n]{x_1 x_2 \dots x_n} \leq \Big(\frac{x_1 + x_2 + \dots x_n}{n}\Big)$.

Now let us define $\displaystyle A = \sqrt{a_1^2 + a_2^2 + \dots + a_n^2}$ and $\displaystyle B = \sqrt{b_1^2 + b_2^2 + \dots + b_n^2}$

Now consider the trivial bound (which gives us the AM-GM): $(x-y)^2 \geq 0$, which is just $\displaystyle \frac{1}{2}\Big(x^2 + y^2\Big) \geq xy$. Note that AM-GM as stated above for $n = 2$ is immediate when we consider $x \to \sqrt{x}$ and $y \to \sqrt{y}$

Using the above, we have:

$\displaystyle \frac{1}{2} \Big(\frac{a_i^2}{A^2} + \frac{b_i^2}{B^2}\Big) \geq \frac{a_ib_i}{AB}$

Summing over $n$, we have:

$\displaystyle \sum_{i=1}^n\frac{1}{2} \Big(\frac{a_i^2}{A^2} + \frac{b_i^2}{B^2}\Big) \geq \sum_{i=1}^n \frac{a_ib_i}{AB}$

But note that the L.H.S equals 1, therefore:

$\displaystyle \sum_{i=1}^n \frac{a_ib_i}{AB} \leq 1$ or $\displaystyle \sum_{i=1}^n a_ib_i \leq AB$

Writing out $A$ and $B$ as defined above, we have:

$\displaystyle \sum_{i=1}^n a_ib_i \leq \sqrt{\sum_{i=1}^na_i^2}\sqrt{\sum_{i=1}^nb_i^2}$.

Thus proving the Cauchy-Schwarz inequality.

________________

Proof 10: Using Jensen’s Inequality

We begin by recalling Jensen’s Inequality:

Suppose that $f: [p, q] \to \mathbb{R}$ is a convex function. Also suppose that there are non-negative numbers $p_1, p_2, \dots, p_n$ such that $\displaystyle \sum_{i=1}^{n} p_i = 1$. Then for all $x_i \in [p, q]$ for $i = 1, 2, \dots, n$ one has:

$\displaystyle f\Big(\sum_{i=1}^{n}p_ix_i\Big) \leq \sum_{i=1}^{n}p_if(x_i)$.

Now we know that $f(x) = x^2$ is convex. Applying Jensen’s Inequality, we have:

$\displaystyle \Big(\sum_{i=1}^{n} p_i x_i \Big)^2 \leq \sum_{i=1}^{n} p_i x_i^2$

Now, for $b_i \neq 0$ for all $i = 1, 2, \dots, n$, let $\displaystyle x_i = \frac{a_i}{b_i}$ and let $\displaystyle p_i = \frac{b_i^2}{\sum_{i=1}^{n}b_i^2}$.

Which gives:

$\displaystyle \Big(\sum_{i=1}^n \frac{a_ib_i}{\sum_{i=1}^{n}b_i^2}\Big)^2 \leq \Big(\sum_{i=1}^{n}\frac{a_i^2}{\sum_{i=1}^{n}b_i^2}\Big)$

Rearranging this just gives the familiar form of Cauchy-Schwarz at once:

$\displaystyle \Big(\sum_{i=1}^{n} a_ib_i\Big)^2 \leq \Big(\sum_{i=1}^{n} a_i^2\Big)\Big(\sum_{i=1}^{n} b_i^2\Big)$

________________

Proof 11: Pictorial Proof for d = 2

Here (page 4) is an attractive pictorial proof by means of tilings for the case $d = 2$ by Roger Nelson.

________________

## 1966 Film on John von Neumann

John von Neumann made so many fundamental contributions that Paul Halmos remarked that it was almost like von Neumann maintained a list of various subjects that he wanted to touch and develop and he systematically kept ticking items off. This sounds to be remarkably true if one just has a glance at the dizzyingly long “known for” column below his photograph on his wikipedia entry.

John von Neumann with one of his computers.

Since Neumann died (young) in 1957, rather unfortunately, there aren’t very many audio/video recordings of his (if I am correct just one 2 minute video recording exists in the public domain so far).

I recently came across a fantastic film on him that I would very highly recommend. Although it is old and the audio quality is not the best, it is certainly worth spending an hour on. The fact that this film features Eugene Wigner, Stanislaw UlamOskar Morgenstern, Paul Halmos (whose little presentation I really enjoyed), Herman Goldstein, Hans Bethe and Edward Teller (who I heard for the first time, spoke quite interestingly) alone makes it worthwhile.

Update: The following youtube links have been removed for breach of copyright. The producer of the film David Hoffman, tells us that the movie should be available as a DVD for purchase soon. Please check the comments to this post for more information.

Part 1

Find Part 2 here.

________________

Onionesque Reality Home >>

## Maximum Number of Regions in Arrangement of Hyperplanes

A semi-useful fact for people working in Machine Learning.

Blurb: Recently, my supervisor was kind enough to institute a small reading group on Deep Neural Networks to help us understand them better. A side product of this would hopefully be that I get to explore this old interest of mine (I have an old interest in Neural Nets and have always wanted to study them in detail, especially work from the mid 90s when the area matured and connections to many others areas such as statistical physics became apparent. But before I got to do that, they seem to be back in vogue! I dub this resurgence as Connectionism 2.0 Although the hipster in me dislikes hype, it is always lends a good excuse to explore an interest). I also attended a Summer School on the topic at UCLA, find slides and talks over here.

Coming back: we have been making our way through this Foundations and Trends volume by Yoshua Bengio (which I quite highly recommend). Bengio spends a lot of time giving intuitions and arguments on why and when Deep Architectures are useful. One particular argument (which is actually quite general and not specific to trees as it might appear) went like this:

Ensembles of trees (like boosted trees, and forests) are more powerful than a single tree. They add a third level to the architecture which allows the model to discriminate among a number of regions exponential in the number of parameters. As illustrated in […], they implicitly form a distributed representation […]

This is followed by an illustration with the following figure:

[Image Source: Learning Deep Architectures for AI]

and accompanying text to the above figure:

Whereas a single decision tree (here just a two-way partition) can discriminate among a number of regions linear in the number of parameters (leaves), an ensemble of trees can discriminate among a number of regions exponential in the number of trees, i.e., exponential in the total number of parameters (at least as long as the number of trees does not exceed the number of inputs, which is not quite the case here). Each distinguishable region is associated with one of the leaves of each tree (here there are three 2-way trees, each defining two regions, for a total of seven regions). This is equivalent to a multi-clustering, here three clusterings each associated with two regions.

Now, the following question was considered: Referring to the text in boldface above, is the number of regions obtained exponential in the general case? It is easy to see that there would be cases where it is not exponential. For example: the number of regions obtained by the three trees would be the same  as those obtained by one tree if all three trees overlap, hence giving no benefit. The above claim refers to a paper (also by Yoshua Bengio) where he constructs examples to show the number of regions could be exponential in some cases.

But suppose for our satisfaction we are interested in actual numbers and the following general question, an answer to which should also answer the question raised above:

What is the maximum possible number of regions that can be obtained in ${\bf R}^n$ by the intersection of $m$ hyperplanes?

Let’s consider some examples in the ${\bf R}^2$ case.

Where there is one hyperplace i.e. $m = 1$, then the maximum number of possible regions is obviously two.

[1 Hyperplane: 2 Regions]

Clearly, for two hyerplanes, the maximum number of possible regions is 4.

[2 Hyperplanes: 4 Regions]

In case there are three hyperplanes, the maximum number of regions that might be possible would be 7 as illustrated in the first figure. For $m =4$ i.e. 4 hyerplanes, this number is 11 as shown below:

[4 Hyperplanes: 11 Regions]

On inspection we can see the number of regions with $m$ hyperplanes in ${\bf R}^2$ is given by:

Number of Regions = #lines + #intersections + 1

Now let’s consider the general case: What is the expression that would give us the maximum number of regions possibles with $m$ hyperplanes in ${\bf R}^n$? Turns out that there is a definite expression for the same:

Let $\mathcal{A}$ be an arrangement of $m$ hyperplanes in ${\bf R}^n$, then the maximum number of regions possible is given by

$\displaystyle r(\mathcal{A}) = 1 + m + \dbinom{m}{2} \ldots + \dbinom{m}{n}$

the number of bounded regions (closed from all sides) is given by:

$\displaystyle b(\mathcal{A}) = \dbinom{m - 1}{n}$

The above expressions actually derive from a more general expression when $\mathcal{A}$ is specifically a real arrangement.

I am not familiar with some of the techniques that are required to prove this in the general case ($\mathcal{A}$ is a set of affine hyperplanes in a vector space defined over some Field).  The above might be proven by induction. However, it turns out that in the ${\bf R}^2$ case the result is related to the crossing lemma (see discussion over here), I think it was one of the most favourite results of one of my previous advisors and he often talked about it. This connection makes me want to study the details [1],[2], which I haven’t had the time to look at and I will post the proof for the general version in a future post.

________________

References:

1. An Introduction to Hyperplane Arrangements: Richard P. Stanley (PDF)

Related:

2. Partially Ordered Sets: William Trotter (Handbook of Combinatorics) (PDF)

________________

Onionesque Reality Home >>

## IPAM-UCLA Summer School on Deep Learning and Feature Learning

Deep Learning reads Wikipedia and discovers the meaning of life – Geoff Hinton.

The above quote is from a very interesting talk by Geoffrey Hinton I had the chance to attend recently.

I have been at a summer school on Deep Neural Nets and Unsupervised Featured Learning at the Institute for Pure and Applied Mathematics at UCLA since July 9 (till July 27). It has been organized by Geoff Hinton, Yoshua Bengio, Stan Osher, Andrew Ng and Yann LeCun.

I have always been a “fan” of Neural Nets and the recent spike in interest in them has made me excited, thus the school happened at just the right time. The objective of the summer school is to give a broad overview of some of the recent work in Deep Learning and Unsupervised Feature Learning with emphasis on optimization, deep architectures and sparse representations. I must add that after getting here and looking at the peer group I would consider myself lucky to have obtained funding for the event!

[Click on the above image to see slides for the talks. Videos will be added at this location after July 27 Videos are now available]

That aside, if you are interested in Deep Learning or Neural Networks in general, the slides for the talks are being uploaded over here (or click on the image above), videos will be added at the same location some time after the summer school ends so you might like to bookmark this link.

The school has been interesting given the wide range of people who are here. The diversity of opinions about Deep Learning itself has given a good perspective on the subject and the issues and strengths of it. There are quite a few here who are somewhat skeptical of deep learning but are curious, while there are some who have been actively working on the same for a while. Also, it has been enlightening to see completely divergent views between some of the speakers on key ideas such as sparsity. For example Geoff Hinton had a completely different view of why sparsity was useful in classification tasks than compared to Stéphane Mallat, who gave a very interesting talk today even joking that “Hinton and Yann LeCun told you why sparsity is useful, I’ll tell you why sparsity is useless. “. See the above link for more details.

Indeed, such opinions do tell you that there is a lot of fecund ground for research in these areas.

I have been compiling a reading list on some of this stuff and will make a blog-post on the same soon.

________________

Onionesque Reality Home >>

## Ramanujan: Letters from an Indian Clerk (1987)

I have never done anything useful. No discovery of mine has made or is likely to make, directly or indirectly, for good or for ill, the least difference to the amenity of the world. Judged by all practical standards, the value of my mathematical life is nil. And outside mathematics it is trivial anyhow. The case for my life then, or for anyone else who has been a mathematician in the same sense that I have been one is this: That I have added something to knowledge and helped others to add more, and that these somethings have a value that differ in degree only and not in kind from that of the creations of the great mathematicians or any of the other artists, great or small who’ve left some kind of memorial behind them.

I still say to myself when I am depressed and and find myself forced to listen to pompous and tiresome people “Well, I have done one thing you could never have done, and that is to have collaborated with Littlewood and Ramanujan on something like equal terms.” — G. H. Hardy (A Mathematician’s Apology)

Yesterday I  discovered an old (1987) British documentary on Srinivasa Ramanujan, which was pretty recently uploaded. I was not surprised to see that the video was made available by Christopher J. Sykes, who has been uploading older documentaries (including those by himself) on youtube (For example – The delightful “Richard Feynman and the Quest for Tannu Tuva” was uploaded by him as well. I blogged about it a couple of years ago!). Thanks Chris for these gems!

Since the documentary is pretty old, it is a little slow. But if you have one hour to spare, you should watch it! It features his (now late) widow, a quite young Béla Bollobás and the late Nobel Laureate Subrahmanyan Chandrasekhar. The video is embedded below – in case of any issues also find it linked here.

________________

[Ramanujan: Letters from an Indian Clerk]

________________

I could have written something on Ramanujan, but decided against it. Instead, I’d close this post with an excerpt from a wonderful essay by Freeman Dyson on Ramanujan published in Ramanujan: Essays and Surveys by Berndt and Rankin

Ramanujan: Essays and Surveys (click on image to view on Amazon)

A Walk Through Ramanujan’s Garden — F. J. Dyson

[…] The inequalities (8), (9) and (10) were undoubtedly true, but I had no idea how to prove them in 1942. In the end I just gave up trying to prove them and published them as conjectures in our student magazine “Eureka”. Since there was half a page left over at the end of my paper, they put in a poem by my friend Alison Falconer who was also a poet and mathematician. […]

Short Vision

Thought is the only way that leads to life.

All else is hollow spheres

Reflecting back

In heavy imitation

And blurred degeneration

A senseless image of our world of thought.

.

Man thinks he is the thought which gives him life.

He binds a sheaf and claims it as himself.

He is a ring through which we pass swinging ropes

Which merely move a little as he slips.

.

The Ropes are Thought.

The Space is Time.

Could he but see, then he might climb.

________________

Onionesque Reality Home >>

## The Opinions of Doron Zeilberger

I presume that a lot of people who drop by this blog are familiar with Doron Zeilberger‘s opinions already. Even though a lot of people who know me personally get linked frequently to some or the other opinions of Zeilberger, I thought it would be a good idea to blog about them in any case, for I believe more people should know about them, even if the number is not high enough.

For a one line introduction, Doron Zeilberger is a noted Israeli mathematician who is presently a professor at Rutgers. He maintains a “blog” which has the title “Dr. Z’s Opinions” in which there are an assortment of views on topics broadly related to Mathematics. Zeilberger certainly has a flair for writing and oftentimes makes hard-hitting points which might outrage many (his latest writing on Turing for example is sure to make many people shake their heads in disagreement – me included) which usually could be seen as chipping away at commonly held opinions. All the interestingness about his opinions aside, his sense of humour makes them entertaining in any case. Even if one disagrees with them I would highly recommend them as long as one exercises some discretion in sifting through these Indiscrete Thoughts.

I found his opinions many years ago while searching for something witty about weekly colloquiums which I could send to some of my colleagues who somehow took pride in not going for them. Skipping colloquiums is a habit that I have not understood well. He wrote the following about it (Opinion 20):

Socrates said that one should always marry. If your spouse would turn out to be nice, then you’ll be a happy person. If your spouse would turn out to be a bitch/bastard, then you’ll become a philosopher.

The same thing can be said about the weekly colloquium. If the speaker is good, you’ll learn something new and interesting, usually outside your field. If the speaker is bad, you’ll feel that you have accomplished something painful, like fasting, or running a marathon, so while you may suffer during the talk, you’ll feel much better after it.

What prompted me to blog about his “blog” was a recent opinion of his. Some months ago when Endre Szemeredi won the Abel Prize, I got very excited, almost like a school boy and the next morning I went to the college library to see what the national dailies had to say about the achievement. To my surprise and dismay none of the dailies seem to have noticed it at all! Three or four days after that the New York Times carried a full page advertisement by Rutgers University having a great photo of Szemeredi, however that doesn’t count as news. I was delighted to see that Doron Zeilberger noticed this too and wrote about it (see his 122nd Opinion)

Let me conclude by wishing Endre, “the computer science professor who never touched a computer”, many more beautiful and deep theorems, and console him (and us) that in a hundred years, and definitely in a thousand years, he would be remembered much more than any contemporary sports or movie star, and probably more than any living Nobel prize winner.

One of my all time favourite opinions of his is Opinion 62, which compares the opposing styles of genius of two men I have had the highest respect for – Israel Gelfand and Alexander Grothendieck. I often send it to people who I think are highly scientifically talented but somehow waste time in expending energy in useless causes than trying to do science (especially if one doesn’t have an intellect comparable to some fraction of Grothendieck’s)! I take the liberty of reproducing the entire opinion here –

I just finished reading Allyn Jackson’s fascinating two-part article about the great mathematical genius Alexandre Grothendieck (that appeared in the Notices of the Amer. Math. Soc.) , and Pierre Cartier’s extremely moving and deep essay `Une pays dont on ne conaitrait que le nom: Le “motifs” de Grothendieck’. (that appeared in the very interesting collection “Le Reel en mathematiques”, edited by P. Cartier and Nathalie Charraud, and that represents the proceedings of a conference about psychoanalysis and math).

In Pierre Cartier’s article, in addition to an attempt at a penetrating “psychoanalysis” he also gives a very lucid non-technical summary of Grothendieck’s mathematical contributions. From this it is clear that one of the greatest giants on whose shoulders Grothendieck stood was Israel Gelfand, whom I am very fortunate to know personally (I am one of the few (too few!) regulars that attend his weekly seminar at Rutgers). I couldn’t help notice the great contrast between these two Giants, and their opposing styles of Genius.

Myself, I am not even an amateaur psychoanalyst, but motives and psi aside, I can easily explain why Grothendieck stopped doing math a long time ago (hence, died, according to Erdos’s nomenclature), while Gelfand, at age 91, is as active and creative as ever.

First and foremost, Grothendieck is a dogmatic purist (like many of the Bourbakists). He dislikes any influences from outside mathematics, or even from other subareas of math. In particular, he always abhored mathematical physics. Ironically, as Cartier explains so well, many major applications of his ground-breaking work were achieved by interfacing it with mathematical physics, in the hands of the “Russian” school, all of whom were disciples of Gelfand. As for Combinatorics, forget it! And don’t even mention the computer, it is du diable. As for Gelfand, he was always sympathetic to all science, even biology! In fact he is also considered a prominent theoretical biologist. Gelfand also realizes the importance of combinatorics and computers.

Also people. Grothendieck was a loner, and hardly collaborated. On the other hand, Gelfand always (at least in the last sixty years) works with other people. Gelfand is also very interested in pedagogy, and in establishing math as an adequate language.

Grothendieck spent a lot of energy in rebellious political causes, probably since in his youth he was an obedient bon eleve. On the other hand, Gelfand was already kicked out of high-school (for political reasons), so could focus all his rebellious energy on innovative math.

So even if you are not quite as smart or original as Gelfand and Grothendieck (and who is?), you will still be able to do math well into your nineties, if you follow Gelfand’s, rather than Grothendieck’s, example.

Zeilberger also seems to have a lot of respect for G. J. Chaitin, something that I certainly find very interesting. I mention this because I have been reading and re-reading Chaitin these days, especially after discovering some of his very recent work on meta-biology.

PS: Zeilgerber was featured in a BBC Documentary on infinity (not a great one, though) in which he talked about his ultrafinitist viewpoint.

________________

Onionesque Reality Home >>

## Endre Szemerédi wins the Abel Prize

In absolutely big news, the Norwegian Academy of Science and Letters has made a fantastic decision by awarding the 2012 Abel Prize to Prof. Endre Szemeredi, one of the greatest mathematicians of our time. We must remember that such decisions are made by committees, and hence I would congratulate the Abel Committee (comprising of Ragni Piene, Terence Tao, Dave Donoho, M. S. Raghunathan and Noga Alon) for such an excellent decision !

Some months ago I told one of my über-cool-dude supervisors (Gabor) that Endre would win the Abel prize this year (guessing was no rocket science!)! I usually don’t like making such statements as there are always many great mathematicians who could win at any given time and there are a lot of other factors too. But Gabor actually told this to Endre, who ofcourse didn’t think it was serious. But apparently he did win it this year! A very well deserved award!

It’s pointless to make an attempt to talk about (not that I am competent to do so anyway) some of Prof. Szemeredi’s deep results and the resulting fundamental contributions to mathematics. Timothy Gowers wrote a good article on the same for the non-mathematical audience. Especially see a mention of Machine Learning on page 7. However, other than the Regularity Lemma that I find absolutely beautiful, my other favorite result of Szemeredi is his Crossing Lemma. A brief discussion on the Regularity Lemma in an older blog post can be found here.

An Irregular Mind: Szemeredi is 70. Book from Szemeredi’s 70th birthday conference recently.

For a short background Prof. Szemeredi was born in Budapest and initially studied at Eötvös before getting his PhD from Moscow State University, where he was advised by the legendary Soviet mathematician Israel Gelfand. He presently holds a position both at the Alfréd Rényi Institute of Mathematics and Rutgers and has had held visiting positions at numerous other places. Recently on his 70th birthday The János Bolyai Mathematical Society organized a conference in his honour, the proceedings of which were published as an appropriately titled book – “An Irregular Mind” (obviously a play on his “Regularity Lemma” and related work and as stated in the book “Szemerédi has an ‘irregular mind’; his brain is wired differently than for most mathematicians. Many of us admire his unique way of thinking, his extraordinary vision.”).

Congratulations to Endre Szemeredi and the great, absolutely unique Hungarian way of doing mathematics.

_________________________

See Also: Short Course on Additive Combinatorics focused on the Regularity Lemma and Szemeredi’s Theorem, Princeton University. (h/t Ayan Acharya)

Onionesque Reality Home >>

## Abstractions

I thought I understood Spectral Clustering well enough till I came across these two paragraphs:

Graph Laplacians are interesting linear operators attached to graphs. They are the discrete analogues of the Laplace-Beltrami operators that appear in the Hodge theory of Riemannian manifolds, whose null spaces provide particularly nice representative forms for de Rham cohomology. In particular, their eigenfunctions produce functions on the vertex set of the graph. They can be used, for example, to produce cluster decompositions of data sets when the graph is the 1-skeleton of a Vietoris-Rips complex. We ﬁnd that these eigenfunctions (again applied to the 1-skeleton of the Vietoris-Rips complex of a point cloud) also can produce useful ﬁlters in the Mapper analysis of data sets

– From Prof. Gunnar Carlsson’s survey Topology and Data. (More on this survey as a manifesto for “Topology and Data” in a later post). That aside, I do like how the image on the wiki entry for Vietoris-Rips complex looks like:

A little less intimidating ( now this almost borders on “ofcourse that’s how it is”. I am interested in the same reaction for the paragraph above some months later):

A related application [of the graph laplacian] is “Spectral Clustering”, which is based on the observation that nodal domains of the first eigenvectors of the graph laplacian can be used as indicators for suitably size-balanced minimum cuts.

– From Laplacian Eigenvectors of Graphs linked in the previous post. While this isn’t really as compressed as the lines above, they made me think since I did not know about Courant’s Nodal domain theorem. Like I did in the previous blog post, I would highly recommend this (about 120 page) book. It soon covers the Nodal Domain theorem and things make sense (even in context of links between PCA and k-means and Non-Negative Matrix Factorization and Spectral Clustering, at least in an abstract sense).

_________________________

Onionesque Reality Home >>