
## A Note on the Graph Laplacian

A short note on how the Graph Laplacian is a natural analogue of the Laplacian in vector analysis.

For a twice differentiable function $f$ on Euclidean space, the Laplacian of $f$, written $\Delta f$, is $\operatorname{div}(\operatorname{grad} f)$, the divergence of the gradient of $f$. Let’s consider the Laplacian in two dimensions: $\displaystyle \Delta f(x,y) = \nabla \cdot \nabla f(x,y) = \frac{\partial^2 f(x,y)}{\partial x^2} + \frac{\partial^2 f(x,y)}{\partial y^2}$

To get an intuition about what it does, it is useful to consider a finite difference approximation to the above: $\displaystyle \Delta f(x,y) \approx \frac{f(x+h,y) + f(x-h,y) + f(x,y+h) + f(x,y-h) - 4f(x,y)}{h^2}$

The above makes it clear that the Laplacian measures how much $f$ at a point deviates from the average of $f$ over its neighbours. In fact, this finite difference approximation is used in image processing (say, with $h$ set to one pixel) to detect sharp changes such as edges, since the expression is close to zero in smooth image regions.
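As a quick sanity check, the stencil above can be applied to a tiny synthetic image (a sketch in NumPy; the image and the helper `laplacian_2d` are made up for illustration):

```python
import numpy as np

def laplacian_2d(f):
    """5-point finite difference Laplacian of a 2-D array (h = 1 pixel)."""
    lap = np.zeros_like(f, dtype=float)
    # interior points only; boundary is left at zero for simplicity
    lap[1:-1, 1:-1] = (f[2:, 1:-1] + f[:-2, 1:-1] +
                       f[1:-1, 2:] + f[1:-1, :-2] -
                       4.0 * f[1:-1, 1:-1])
    return lap

# A vertical step edge between columns 2 and 3
img = np.zeros((5, 6))
img[:, 3:] = 1.0

lap = laplacian_2d(img)
print(lap)
```

The result is zero in the flat regions and nonzero only in the two columns straddling the step, which is exactly the edge-detection behaviour described above.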

____________________________

The finite difference approximation suggests that a discrete Laplace operator should be natural. At the same time, the graph theory literature defines the Graph Laplacian as the matrix $L = D - A$ (where $D$ is the degree matrix and $A$ the adjacency matrix of the graph $G$; both are defined below). It is not obvious that this matrix $L$ follows from the div(grad($f$)) definition above. The purpose of this note is to briefly explain why it is natural, once the notions of gradient and divergence operators have been defined on a graph.

First of all, the two most natural matrices that can be associated with a graph are defined below:

____________________________

There are two natural matrices that are used to specify a simple graph $G = (V,E)$ with vertex set $V =\{1, \dots, n \}$ and edge set $E = \{1, \dots, m \}$: The Adjacency matrix $A$ and the Incidence matrix $\nabla$.

The Adjacency matrix is defined as an $n \times n$ matrix such that: $\displaystyle A_{u,v} = \begin{cases} 1 & \text{if } uv \in E \\ 0 & \text{otherwise} \end{cases}$

To define the $m \times n$ incidence matrix $\nabla$, we need to fix an orientation of the edges (for our purpose an arbitrary orientation suffices), then: $\displaystyle \nabla_{e,v} = \begin{cases} -1 & \text{if } v \text{ is initial vertex of edge } e \\ +1 & \text{if } v \text{ is terminal vertex of edge } e \\ 0 & \text{if } e \text{ is not incident on vertex } v \end{cases}$
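A minimal sketch of both matrices for a concrete example, a triangle graph on three vertices with arbitrarily chosen edge orientations (the variable names here are illustrative, not standard):

```python
import numpy as np

# Triangle graph on vertices 0, 1, 2; each edge oriented (initial, terminal).
# The orientation is arbitrary, as noted in the text.
edges = [(0, 1), (1, 2), (0, 2)]
n, m = 3, len(edges)

A = np.zeros((n, n), dtype=int)     # adjacency matrix
grad = np.zeros((m, n), dtype=int)  # incidence matrix, written as nabla in the text
for e, (u, v) in enumerate(edges):
    A[u, v] = A[v, u] = 1
    grad[e, u] = -1                 # u is the initial vertex of edge e
    grad[e, v] = +1                 # v is the terminal vertex of edge e

print(A)
print(grad)
```

Note that each row of the incidence matrix sums to zero (one $-1$ and one $+1$ per edge), and the row sums of $A$ recover the vertex degrees.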

Now suppose we have a real valued function $f$ over $V$, $f: V \to \mathbb{R}$. For $G$, we view this as a column vector indexed by the vertices.

Similarly let $g$ be a real valued function over the edges $E$, $g: E \to \mathbb{R}$ (we view this as a row vector indexed by the edges).

Now, consider the map $f \mapsto \nabla f$ (a vector indexed by $E$). The value of this map at an edge $e$ oriented from $u$ to $v$ is $(\nabla f)(e) = f_v - f_u$, the difference of the values of $f$ at the two endpoints of $e$. This makes it intuitive that the matrix $\nabla$ is a kind of difference operator on $G$. Given this “difference operator” or “discrete differential” view, $\nabla f$ can be seen as the gradient, measuring the change of the function $f$ along the edges of $G$. Thus the map $f \mapsto \nabla f$ is just the gradient operator.

Like the above, now consider the map $g \mapsto g \nabla$ (a row vector indexed by $V$; equivalently $g^T \mapsto \nabla^{T} g^T$ as column vectors). Recall that $g$ was defined on the edges. The value of this map at any vertex $v$ is $\displaystyle (g \nabla)(v) = \sum_{e \text{ enters } v} g_e - \sum_{e \text{ exits } v} g_e$. If we view $g$ as describing some kind of flow on the edges, then $(g \nabla)(v)$ is the net flow into the vertex $v$, which is the divergence up to sign (the continuous convention counts outflow as positive). Thus the map $g \mapsto g \nabla$ is just the divergence operator.
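Both maps can be checked numerically on a small example, say a path graph on three vertices (a sketch; the graph, orientations and function values are made up for illustration):

```python
import numpy as np

# Path graph 0 - 1 - 2, edges oriented left to right (arbitrary choice).
edges = [(0, 1), (1, 2)]
n, m = 3, len(edges)

grad = np.zeros((m, n))  # incidence matrix (nabla in the text)
for e, (u, v) in enumerate(edges):
    grad[e, u], grad[e, v] = -1.0, 1.0

f = np.array([0.0, 2.0, 5.0])  # a function on the vertices
g = grad @ f                   # gradient: f_v - f_u along each edge, i.e. [2, 3]
div = grad.T @ g               # divergence of the edge flow g (at each vertex)
print(g)
print(div)
```

Here the gradient picks up the differences $2$ and $3$ along the two edges, and applying $\nabla^T$ to that flow gives the net flow at each vertex; note that `div` equals $L f$ for this graph, anticipating the next step.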

Now consider $f \mapsto \nabla^{T} \nabla f$ (this makes sense since $\nabla f$ is a column vector); going by the above, this is, up to sign, the divergence of the gradient. That sign is why the graph Laplacian is positive semidefinite while the continuous Laplacian is negative semidefinite. With that caveat, the analogy with the “real” Laplacian makes sense, and the matrix $L_G = \nabla^{T} \nabla$ is appropriately called the Graph Laplacian.

Using the definition of $\nabla$, a simple computation yields the following for $L_G$: $\displaystyle (L_G)_{u,v} = \begin{cases} \deg(u) & \text{if } u = v \\ -1 & \text{if } uv \in E \\ 0 & \text{otherwise} \end{cases}$

Looking at the above and recalling the definition of the adjacency matrix $A$, it is easy to see that $L_G = D - A$, where $D$ is the diagonal matrix with $D_{u,u} = \deg(u)$. This is just the familiar definition mentioned at the start.

PS: From the above it is also clear that the Laplacian is positive semidefinite. Indeed, $f^T L f = f^T \nabla^T \nabla f = \|\nabla f\|^2 = \sum_{(u,v) \in E} (f_u - f_v)^2 \geq 0.$
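The identities above, $L_G = \nabla^T \nabla = D - A$ and $f^T L f = \sum_{(u,v) \in E} (f_u - f_v)^2$, are easy to verify numerically on a small made-up graph:

```python
import numpy as np

# A small arbitrary graph: triangle 0-1-2 plus a pendant vertex 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4

grad = np.zeros((len(edges), n))  # incidence matrix
A = np.zeros((n, n))              # adjacency matrix
for e, (u, v) in enumerate(edges):
    grad[e, u], grad[e, v] = -1.0, 1.0
    A[u, v] = A[v, u] = 1.0

L = grad.T @ grad                 # Graph Laplacian as div(grad)
D = np.diag(A.sum(axis=1))        # degree matrix
assert np.allclose(L, D - A)      # the two definitions coincide

f = np.array([1.0, -2.0, 0.5, 3.0])
quad = f @ L @ f                  # quadratic form f^T L f
assert np.isclose(quad, sum((f[u] - f[v]) ** 2 for u, v in edges))
assert np.all(np.linalg.eigvalsh(L) >= -1e-12)  # positive semidefinite
print("identities verified")
```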

## Abstractions

I thought I understood Spectral Clustering well enough till I came across these two paragraphs:

Graph Laplacians are interesting linear operators attached to graphs. They are the discrete analogues of the Laplace-Beltrami operators that appear in the Hodge theory of Riemannian manifolds, whose null spaces provide particularly nice representative forms for de Rham cohomology. In particular, their eigenfunctions produce functions on the vertex set of the graph. They can be used, for example, to produce cluster decompositions of data sets when the graph is the 1-skeleton of a Vietoris-Rips complex. We find that these eigenfunctions (again applied to the 1-skeleton of the Vietoris-Rips complex of a point cloud) also can produce useful filters in the Mapper analysis of data sets.

– From Prof. Gunnar Carlsson’s survey Topology and Data. (More on this survey as a manifesto for “Topology and Data” in a later post.) That aside, I do like how the image on the wiki entry for the Vietoris-Rips complex looks: a little less intimidating (it now almost borders on “of course that’s how it is”; I am hoping for the same reaction to the paragraph above some months later):

A related application [of the graph laplacian] is “Spectral Clustering”, which is based on the observation that nodal domains of the first eigenvectors of the graph laplacian can be used as indicators for suitably size-balanced minimum cuts.

– From Laplacian Eigenvectors of Graphs, linked in the previous post. While this isn’t quite as compressed as the lines above, it made me think, since I did not know about Courant’s Nodal Domain theorem. As in the previous blog post, I would highly recommend this (roughly 120-page) book. It covers the Nodal Domain theorem early on, after which things make sense (even in the context of the links between PCA, k-means, Non-Negative Matrix Factorization and Spectral Clustering, at least in an abstract sense).


## Two Interesting Short Volumes on the (Graph) Laplacian

Perhaps the most fundamental differential operator on Euclidean space $\mathbb{R}^d$ is the Laplacian. – Terence Tao

One can only agree with Prof. Tao. The agreement only intensifies when one considers generalizations of the Laplacian, such as the Laplace-Beltrami operators that appear in the Hodge theory of Riemannian manifolds, or discrete analogues such as the Graph Laplacian, which I dare say has changed the landscape of research in unsupervised machine learning in the past decade or so; Laplacian eigenmaps, spectral clustering and diffusion maps are three (related) examples. I have had the chance to work on manifold learning for a while and have been very fascinated by the Graph Laplacian and its uncanny prowess. I was thus looking for material that relates the Laplacian to “flow”, diffusion and the heat equation beyond a superficial sense. The idea of diffusion or flow is a very interesting way of looking at distance, which is also partly why the said machine learning techniques are so successful. This paper (Semi-Supervised Learning on Riemannian Manifolds) by Belkin and Niyogi is quite beautiful from a machine learning point of view, but I was recently pointed to this short volume by my supervisor (with the gentle warning that it might take a lot of work): The Laplacian on a Riemannian Manifold – S. Rosenberg


Thus, the title of the post is a little misleading, as this monograph has almost nothing to do (directly) with graph Laplacians. But I am interested in it mainly as a route to a better understanding of why they are so powerful. It is a slow read, but it is not inaccessible and is very well written. The best thing about this book is that it is pointed, and some sections in between can be skipped without any problem. At worst it can be treated as a roadmap to what needs to be known to really understand the power of the graph Laplacian.

Another book that I have been reading slowly these days (it is actually more like a long paper than a book, and thinking of it that way has a big psychological impact), and quite enjoying, is this one: Laplacian Eigenvectors of Graphs


Unlike the book above, this one is far more accessible, and I believe it would be to anybody with an interest in the graph Laplacian or even spectral clustering. It mostly deals with some interesting issues related to the graph Laplacian that I had never even heard of. While these books are not new, I only discovered them a month back, and I think they should fascinate anybody who is fascinated with the Laplacian.
