Archive for September, 2013

Over the past 4-5 months whenever there is some time to spare, I have been working through The Cauchy-Schwarz Master Class by J. Michael Steele. And, although I am still left with the last two chapters, I have been reflecting on the material already covered in order to get a better perspective on what I have been slowly learning over the months. This blog post is a small exercise in this direction.

Ofcourse, there is nothing mysterious about proving the Cauchy-Schwarz inequality; it is fairly obvious and basic. But I thought it still might be instructive (mostly to myself) to reproduce some proofs that I know out of memory (following a maxim of my supervisor on a white/blackboard blogpost). Although, why Cauchy-Schwarz keeps appearing all the time and what makes it so useful and fundamental is indeed quite interesting and non-obvious. And like Gil Kalai notes, it is also unclear why is it that it is Cauchy-Schwarz which is mainly useful. I feel that Steele’s book has made me appreciate this importance somewhat more (compared to 4-5 months ago) by drawing to many concepts that link back to Cauchy-Schwarz.

Before getting to the post, a word on the book: This book is perhaps amongst the best mathematics book that I have seen in many years. True to its name, it is indeed a Master Class and also truly addictive. I could not put aside the book completely once I had picked it up and eventually decided to go all the way. Like most great books, the way it is organized makes it “very natural” to rediscover many susbtantial results (some of them named) appearing much later by yourself, provided you happen to just ask the right questions. The emphasis on problem solving makes sure you make very good friends with some of the most interesting inequalities. The number of inequalities featured is also extensive. It starts off with the inequalities dealing with “natural” notions such as monotonicity and positivity and later moves onto somewhat less natural notions such as convexity. I can’t recommend this book enough!

Now getting to the proofs: Some of these proofs appear in Steele’s book, mostly as either challenge problems or as exercises. All of them were solvable after some hints.


Proof 1: A Self-Generalizing proof

This proof is essentially due to Titu Andreescu and Bogdan Enescu and has now grown to be my favourite Cauchy-Schwarz proof.

We start with the trivial identity (for a, b \in \mathbb{R}, x >0, y>0 ):

Identity 1: (ay - bx)^2 \geq 0

Expanding we have

a^2y^2 + b^2x^2 - 2abxy \geq 0

Rearranging this we get:

\displaystyle \frac{a^2y}{x} + \frac{b^2x}{y} \geq 2ab

Further: \displaystyle a^2 + b^2 + \frac{a^2y}{x} + \frac{b^2x}{y} \geq (a+b)^2;

Rearranging this we get the following trivial Lemma:

Lemma 1: \displaystyle \frac{(a+b)^2}{(x+y)} \leq \frac{a^2}{x} + \frac{b^2}{y}

Notice that this Lemma is self generalizing in the following sense. Suppose we replace b with b + c and y with y + z, then we have:

\displaystyle \frac{(a+b+c)^2}{(x+y+z)} \leq \frac{a^2}{x} + \frac{(b+c)^2}{y+z}

But we can apply Lemma 1 to the second term of the right hand side one more time. So we would get the following inequality:

\displaystyle \frac{(a+b+c)^2}{(x+y+z)} \leq \frac{a^2}{x} + \frac{b^2}{y} + \frac{c^2}{z}

Using the same principle n times we get the following:

\displaystyle \frac{(a_1+a_2+ \dots a_n)^2}{(x_1+x_2+ \dots + x_n)} \leq \frac{a_1^2}{x_1} + \frac{a_2^2}{x_2} + \dots + \frac{a_n^2}{x_n}

Now substitute a_i = \alpha_i \beta_i and x_i = \beta_i^2 to get:

\displaystyle ( \sum_{i=1}^n \alpha_i \beta_i )^2 \leq \sum_{i=1}^n ( \alpha_i )^2 \sum_{i=1}^n ( \beta_i )^2

This is just the Cauchy-Schwarz Inequality, thus completing the proof.


Proof 2: By Induction

Again, the Cauchy-Schwarz Inequality is the following: for a, b \in \mathbb{R}

\displaystyle ( \sum_{i=1}^n a_i b_i )^2 \leq \sum_{i=1}^n ( a_i )^2 \sum_{i=1}^n ( b_i )^2

For proof of the inequality by induction, the most important thing is starting with the right base case. Clearly n = 1 is trivially true, suggesting that it is perhaps not of much import. So we consider the case for n = 2. Which is:

\displaystyle ( a_1 b_1 + a_2 b_2 )^2 \leq ( a_1^2 + a_2^2 ) (b_1^2 + b_2^2)

To prove the base case, we simply expand the expressions. To get:

\displaystyle a_1^2 b_1^2 + a_2^2 b_2^2 + 2 a_1 b_1 a_2 b_2 \leq a_1^2 b_1^2 + a_1^2 b_2^2 + a_2^2 b_1^2 + a_2^2 b_2^2

Which is just:

\displaystyle a_1^2 b_2^2 + a_2^2 b_1^2 - 2 a_1 b_1 a_2 b_2 \geq 0


\displaystyle (a_1 b_2 - a_2 b_1 )^2 \geq 0

Which proves the base case.

Moving ahead, we assume the following inequality to be true:

\displaystyle ( \sum_{i=1}^k a_i b_i )^2 \leq \sum_{i=1}^k ( a_i )^2 \sum_{i=1}^k ( b_i )^2

To establish Cauchy-Schwarz, we have to demonstrate, assuming the above, that

\displaystyle ( \sum_{i=1}^{k+1} a_i b_i )^2 \leq \sum_{i=1}^{k+1} ( a_i )^2 \sum_{i=1}^{k+1} ( b_i )^2

So, we start from H(k):

\displaystyle ( \sum_{i=1}^k a_i b_i )^2 \leq \sum_{i=1}^k ( a_i )^2 \sum_{i=1}^k ( b_i )^2

we further have,

\displaystyle \Big(\sum_{i=1}^k a_i b_i\Big) + a_{k+1}b_{k+1} \leq \Big(\sum_{i=1}^k ( a_i )^2\Big)^{1/2} \Big(\sum_{i=1}^k ( b_i )^2\Big)^{1/2} + a_{k+1}b_{k+1} \ \ \ \ (1)

Now, we can apply the case for n = 2. Recall that: \displaystyle a_1 b_1 + a_2 b_2 \leq (a_1^2 + a_2^2)^{1/2} (b_1^2 + b_2^2)^{1/2}

Thus, using this in the R. H. S of (1) , we would now have:

\displaystyle \Big(\sum_{i=1}^{k+1} a_i b_i\Big) \leq \Big(\sum_{i=1}^k ( a_i )^2 + a_{k+1}^2\Big)^{1/2} \Big(\sum_{i=1}^k ( b_i )^2 + b_{k+1}^2\Big)^{1/2}


\displaystyle \Big(\sum_{i=1}^{k+1} a_i b_i\Big) \leq \Big(\sum_{i=1}^{k+1} ( a_i )^2 \Big)^{1/2} \Big(\sum_{i=1}^{k+1} ( b_i )^2 \Big)^{1/2}

This proves the case H(k+1) on assuming H(k). Thus also proving the Cauchy-Schwarz inequality by the principle of mathematical induction.


Proof 3: For Infinite Sequences using the Normalization Trick:

We start with the following question.

Problem: For a, b \in \mathbb{R} If \displaystyle \Big(\sum_{i=1}^{\infty} a_i^2 \Big) < \infty and \displaystyle \Big(\sum_{i=1}^{\infty} b_i^2 \Big) < \infty then is \displaystyle \Big(\sum_{i=1}^{\infty} |a_i||b_i|\Big) < \infty ?

Note that this is easy to establish. We simply start with the trivial identity (x-y)^2 \geq 0 which in turn gives us \displaystyle xy \leq \frac{x^2}{2} + \frac{y^2}{2}

Next, take x = |a_i| and y = |b_i| on summing up to infinity on both sides, we get the following:

\displaystyle \Big( \sum_{i=1}^{\infty} |a_i||b_i| \Big)^2 \leq \frac{1}{2}\Big(\sum_{i=1}^{\infty} a_i^2\Big) + \frac{1}{2} \Big(\sum_{i=1}^{\infty} b_i^2\Big)\ \ \ \ \ \ \ \ (2)

From this it immediately follows that

\displaystyle \Big(\sum_{i=1}^{\infty} |a_i||b_i|\Big) < \infty

Now let

\displaystyle \hat{a}_i = \frac{a_i}{\Big(\sum_j a_i^2\Big)^{1/2}} and

\displaystyle \hat{b}_i = \frac{b_i}{\Big(\sum_j b_i^2\Big)^{1/2}}; substituting in (2), we get:

\displaystyle \Big( \sum_{i=1}^{\infty} |\hat{a}_i||\hat{b}_i| \Big)^2 \leq \frac{1}{2}\Big(\sum_{i=1}^{\infty} \hat{a}_i^2\Big) + \frac{1}{2} \Big(\sum_{i=1}^{\infty} \hat{b}_i^2\Big) or,

\displaystyle \Bigg(\sum_{i = 1}^{\infty}\frac{a_i}{\Big(\sum_j a_j^2\Big)^{1/2}}\frac{b_i}{\Big(\sum_j b_j^2\Big)^{1/2}}\Bigg)^2 \leq \frac{1}{2} + \frac{1}{2}

Which simply gives back Cauchy’s inequality for infinite sequences thus completing the proof:

\displaystyle \Big(\sum_{i=1}^{\infty} a_i b_i\Big)^2 \leq \Big(\sum_{i=1}^{\infty}a_i^2\Big) \Big(\sum_{i=1}^{\infty}b_i^2\Big)


Proof 4: Using Lagrange’s Identity

We first start with a polynomial which we denote by \mathbf{Q}_n:

\mathbf{Q}_n = \big(a_1^2 + a_2^2 + \dots + a_n^2 \big) \big(b_1^2 + b_2^2 + \dots + b_n^2 \big) - \big(a_1b_1 + a_2b_2 + \dots + a_nb_n\big)^2

The question to now ask, is \mathbf{Q}_n \geq 0? To answer this question, we start of by re-writing \mathbf{Q}_n in a “better” form.

\displaystyle \mathbf{Q}_n = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i^2 b_j^2 - \sum_{i=1}^{n}\sum_{j=1}^{n} a_ib_i a_jb_j

Next, as J. Michael Steele puts, we pursue symmetry and rewrite the above so as to make it apparent.

\displaystyle \mathbf{Q}_n = \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \big(a_i^2 b_j^2 + a_j^2 b_i^2\big) - \frac{2}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_ib_i a_jb_j

Thus, we now have:

\displaystyle \mathbf{Q}_n = \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \big(a_i b_j - a_j b_i\big)^2

This makes it clear that \mathbf{Q}_n can be written as a sum of squares and hence is always postive. Let us write out the above completely:

\displaystyle \sum_{i=1}^{n}\sum_{j=1}^{n} a_i^2 b_j^2 - \sum_{i=1}^{n}\sum_{j=1}^{n} a_ib_ib_jb_j = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \big(a_ib_j -a_jb_i\big)^2

Now, reversing the step we took at the onset to write the L.H.S better, we simply have:

\displaystyle \sum_{i}^{n} a_i^2 \sum_{}^{n} b_i^2 - \big(\sum_{i=1}^n a_ib_i\big)^2 = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \big(a_ib_j -a_jb_i\big)^2

This is called Lagrange’s Identity. Now since the R.H.S. is always greater than or equal to zero. We get the following inequality as a corrollary:

\displaystyle \big(\sum_{i=1}^n a_ib_i\big)^2 \leq \sum_{i}^{n} a_i^2 \sum_{}^{n} b_i^2

This is just the Cauchy-Schwarz inequality, completing the proof.


Proof 5: Gram-Schmidt Process gives an automatic proof of Cauchy-Schwarz

First we quickly review the Gram-Schmidt Process: Given a set of linearly independent elements of a real or complex inner product space \big(V,\langle\cdot, \cdot\rangle\big), \mathbf{x_1}, \mathbf{x_2}, \dots, \mathbf{x_n}. We can get an orthonormal set of n elemets \mathbf{e_1}, \mathbf{e_2}, \dots, \mathbf{e_n} by the simple recursion (after setting \displaystyle \mathbf{e_1 = \frac{\mathbf{x_1}}{\|x_1 \|}}).

\displaystyle \mathbf{z_k} = \mathbf{x_k} - \sum_{j=1}^{k-1} \langle\mathbf{x_k},\mathbf{e_j}\rangle\mathbf{e_j} and then

\displaystyle \mathbf{e_k} = \frac{\mathbf{z_k}}{\|\mathbf{z_k}\|}

for k = 2, 3, \dots, n.

Keeping the above in mind, assume that \| x\| = 1. Now let x = e_1. Thus, we have:

\mathbf{z} = \mathbf{y} - \langle \mathbf{y}, \mathbf{e_1}\rangle\mathbf{e_1}

Giving: \displaystyle \mathbf{e_2} = \frac{\mathbf{z}}{\|\mathbf{z}\|}. Rearranging we have:

\displaystyle \|\mathbf{z}\|\mathbf{e_2} = \mathbf{y} - \langle \mathbf{y}, \mathbf{e_1}\rangle\mathbf{e_1} or

\displaystyle \mathbf{y} = \langle \mathbf{y},\mathbf{e_1}\rangle \mathbf{e_1} + \|\mathbf{z}\|\mathbf{e_2} or

\displaystyle \mathbf{y} = \mathbf{c_1} \mathbf{e_1} + \mathbf{c_2}\mathbf{e_2} where c_1, c_2 are constants.

Now note that: \displaystyle \langle\mathbf{x},\mathbf{y}\rangle = c_1 and

\displaystyle \langle \mathbf{y},\mathbf{y}\rangle = |c_1|^2 + |c_2|^2. The following bound is trivial:

\displaystyle |c_1| \leq (|c_1|^2 + |c_2|^2)^{1/2}. But note that this is simply \langle x,y \rangle \leq \langle y,y \rangle^{1/2}

Which is just the Cauchy-Schwarz inequality when \|x\| = 1.


Proof 6: Proof of the Continuous version for d =2; Schwarz’s Proof

For this case, the inequality may be stated as:

Suppose we have S \subset \mathbb{R}^2 and that f: S \to \mathbb{R} and g: S \to \mathbb{R}. Then consider the double integrals:

\displaystyle A = \iint_S f^2 dx dy, \displaystyle B = \iint_S fg dx dy and \displaystyle C = \iint_S g^2 dx dy. These double integrals must satisfy the following inequality:

|B| \leq \sqrt{A} . \sqrt{C}.

The proof given by Schwarz as is reported in Steele’s book (and indeed in standard textbooks) is based on the following observation:

The real polynomial below is always non-negative:

\displaystyle p(t) = \iint_S \Big( t f(x,y) + g(x,y) \Big)^2 dx dy = At^2 + 2Bt + C

p(t) > 0 unless f and g are proportional. Thus from the binomial formula we have that B^2 \leq AC, moreover the inequality is strict unless f and g are proportional.


Proof 7: Proof using the Projection formula

Problem: Consider any point x \neq 0 in \mathbb{R}^d. Now consider the line that passes through this point and origin. Let us call this line \mathcal{L} = \{ tx: t \in \mathbb{R}\}. Find the point on the line closest to any point v \in \mathbb{R}^d.

If P(v) is the point on the line that is closest to v, then it is given by the projection formula: \displaystyle P(v) = x \frac{\langle x, v \rangle }{\langle x, x \rangle}

This is fairly elementary to establish. To find the value of t, such that distance \rho(v,tx) is minimized, we can simply consider the squared distance \rho^2(v,tx) since it is easier to work with. Which by definition is:

\displaystyle \rho^2(v,tx) = \langle v - x, v - tx \rangle

which is simply:

\displaystyle \rho^2(v,tx) = \langle v, v \rangle - 2t \langle v, x \rangle + t^2 \langle x, x \rangle

\displaystyle = \langle x, x \rangle \bigg( t^2 -2t \frac{\langle v, x \rangle}{\langle x, x \rangle} + \frac{\langle v, v \rangle}{\langle x, x \rangle}\bigg)

\displaystyle = \langle x, x \rangle \bigg\{ \bigg(t - \frac{\langle v,x \rangle}{\langle x,x \rangle}\bigg)^2 - \frac{\langle v,x \rangle^2}{\langle x,x \rangle^2}\bigg\} + \frac{\langle v, v \rangle}{\langle x, x \rangle}\bigg)

\displaystyle = \langle x, x \rangle \bigg\{ \bigg(t - \frac{\langle v,x \rangle}{\langle x,x \rangle}\bigg)^2 - \frac{\langle v,x \rangle^2}{\langle x,x \rangle^2} + \frac{\langle v, v \rangle}{\langle x, x \rangle} \bigg\}

\displaystyle = \langle x, x \rangle \bigg\{ \bigg(t - \frac{\langle v,x \rangle}{\langle x,x \rangle}\bigg)^2 - \frac{\langle v,x \rangle^2}{\langle x,x \rangle^2} + \frac{\langle v, v \rangle \langle x, x \rangle}{\langle x, x \rangle^2} \bigg\}

So, the value of t for which the above is minimized is \displaystyle \frac{\langle v,x \rangle}{\langle x,x \rangle}. Note that this simply reproduces the projection formula.

Therefore, the minimum squared distance is given by the expression below:

\displaystyle \min_{t \in \mathbb{R}} \rho^2(v, tx) = \frac{\langle v,v\rangle \langle x,x \rangle - \langle v,x \rangle^2}{\langle x,x \rangle}

Note that the L. H. S is always positive. Therefore we have:

\displaystyle \frac{\langle v,v\rangle \langle x,x \rangle - \langle v,x \rangle^2}{\langle x,x \rangle} \geq 0

Rearranging, we have:

\displaystyle \langle v,x \rangle^2 \leq \langle v,v\rangle \langle x,x \rangle

Which is just Cauchy-Schwarz, thus proving the inequality.


Proof 8: Proof using an identity

A variant of this proof is amongst the most common Cauchy-Schwarz proofs that are given in textbooks. Also, this is related to proof (6) above. However, it still has some value in its own right. While also giving an useful expression for the “defect” for Cauchy-Schwarz like the Lagrange Identity above.

We start with the following polynomial:

P(t) = \langle v - tw, v - tw \rangle. Clearly P(t) \geq 0.

To find the minimum of this polynomial we find its derivative w.r.t t and setting to zero:

P'(t) = 2t \langle w, w \rangle - 2 \langle v, w \rangle = 0 giving:

\displaystyle t_0 = \frac{\langle v, w \rangle}{\langle w, w \rangle}

Clearly we have P(t) \geq P(t_0) \geq 0. We consider:

P(t_0) \geq 0, substituting \displaystyle t_0 = \frac{\langle v, w \rangle}{\langle w, w \rangle} we have:

\displaystyle \langle v,v \rangle - \frac{\langle v, w \rangle}{\langle w, w \rangle} \langle v,w \rangle - \frac{\langle v, w \rangle}{\langle w, w \rangle} \langle w,v \rangle + \frac{\langle v, w \rangle^2}{\langle w, w \rangle^2}\langle w,w \rangle \geq 0 \ \ \ \ \ \ \ \ (A)

Just rearrangine and simplifying:

\displaystyle \langle v,v \rangle \langle w,w \rangle - \langle v, w \rangle^2 \geq 0

This proves Cauchy-Schwarz inequality.

Now suppose we are interested in an expression for the defect in Cauchy-Schwarz i.e. the difference \displaystyle \langle v,v \rangle \langle w,w \rangle - \langle v, w \rangle^2. For this we can just consider the L.H.S of equation (A) since it is just \displaystyle \langle w,w \rangle \Big(\langle v,v \rangle \langle w,w \rangle - \langle v, w \rangle^2\Big).

i.e. Defect =

\displaystyle \langle w,w \rangle \bigg(\langle v,v \rangle - 2 \frac{\langle v,w \rangle^2}{\langle w,w \rangle} + \frac{\langle v,w \rangle^2}{\langle w,w \rangle}\bigg)

Which is just:

\displaystyle \langle w,w \rangle\bigg(\Big\langle v - \frac{\langle w,v \rangle}{\langle w,w \rangle}w,v - \frac{\langle w,v \rangle}{\langle w,w \rangle}w \Big\rangle\bigg)

This defect term is much in the spirit of the defect term that we saw in Lagrange’s identity above, and it is instructive to compare them.


Proof 9: Proof using the AM-GM inequality

Let us first recall the AM-GM inequality:

For non-negative reals x_1, x_2, \dots x_n we have the following basic inequality:

\displaystyle \sqrt[n]{x_1 x_2 \dots x_n} \leq \Big(\frac{x_1 + x_2 + \dots x_n}{n}\Big).

Now let us define \displaystyle A = \sqrt{a_1^2 + a_2^2 + \dots + a_n^2} and \displaystyle B = \sqrt{b_1^2 + b_2^2 + \dots + b_n^2}

Now consider the trivial bound (which gives us the AM-GM): (x-y)^2 \geq 0, which is just \displaystyle \frac{1}{2}\Big(x^2 + y^2\Big) \geq xy. Note that AM-GM as stated above for n = 2 is immediate when we consider x \to \sqrt{x} and y \to \sqrt{y}

Using the above, we have:

\displaystyle \frac{1}{2} \Big(\frac{a_i^2}{A^2} + \frac{b_i^2}{B^2}\Big) \geq \frac{a_ib_i}{AB}

Summing over n, we have:

\displaystyle \sum_{i=1}^n\frac{1}{2} \Big(\frac{a_i^2}{A^2} + \frac{b_i^2}{B^2}\Big) \geq \sum_{i=1}^n \frac{a_ib_i}{AB}

But note that the L.H.S equals 1, therefore:

\displaystyle \sum_{i=1}^n \frac{a_ib_i}{AB} \leq 1 or \displaystyle \sum_{i=1}^n a_ib_i \leq AB

Writing out A and B as defined above, we have:

\displaystyle \sum_{i=1}^n a_ib_i \leq \sqrt{\sum_{i=1}^na_i^2}\sqrt{\sum_{i=1}^nb_i^2}.

Thus proving the Cauchy-Schwarz inequality.


Proof 10: Using Jensen’s Inequality

We begin by recalling Jensen’s Inequality:

Suppose that f: [p, q] \to \mathbb{R} is a convex function. Also suppose that there are non-negative numbers p_1, p_2, \dots, p_n such that \displaystyle \sum_{i=1}^{n} p_i = 1. Then for all x_i \in [p, q] for i = 1, 2, \dots, n one has:

\displaystyle f\Big(\sum_{i=1}^{n}p_ix_i\Big) \leq \sum_{i=1}^{n}p_if(x_i).

Now we know that f(x) = x^2 is convex. Applying Jensen’s Inequality, we have:

\displaystyle \Big(\sum_{i=1}^{n} p_i x_i \Big)^2 \leq \sum_{i=1}^{n} p_i x_i^2

Now, for b_i \neq 0 for all i = 1, 2, \dots, n, let \displaystyle x_i = \frac{a_i}{b_i} and let \displaystyle p_i = \frac{b_i^2}{\sum_{i=1}^{n}b_i^2}.

Which gives:

\displaystyle \Big(\sum_{i=1}^n \frac{a_ib_i}{\sum_{i=1}^{n}b_i^2}\Big)^2 \leq \Big(\sum_{i=1}^{n}\frac{a_i^2}{\sum_{i=1}^{n}b_i^2}\Big)

Rearranging this just gives the familiar form of Cauchy-Schwarz at once:

\displaystyle \Big(\sum_{i=1}^{n} a_ib_i\Big)^2 \leq \Big(\sum_{i=1}^{n} a_i^2\Big)\Big(\sum_{i=1}^{n} b_i^2\Big)


Proof 11: Pictorial Proof for d = 2

Here (page 4) is an attractive pictorial proof by means of tilings for the case d = 2 by Roger Nelson.


Read Full Post »