# Characterisations of Eigenvalues

I’ve been working for much of the past few months on a version of the frozen percolation random graph process with types. The connectivity between types is controlled by a (finite) non-negative square matrix, and so I’ve been engaging with linear algebra theory to an extent I haven’t really experienced since the second or third year of undergraduate maths.

We are interested in whether the graphs in question are subcritical, critical or supercritical. As in the case of multitype branching processes, this is controlled by the principal eigenvalue of a related non-negative matrix. So I’ve been looking up lots of methods for controlling eigenvalues, and some have proved useful, and some have not, but I thought it would be worthwhile to present some of them here.

Bounds and characterisations of spectral radius

Throughout, I will be talking about finite, square matrices. Eigenvalues may be defined as roots of the characteristic polynomial, and so by the fundamental theorem of algebra, there is always at least one complex eigenvalue. There is always at least one eigenvector associated to any eigenvalue. However, the dimension of the eigenspace is not always the same as the multiplicity of the eigenvalue as a root of the characteristic polynomial. The latter is called algebraic multiplicity, while the former is geometric multiplicity.
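As a small illustration of the two multiplicities differing (a standard textbook example, with the matrix J chosen purely for illustration), consider

$J=\begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}, \qquad \det(\lambda I - J) = (\lambda-1)^2.$

The eigenvalue 1 has algebraic multiplicity 2. But $(J-I)x=0$ forces $x_2=0$, so the eigenspace is spanned by $(1,0)^T$ alone, and the geometric multiplicity is 1.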

For now though, this distinction will be unimportant. The spectral radius of a matrix A is defined as

$\rho(A)=\max \{|\lambda|\, : \, \lambda \text{ an eigenvalue of }A\}.$

We can bound the spectral radius in terms of the norm of the matrix. Remember that a matrix norm has to satisfy all the usual properties of a norm, as well as a submultiplicative property $|||AB|||\le |||A|||\cdot |||B|||$. This is good, as otherwise we would be free to replace any norm by an arbitrary multiple of itself, and so no useful bounds could ever emerge. Note that the submultiplicativity implies that $|||I_n|||\ge 1$, since $|||I_n||| = |||I_n I_n|||\le |||I_n|||^2$.

Now, let $\lambda,x$ be some eigenvalue and associated (right-)eigenvector respectively of matrix A. Let X be the square matrix given by taking all the columns to be x. Now $Ax=\lambda x$ implies $AX=\lambda X$, and so

$|\lambda| \cdot|||X||| = |||\lambda X||| = |||AX||| \le |||A|||\cdot |||X|||,$

and thus, since X is non-zero, we conclude our most basic bound $|\lambda| \le |||A|||$. As this holds for every eigenvalue, $\rho(A)\le |||A|||$.
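A quick numerical sanity check of this bound can be sketched as follows. The matrix `A` and the helper name `spectral_radius` are my own choices for illustration; the four norms used are all submultiplicative, so each should dominate the spectral radius.

```python
import numpy as np

def spectral_radius(A):
    """Largest absolute value among the eigenvalues of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

# An arbitrary non-negative example matrix (chosen only for illustration).
A = np.array([[0.5, 2.0, 0.0],
              [1.0, 0.0, 1.5],
              [0.0, 0.3, 1.0]])

rho = spectral_radius(A)

# Four standard submultiplicative matrix norms.
norms = {
    "max column sum (induced 1-norm)": float(np.linalg.norm(A, 1)),
    "spectral (induced 2-norm)": float(np.linalg.norm(A, 2)),
    "max row sum (induced inf-norm)": float(np.linalg.norm(A, np.inf)),
    "Frobenius": float(np.linalg.norm(A, "fro")),
}

for name, value in norms.items():
    print(f"{name}: {value:.4f} >= rho(A) = {rho:.4f}")
```

How tight the bound is depends heavily on the norm and the matrix, which is part of why the limiting statement later in this section is useful.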

When A is diagonalisable, life is particularly easy, but in general we can write A as a conjugate of its Jordan normal form. Then, by looking at each diagonal block of the Jordan normal form separately, we can show that

$\lim_{k\rightarrow \infty}A^k = 0\quad \iff \quad \rho(A)<1.$
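To see what happens on a single block, here is the smallest non-trivial case, a 2x2 Jordan block (written out as an illustration, not as the full proof):

$J=\begin{pmatrix}\lambda & 1\\ 0 & \lambda\end{pmatrix}, \qquad J^k=\begin{pmatrix}\lambda^k & k\lambda^{k-1}\\ 0 & \lambda^k\end{pmatrix}.$

The off-diagonal entry carries a polynomial factor k, but this is beaten by the geometric decay of $\lambda^{k-1}$ exactly when $|\lambda|<1$. For a larger block the entries above the diagonal are $\binom{k}{j}\lambda^{k-j}$, and the same comparison applies.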

Then, applying this, with additional care, to the matrices $A / (\rho(A)\pm \epsilon)$, we derive Gelfand’s Formula, that $\rho(A) = \lim_{k\rightarrow \infty} |||A^k|||^{1/k}$. Again, this applies for any matrix norm.
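Gelfand’s formula is easy to watch numerically. The sketch below uses an example matrix and the max-row-sum norm, both of which are my own choices; `gelfand_estimate` is a hypothetical helper name.

```python
import numpy as np

# Arbitrary example matrix (illustrative only).
A = np.array([[0.5, 2.0, 0.0],
              [1.0, 0.0, 1.5],
              [0.0, 0.3, 1.0]])

rho = float(np.max(np.abs(np.linalg.eigvals(A))))

def gelfand_estimate(A, k, norm_ord=np.inf):
    """Return |||A^k|||^(1/k) for the chosen matrix norm."""
    power = np.linalg.matrix_power(A, k)
    return float(np.linalg.norm(power, norm_ord) ** (1.0 / k))

powers = (1, 5, 25, 125)
estimates = [gelfand_estimate(A, k) for k in powers]

for k, est in zip(powers, estimates):
    print(f"k = {k:3d}: |||A^k|||^(1/k) = {est:.5f}   (rho = {rho:.5f})")
```

Each estimate is an upper bound for $\rho(A)$, because $\rho(A)^k=\rho(A^k)\le |||A^k|||$, and the estimates approach $\rho(A)$ as k grows. For very large k or badly scaled matrices the entries of $A^k$ can overflow or underflow, so this is only a demonstration, not a recommended way to compute the spectral radius.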

Real symmetric matrices

When the matrix is real and symmetric, it is not too hard to show that all the eigenvalues are real, and furthermore that all the geometric multiplicities are equal to the algebraic multiplicities. That is, the matrix is diagonalisable, and there is an (orthogonal) basis of eigenvectors. Once we assume we are working with respect to this eigenbasis, it is easy to see how the Rayleigh quotient characterisation of the largest (and smallest) eigenvalue works. Let’s say the eigenvalues are $\lambda_1\ge \lambda_2\ge\ldots \ge \lambda_n$, then for any $||x||_2=1$, we have $\lambda_1\ge x^T A x\ge \lambda_n$, and equality is attained when x is the respective eigenvector, normalised appropriately.
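The Rayleigh quotient bounds can be checked on a random example. In this sketch the matrix, the seed and the helper name `rayleigh` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random real symmetric matrix (illustrative only).
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
eigenvalues, eigenvectors = np.linalg.eigh(A)
lam_min, lam_max = float(eigenvalues[0]), float(eigenvalues[-1])

def rayleigh(A, x):
    """Rayleigh quotient x^T A x / x^T x for a non-zero vector x."""
    return float(x @ A @ x / (x @ x))

# Rayleigh quotients of random directions stay between the extreme eigenvalues.
samples = [rayleigh(A, rng.standard_normal(5)) for _ in range(1000)]

# The bounds are attained at the corresponding eigenvectors.
top = rayleigh(A, eigenvectors[:, -1])
bottom = rayleigh(A, eigenvectors[:, 0])

print(f"lambda_n = {lam_min:.4f} <= {min(samples):.4f} (min over samples)")
print(f"lambda_1 = {lam_max:.4f} >= {max(samples):.4f} (max over samples)")
```

Dividing by $x^Tx$ in `rayleigh` means the vector does not need to be normalised first, which is equivalent to the constraint $||x||_2=1$ in the text.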

This is an especially useful characterisation of the largest eigenvalue, as for example we can see fairly easily that this means $\lambda_1$ is a convex function of the (real, symmetric) matrix.
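One way to spell out the convexity claim, using only the Rayleigh characterisation: for real symmetric A and B, and $t\in[0,1]$,

$\lambda_1(tA+(1-t)B) = \max_{||x||_2=1}\left[t\,x^TAx+(1-t)\,x^TBx\right] \le t\max_{||x||_2=1}x^TAx+(1-t)\max_{||x||_2=1}x^TBx = t\lambda_1(A)+(1-t)\lambda_1(B).$

The same argument with min in place of max shows that the smallest eigenvalue is a concave function of the matrix.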

We can generalise this Rayleigh quotient idea if we take k orthonormal vectors in $\mathbb{R}^n$, arrange them in an $n\times k$ matrix P, so that $P^T P = I_k$. Now we consider the matrix $P^TAP$. [Note that if k=1, we are exactly considering $x^TA x$ as before.] Then Poincaré’s Separation Theorem says that the eigenvalues $\mu_1\ge \mu_2\ge\ldots \ge \mu_k$ of $P^TAP$ (which is also real, symmetric) are bounded by the original eigenvalues:

$\lambda_i \ge \mu_i\ge \lambda_{n-k+i}, \quad i=1,\ldots,k.$
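Here is a minimal numerical sketch of the separation theorem. With both sets of eigenvalues sorted in decreasing order, each $\mu_i$ should sit between $\lambda_i$ and $\lambda_{n-k+i}$. The sizes, the seed and the use of a QR factorisation to produce orthonormal columns are my own choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3

# Random real symmetric test matrix (illustrative only).
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

# Orthonormal columns via a reduced QR factorisation, so that P^T P = I_k.
P, _ = np.linalg.qr(rng.standard_normal((n, k)))

# Sort both spectra in decreasing order to match the text's convention.
lam = np.sort(np.linalg.eigvalsh(A))[::-1]            # lambda_1 >= ... >= lambda_n
mu = np.sort(np.linalg.eigvalsh(P.T @ A @ P))[::-1]   # mu_1 >= ... >= mu_k

# Indices are 0-based here, so lam[n - k + i] is lambda_{n-k+(i+1)}.
for i in range(k):
    print(f"{lam[i]:.4f} >= {mu[i]:.4f} >= {lam[n - k + i]:.4f}")
```

Taking k=1 recovers the Rayleigh quotient bounds, since then the single $\mu_1=x^TAx$ is squeezed between the largest and smallest eigenvalues.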

Since the trace is the sum of the eigenvalues, we can apply this result with P’s columns taken to be any k canonical basis vectors of $\mathbb{R}^n$, so that $P^TAP$ is a principal submatrix of A and its trace is the sum of the corresponding k diagonal entries. Without loss of generality, we may assume the basis has been ordered so that the diagonal elements of A satisfy $a_{11}\ge a_{22}\ge\ldots\ge a_{nn}$. Summing the separation bounds then gives, for each k, $\sum_{i=1}^k \lambda_{n-k+i}\le \sum_{i=1}^k a_{ii}\le \sum_{i=1}^k \lambda_i$, with equality throughout when k=n. In other words, the sequence $(a_{11},a_{22},\ldots,a_{nn})$ is majorised by $(\lambda_1,\lambda_2,\ldots,\lambda_n)$, and its partial sums dominate those of $(\lambda_n,\lambda_{n-1},\ldots,\lambda_1)$. The first of these relations can be used via the setup of Karamata’s inequality to conclude that for any convex function f, we have

$\sum_{i=1}^n f(\lambda_i)\ge \sum_{i=1}^n f(a_{ii}).$
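Both the majorisation and its Karamata-type consequence can be checked numerically. The sketch below uses a random symmetric matrix and three convex functions of my own choosing.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

# Random real symmetric test matrix (illustrative only).
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

# Sorting the diagonal corresponds to reordering the basis, which does not
# change the eigenvalues.
diag = np.sort(np.diag(A))[::-1]             # a_11 >= ... >= a_nn
lam = np.sort(np.linalg.eigvalsh(A))[::-1]   # lambda_1 >= ... >= lambda_n

# Majorisation: partial sums of the eigenvalues dominate those of the
# diagonal, with equality for the full sum (the trace).
partial_gap = np.cumsum(lam) - np.cumsum(diag)

# Consequence for a few convex functions f: sum f(lambda_i) >= sum f(a_ii).
convex_functions = {"square": np.square, "exp": np.exp, "abs": np.abs}
gaps = {name: float(f(lam).sum() - f(diag).sum())
        for name, f in convex_functions.items()}

print("partial-sum gaps:", np.round(partial_gap, 6))
print("convex-function gaps:", {name: round(g, 6) for name, g in gaps.items()})
```

For `square`, the gap is exactly the sum of squares of the off-diagonal entries, since $\sum_i\lambda_i^2$ is the trace of $A^2$.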

Gershgorin Circles

In fact, we can relate the eigenvalues to the diagonal entries of the matrix in a more general setting. We are motivated by the thought that if the off-diagonal entries are all very small, then the set of eigenvalues should be approximately given by the set of diagonal entries.

For a square complex matrix, let $\lambda,x$ be an eigenvalue, eigenvector pair. For any index i, we have

$\lambda - a_{ii}= \frac{\sum_j a_{ij}x_j}{x_i} - a_{ii} = \frac{\sum_{j\ne i}a_{ij}x_j}{x_i}.$

Now consider the i such that $|x_i|=\max_j |x_j|$, which is non-zero since x is an eigenvector, and take absolute values and apply the triangle inequality,

$|\lambda - a_{ii}| \le \sum_{j\ne i} \left| \frac{a_{ij}x_j}{x_i} \right| \le \sum_{j\ne i}|a_{ij}|.$

Let’s define $R_i=\sum_{j\ne i}|a_{ij}|$ to be the sum of the absolute values of the off-diagonal entries of the ith row. Then the Gershgorin circle theorem says that every eigenvalue lies within at least one of the closed discs $B(a_{ii},R_i)$ in the complex plane. So our motivation still makes sense. If the off-diagonal entries are small, this is a strong restriction, and if they are not typically smaller than the diagonal entries, then we perhaps do not learn very much. Since A and its transpose have the same eigenvalues, we could apply the same argument to the columns too.
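Computing the discs takes only a couple of lines. In the sketch below, `gershgorin_discs` is a hypothetical helper name and the complex matrix is an arbitrary example with small off-diagonal entries.

```python
import numpy as np

def gershgorin_discs(A):
    """Return (centres, radii): centre a_ii and radius R_i = sum_{j != i} |a_ij|."""
    A = np.asarray(A, dtype=complex)
    centres = np.diag(A)
    # Row sum of absolute values, minus the diagonal term.
    radii = np.abs(A).sum(axis=1) - np.abs(centres)
    return centres, radii

# Arbitrary example with small off-diagonal entries (illustrative only).
A = np.array([[4.0, 0.5, 0.1],
              [0.2, 1.0 + 1.0j, 0.3],
              [0.1, 0.1, -2.0]])

centres, radii = gershgorin_discs(A)
eigenvalues = np.linalg.eigvals(A)

# For each eigenvalue, list the discs containing it (allowing for rounding).
membership = [
    [i for i in range(len(centres)) if abs(lam - centres[i]) <= radii[i] + 1e-12]
    for lam in eigenvalues
]

for lam, discs in zip(eigenvalues, membership):
    print(f"eigenvalue {lam:.4f} lies in disc(s) {discs}")
```

Running the same function on `A.T` gives the column version of the discs.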

When the diagonal entries are distinct, and the off-diagonal entries are small, the Gershgorin discs are disjoint, and we would expect each to contain exactly one eigenvalue, corresponding to the appropriate diagonal entry. In fact, we can say something stronger. In general, the union of the discs is a subset of the complex plane with some connected components. Then, if a component is the union of exactly r discs, then it contains exactly r of the eigenvalues, counted with algebraic multiplicity.

To see this, consider multiplying all the off-diagonal entries by $z\in[0,1]$ and observe what happens as z varies from 0 to 1. When z=0, the matrix is diagonal, and each eigenvalue is in its Gershgorin disc (which is a single point). As z increases, the discs keep their centres and their radii grow to $zR_i$, so they always lie inside the original discs. As z varies continuously, the characteristic polynomial varies continuously, and so do its roots, that is the set of eigenvalues. So since each of the r eigenvalues is initially within the union of the r original, large Gershgorin discs, they must remain within this union as z varies, since they cannot ‘jump’ to another component.
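The continuity argument can be watched on a small example. In this sketch the matrix is my own choice, built so that the first two discs overlap while the third is isolated. The counts of eigenvalues in each component should stay at 2 and 1 for every z.

```python
import numpy as np

# Illustrative matrix: discs 0 and 1 overlap, disc 2 is isolated.
A = np.array([[1.0, 0.8, 0.0],
              [0.8, 2.0, 0.1],
              [0.1, 0.1, 6.0]])
D = np.diag(np.diag(A))

def scaled(z):
    """Keep the diagonal and multiply the off-diagonal entries by z."""
    return D + z * (A - D)

centres = np.diag(A)
radii = np.abs(A - D).sum(axis=1)   # radii of the full (z = 1) discs

def in_disc(lam, i):
    """True if lam lies in the i-th full-size disc (allowing for rounding)."""
    return bool(abs(lam - centres[i]) <= radii[i] + 1e-12)

counts = []
for z in np.linspace(0.0, 1.0, 11):
    eig = np.linalg.eigvals(scaled(z))
    in_cluster = sum(in_disc(lam, 0) or in_disc(lam, 1) for lam in eig)
    in_isolated = sum(in_disc(lam, 2) for lam in eig)
    counts.append((in_cluster, in_isolated))
    print(f"z = {z:.1f}: {in_cluster} in the two overlapping discs, "
          f"{in_isolated} in the isolated disc")
```

Note that the theorem only counts eigenvalues per component. Within the overlapping pair, nothing forces each individual disc to contain an eigenvalue.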

It’s hard to know what time will allow, but provisionally in the next post I will talk about how to control the evolution of eigenvectors as a function of the matrix, and in particular what can go wrong.

REFERENCES

For the middle section, I used the progression from Chapter 4 of Matrix Differential Calculus with Applications in Statistics and Econometrics (Magnus and Neudecker).