In the previous post, I talked about eigenvalues, and some alternative characterisations which could be useful in some circumstances. Recently, I’ve been interested in controlling how eigenvalues and eigenvectors change as the matrix is varied. My particular example concerns positive matrices, which have a well-defined largest eigenvalue (or Perron root), and a unique (up to normalising in some way) principal eigenvector.
We might expect that perturbing a matrix slightly does not change the eigenvectors very much, since any original eigenvector is still almost an eigenvector, in the sense that its image under the action of the perturbed matrix is almost equal to a multiple of itself. But how to make this precise? And when does it go wrong?
Eigenvalues – The non-multiple case
Throughout, we assume we have a $k \times k$ matrix A. We might want to allow the entries to be complex, but for now, real entries are perfectly interesting enough.
It makes sense to start with eigenvalues, since it's easy to define these through the characteristic equation of the matrix. The coefficients of this polynomial are well-behaved (indeed polynomial) functions of the entries of the matrix. So we are really asking how the set of roots of a degree-$k$ polynomial evolves as the (k+1) coefficients of the polynomial evolve. It is fairly clear that, under any sensible choice of topology on the space of $k$-(multi)-subsets of $\mathbb{C}$, the multiset of roots is continuous in the coefficients of the polynomial.
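As a quick numerical sanity check (a minimal numpy sketch, with an arbitrarily chosen cubic), perturbing the coefficients slightly moves each root only slightly:

```python
import numpy as np

# The roots of z^3 - 6z^2 + 11z - 6 are exactly 1, 2, 3; a small change
# to the coefficients perturbs the multiset of roots only slightly.
c = np.array([1.0, -6.0, 11.0, -6.0])   # coefficients, highest degree first
print(np.sort(np.roots(c)))             # [1. 2. 3.]
print(np.sort(np.roots(c + 1e-6)))      # roots still very close to 1, 2, 3
```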
To say anything more precise, we have to introduce some notation.
Let $P_A(z) = \det(zI - A) = z^k + c_{k-1}z^{k-1} + \ldots + c_1 z + c_0$ be the characteristic polynomial of A. Each $c_i$ is a polynomial of degree $k-i$ in the entries of A. Let's consider now a matrix-valued function A(t), and we assume that the entries of A(t) are all differentiable with respect to t. So each $c_i(t)$ is also differentiable with respect to t.
At this point, let's make the assumption that t lies in some interval [r,s] for which the eigenvalues of A(t) are distinct. Let $\lambda(t)$ be some eigenvalue of A(t), chosen such that $\lambda(t)$ is a continuous function of t. For example, we might take $\lambda(t) = \lambda_{\max}(t)$, the eigenvalue with largest absolute value (with some canonical tie-breaking mechanism). Then, writing $P(t,z) = z^k + c_{k-1}(t)z^{k-1} + \ldots + c_0(t)$, we have $P(t, \lambda(t)) = 0$, and so differentiating with respect to $t$:

$0 = \frac{\partial P}{\partial t}(t, \lambda(t)) + \lambda'(t)\,\frac{\partial P}{\partial z}(t, \lambda(t)).$

Because we deliberately demanded that the eigenvalues were distinct, $\lambda(t)$ is a simple root of $P(t, \cdot)$, so $\frac{\partial P}{\partial z}(t, \lambda(t)) \neq 0$, and so $\lambda'(t) = -\frac{\partial P}{\partial t}(t, \lambda(t)) \big/ \frac{\partial P}{\partial z}(t, \lambda(t))$. In particular, $\lambda(t)$ is differentiable with respect to the coefficients of the characteristic polynomial, and thus with respect to t also.
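To see this formula in action, here is a numerical sketch (numpy; the symmetric family $A(t) = A_0 + tB$ is a random choice purely for illustration, so its eigenvalues are distinct with probability one). It compares a finite-difference derivative of $\lambda_{\max}(t)$ against $-\frac{\partial P}{\partial t} \big/ \frac{\partial P}{\partial z}$, using the fact that $\frac{\partial P}{\partial z}$ at a simple root is the product of the gaps to the other roots:

```python
import numpy as np

# Arbitrary symmetric family A(t) = A0 + t*B, chosen at random.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)); A0 = M + M.T
N = rng.standard_normal((4, 4)); B = N + N.T
A = lambda s: A0 + s * B

lam_max = lambda s: np.linalg.eigvalsh(A(s))[-1]       # largest eigenvalue
P = lambda s, z: np.linalg.det(z * np.eye(4) - A(s))   # characteristic polynomial

t, h = 0.0, 1e-6

# (a) direct finite-difference derivative of lambda_max
d_direct = (lam_max(t + h) - lam_max(t - h)) / (2 * h)

# (b) implicit-function formula: lambda' = -(dP/dt) / (dP/dz), where
# dP/dz at a simple root is the product of gaps to the other roots
eigs = np.linalg.eigvalsh(A(t))
lam = eigs[-1]
dP_dz = np.prod(lam - eigs[:-1])
dP_dt = (P(t + h, lam) - P(t - h, lam)) / (2 * h)

print(d_direct, -dP_dt / dP_dz)   # the two values should agree closely
```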
Multiple Eigenvalues
It gets more complicated when the characteristic equation has multiple roots. Typically we will be interested in the evolution of the eigenvalue with some extremal property, probably the largest one. Let's restrict to the real, symmetric case, where all k eigenvalues (counted with multiplicity) are real. Suppose we have $t_0$ such that $A(t_0)$ has a repeated eigenvalue. Then, in a small enough neighbourhood of $t_0$, we can define eigenvalues $\lambda_1(t), \lambda_2(t)$ continuously such that $\lambda_1(t_0) = \lambda_2(t_0)$, while $\lambda_1(t) \neq \lambda_2(t)$ for $t \neq t_0$. Then, if the entries of A(t) are analytic functions of t, then so are $\lambda_1(t)$ and $\lambda_2(t)$.
But then $\lambda_{\max}(t) = \max(\lambda_1(t), \lambda_2(t))$ will in general not be analytic, as the maximum of two smooth functions is in general only Lipschitz. This effect is most obvious in the case of the diagonal matrix $A(t) = \mathrm{diag}(t, -t)$, for which the largest eigenvalue is $\lambda_{\max}(t) = |t|$.
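Numerically (a tiny numpy check), the largest eigenvalue of this family traces out $|t|$: continuous, but with a corner at the eigenvalue crossing $t = 0$:

```python
import numpy as np

# lambda_max of diag(t, -t) equals |t|: continuous in t, but not
# differentiable at the eigenvalue crossing t = 0.
for t in (-0.1, -0.01, 0.0, 0.01, 0.1):
    print(t, np.linalg.eigvalsh(np.diag([t, -t]))[-1])
```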
Eigenvectors
When the matrix A is real and symmetric, we know it has real eigenvalues, and an orthogonal basis of eigenvectors. Then the Rayleigh quotient characterises the principal eigenvector as well as the eigenvalue. Recall that for any $x \in \mathbb{R}^k$ with $\|x\| = 1$, we have

$\lambda_{\min} \le x^T A x \le \lambda_{\max},$

with equality precisely at the respective eigenvectors. So if we perturb A slightly, keeping it real and symmetric, we can control the principal eigenvector quite well by this method.
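As an illustration (a sketch with an arbitrary random symmetric matrix and perturbation), the variational characterisation gives immediate control: the Rayleigh quotient of the perturbed matrix, evaluated at the old principal eigenvector, already lands within the perturbation's norm of the new largest eigenvalue:

```python
import numpy as np

# Perturb a symmetric A by a small symmetric E. By the variational
# characterisation, lambda_max(A+E) >= x0' (A+E) x0, while Weyl's
# inequality gives lambda_max(A+E) <= lambda_max(A) + ||E||; so the old
# eigenvector's Rayleigh quotient pins down the new top eigenvalue
# to within O(||E||).
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5)); A = M + M.T
N = rng.standard_normal((5, 5)); E = 1e-4 * (N + N.T)

x0 = np.linalg.eigh(A)[1][:, -1]                # principal eigenvector of A
rq = x0 @ (A + E) @ x0                          # Rayleigh quotient in A + E
print(abs(rq - np.linalg.eigvalsh(A + E)[-1]))  # tiny, of order ||E||
```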
If A is not diagonalisable, we can still say something about this principal eigenvector, via large powers of A, sometimes called the von Mises iteration. This says that for large N, $A^N v$ should have direction close to that of the principal eigenvector, for any test vector v. The rate of convergence depends on the ratio of the largest eigenvalue to the second largest eigenvalue, though if the matrix is not diagonalisable, it is not completely trivial to quantify this convergence. We have to be careful though, since A maps the subspace orthogonal to the eigenvector to itself, so the magnitude of the projection of v onto the eigenvector determines the speed of convergence. Indeed, if v is orthogonal to the eigenvector, it won't converge towards the principal eigenvector at all. (But if there is a well-defined 'second eigenvector' then it will converge towards that.)
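A minimal power-iteration sketch (numpy; the matrix and starting vector are arbitrary choices for illustration):

```python
import numpy as np

# Power iteration (the von Mises iteration): repeatedly apply A and
# renormalise. The direction converges to the principal eigenvector,
# provided the start vector is not orthogonal to it.
def power_iteration(A, v, n_steps=200):
    for _ in range(n_steps):
        v = A @ v
        v /= np.linalg.norm(v)
    return v

A = np.array([[2.0, 1.0], [1.0, 3.0]])
v = power_iteration(A, np.array([1.0, 0.0]))
print(v, A @ v / v)   # componentwise ratio is close to lambda_max
```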
Continuity of Eigenvectors
The reason why I ended up reading about some of these topics was that I wanted to show that the Perron eigenvector of a positive matrix (that is, the unique eigenvector corresponding to the Perron root) was Lipschitz continuous as a function of the entries of the positive matrix. Since for such a matrix, the largest eigenvalue is simple, we are able to make some progress.
In general, the condition that $x$ is an eigenvector of the matrix A with eigenvalue $\lambda$ is described by the relation:

$Ax = \lambda x, \quad x^T x = 1, \qquad (*)$

or whatever normalising condition is most appropriate. This describes an implicit relation between A and the eigenvalue-eigenvector pair $(\lambda, x)$. So given a matrix $A_0$ with eigenvalue $\lambda_0$ corresponding to eigenvector $x_0$, in a neighbourhood of $A_0$ in $\mathbb{R}^{k \times k}$ we can use the implicit function theorem to comment on the differentiability of $(\lambda, x)$ with respect to A in this neighbourhood.
Precisely, we require the matrix of partial derivatives of (*) with respect to $(x, \lambda)$,

$\begin{pmatrix} A - \lambda I & -x \\ 2x^T & 0 \end{pmatrix},$

to have non-zero determinant. But if $\lambda_0$ is not simple, then applying this matrix to one of the other eigenvectors for $\lambda_0$ (with a zero appended) shows that it has non-trivial kernel. With a bit more work, we can show the converse too, and conclude that $(\lambda, x)$ are smooth with respect to A in some neighbourhood of $A_0$.
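As a concrete check (a numpy sketch, assuming the normalisation $x^T x = 1$ from (*), which contributes the $2x^T$ row), this Jacobian has non-zero determinant at a simple eigenvalue of an arbitrary symmetric matrix, and zero determinant at the repeated eigenvalue of the identity:

```python
import numpy as np

# Jacobian of (*) with respect to (x, lambda): nonsingular at a simple
# eigenvalue, singular at a repeated one.
def jacobian(A, x, lam):
    k = len(x)
    top = np.hstack([A - lam * np.eye(k), -x[:, None]])
    bottom = np.hstack([2 * x[None, :], [[0.0]]])
    return np.vstack([top, bottom])

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # simple eigenvalues
w, V = np.linalg.eigh(A)
print(np.linalg.det(jacobian(A, V[:, -1], w[-1])))             # non-zero

I = np.eye(2)                             # repeated eigenvalue 1
print(np.linalg.det(jacobian(I, np.array([1.0, 0.0]), 1.0)))   # zero
```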
Finally, we observe that when the eigenvalues are not simple, we can't even guarantee continuity of the eigenvectors. This is unsurprising really, since for a multiple eigenvalue, a) we might not know how many linearly independent eigenvectors exist; and b) we might have complete freedom over the choice of eigenvectors. Think about the identity matrix! Indeed the eigenvectors of $\begin{pmatrix} 1+\epsilon & 0 \\ 0 & 1-\epsilon \end{pmatrix}$ are (1,0) and (0,1), while the eigenvectors of $\begin{pmatrix} 1 & \epsilon \\ \epsilon & 1 \end{pmatrix}$ are (1,1) and (1,-1). So no continuous choice of eigenvectors is possible here.
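The jump is easy to see numerically (numpy): two perturbations of the identity of the same tiny size have completely different eigenvector bases:

```python
import numpy as np

# Two order-epsilon perturbations of the identity with totally different
# eigenvectors: the coordinate axes versus the diagonals. No continuous
# eigenvector selection can cover both as epsilon -> 0.
eps = 1e-6
print(np.linalg.eigh(np.diag([1 + eps, 1 - eps]))[1])     # (0,1), (1,0)
print(np.linalg.eigh(np.array([[1, eps], [eps, 1]]))[1])  # normalised (1,-1), (1,1)
```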