The Perron-Frobenius Theorem for Stochastic Matrices

This article was prompted by a question asked by one of my students about 5 minutes before the end of the final lecture of my Markov Chains course in Linyi. Although I didn’t have time to think about it right then, the flight home offered plenty of opportunity for having a play around. I am well aware that the proof given below is not the best one, but the ideas of minimising quadrants of a matrix seemed fairly natural. Anyway, it’s been sitting on my desktop for over two months now, so I decided I should put it up.

———

Recall that the transition probabilities of a finite Markov chain are determined by a stochastic matrix P. That is, each row of P is a probability distribution. In both theory and applications, the existence and properties of an invariant distribution is of interest. This is a probability distribution \pi satisfying the relation:

\pi P=\pi. (1)

It is clear that \bigg(\begin{smallmatrix}1\\ \vdots\\1\end{smallmatrix}\bigg) is a right- or column eigenvector of P, with eigenvalue 1. Since the spectrum of P^T is the same as that of P, we conclude that 1 is a left-eigenvalue of P also. So we can be assured of the existence of a vector \pi satisfying (1). What is unclear is that this eigenvector \pi should be a probability distribution. Since at least one entry must be non-zero, it will suffice to show that every entry of \pi is non-negative.

A necessary condition for the uniqueness of an invariant distribution is that the Markov chain be irreducible. This is best defined using the terminology of random walks: the chain is irreducible if for every pair of states i,j\in I, it is possible to move from i to j and back again. In terms of the transition matrix, P is irreducible if it is not block upper-triangular, up to reordering rows and columns.

We want to show that when P is irreducible, the (unique) 1-eigenvector is a probability distribution. The standard method proposed in this context is to exhibit the invariant distribution directly. For example, Norris’s Markov Chains defines

\gamma_i^k=\text{ expected time spent in i between visits to k }=\mathbb{E}_k\sum_{n=0}^{T_k}1_{\{X_n=i\}},

and shows that (\gamma_i^k)_{i\in I} satisfies (1).

Nonetheless, the result required is clearly at least one step removed from the probabilistic interpretation, so it would be satisfying to find a direct proof of existence. Typically, one quotes the substantially more general theorem of Perron and Frobenius, the most relevant form of which is:

Theorem (Perron-Frobenius): Given A a non-negative and irreducible square matrix. Then there is a positive real eigenvalue \lambda with multiplicity 1 and such that all other eigenvalues have absolute value less than or equal to \lambda. Then the (unique up to scaling) left- and right-eigenvectors corresponding to \lambda are positive.

Here we present a short proof of this result in the case where A is the (stochastic) transition matrix of a Markov chain.

Proposition: An irreducible stochastic matrix has a 1-eigenvector with all entries non-negative.

Proof: We show instead the contrapositive: that if a stochastic matrix has a 1-eigenvector with both negative and non-negative components, then it is reducible. The motivation is this third eigenvector given in example (2). Observe that the communicating classes are \{1,2\} and \{3\}, and it is not hard to see that for any eigenvector with negative and non-negative components, the sign of a component is a class property.

Informally, given an n\times n stochastic matrix P, and a 1-eigenvector \pi with this property, we relabel the states so that the non-negative components, which we call A\subset I are first. That is, in a moderate abuse of notation:

\pi=(\underbrace{\pi_A}_{\geq 0}\quad\underbrace{\pi_B}_{<0}).\quad\text{ If we write P as }\begin{pmatrix}P_{AA}&P_{AB}\\P_{BA}&P_{BB}\end{pmatrix},

our aim is to show that the sub-matrices P_{AB} and P_{BA} are both zero. This implies that states in A and states in B do not communicate, showing that P is reducible. We can formulate this as a linear programming problem:

\text{Maximise }\sum_{\substack{x\in A,y\in B\\x\in B, y\in A}}p_{xy}\text{ s.t. }\begin{cases}p_{xy}\geq 0&\forall x,y\in I\\p_{x1}+\ldots+p_{xn}=1&\forall x\in I\\\pi_1p_{1y}+\ldots+\pi_np_{ny}=\pi_y&\forall y\in I\end{cases}

It suffices to show that this maximum is 0. Now, we take |A|=i, and assume that 1\leq i\leq n-1, that is, there are a positive number of negative and non-negative components. Noting that the sum of the rows in a stochastic matrix is 1, we may consider instead:

\text{Minimise }\sum_{\substack{x,y\in A\\x,y\in B}}p_{xy}\text{ s.t. }\begin{cases}p_{xy}\geq 0&\forall x,y\in I\\p_{x1}+\ldots+p_{xn}=1&\forall x\in I\\\pi_1p_{1y}+\ldots+\pi_np_{ny}=\pi_y&\forall y\in I\end{cases}

and it will suffice to show that this minimum is n. To do this, we consider instead the dual:

\text{Maximise }\lambda_1+\ldots+\lambda_n+\pi_1\mu_1+\ldots+\pi_n\mu_n,

\text{ s.t. }\lambda_x+\pi_y\mu_x\leq\begin{cases}1&\text{if }x,y\leq i\text{ or }x,y\geq i+1\& \text{otherwise}\end{cases}

The objective is certainly bounded by n. And in fact this is attainable, for example by taking:

\lambda_1=\ldots=\lambda_i=1,\quad \lambda_{i+1}=\ldots=\lambda_n=0

\mu_1=\ldots=\mu_i=0,\quad \mu_{i+1}=-\frac{1}{\pi_{i+1}}, \ldots,\mu_n=-\frac{1}{\pi_n}.

Applying strong duality for linear programming problems completes the proof.

Advertisement