# Convergence of Transition Probabilities

As you can see, I haven’t got round to writing a post for a while. Some of my reasons for this have been good, and some have not. One reason has been that I’ve had to give a large number of tutorials for the fourth quarter of the second year probability course here in Oxford. The second half of this course concerns discrete-time Markov chains, and the fourth problem sheet discusses various modes of convergence for such models, as well as a brief tangent onto Poisson Processes. I’ve written more about Poisson Processes than perhaps was justifiable in the past, so I thought I’d say some words about convergence of transition probabilities in discrete-time Markov chains.

Just to be concrete, let’s assume the state space K is finite, and labelled {1,2,…,k}, so that it becomes meaningful to discuss

$p_{12}^{(n)}:=\mathbb{P}(X_n=2|X_0=1).$

That is, the probability that if we start at state 1, then after n ‘moves’ we are at state 2. We are interested in the circumstances under which this converges to the stationary distribution. The heuristic is that we can view a time-step of a Markov chain as an operation on the space of distributions on K. Note that this operation is deterministic. If this sounds complicated, what we mean is that we specify an initial distribution, that is the distribution of $X_0$. If we consider the distribution of $X_1$, this is given by $\lambda P$, where $\lambda$ is the initial distribution, and P the transition matrix.
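To see the 'deterministic operation on distributions' in action, here is a minimal Python sketch. The 3-state matrix `P` is a made-up example, not one taken from the course, and the helper names `step` and `distribution_after` are purely illustrative. Note that Python indexes from zero, so state 1 is index 0.

```python
def step(dist, P):
    """Push a row-vector distribution one time-step forward: dist -> dist P."""
    k = len(P)
    return [sum(dist[i] * P[i][j] for i in range(k)) for j in range(k)]


def distribution_after(dist, P, n):
    """Distribution of X_n, given that X_0 has distribution dist."""
    for _ in range(n):
        dist = step(dist, P)
    return dist


# Hypothetical 3-state transition matrix; each row sums to 1.
P = [
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
]

start_at_1 = [1.0, 0.0, 0.0]  # X_0 = 1 with probability one

# p_12^(n) is the entry for state 2 (index 1) after n steps from state 1.
for n in range(6):
    p12_n = distribution_after(start_at_1, P, n)[1]
    print(n, round(p12_n, 4))
```

Starting from a point mass at state 1, one step simply reads off the first row of P, and further steps keep the total mass equal to one.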

Anyway, the heuristic is that the stationary distribution is the unique fixed point of this operation on the space of distributions. It is therefore not unreasonable to expect that, unless there are some periodic effects, repeated use of this operation will move us closer to this fixed point.

We can further clarify this by considering the matrix form. Note that a transition matrix P always has an eigenvalue equal to 1. This is equivalent to saying that there is a non-zero solution to $\pi P=\pi$. Note it is not immediately equivalent to saying that P has a stationary distribution, as the latter must be non-negative and have elements summing to one. Only the first property is difficult, and relies on some theory or cleverness to prove. It can also be shown that all eigenvalues satisfy $|\lambda|\le 1$ (here $\lambda$ denotes an eigenvalue, not the initial distribution), and in general, there will be a single eigenvalue (ie dimension 1 eigenspace) with $|\lambda|=1$, and the rest satisfy $|\lambda|<1$. Then, if we diagonalise P as $P=UDU^{-1}$, it is clear why $P^n$ converges entry-wise, as $P^n=UD^nU^{-1}$ and $D^n$ converges. In $D^n$, only the diagonal entry corresponding to $\lambda=1$ converges to something non-zero.
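The smallest non-trivial case can be worked entirely by hand. The parameters $a$ and $b$ below are introduced purely for illustration. Take the two-state chain

$P=\begin{pmatrix}1-a & a\\ b & 1-b\end{pmatrix},\quad a,b\in(0,1].$

Its eigenvalues are $1$ and $1-a-b$, and one can check (by induction, or by verifying the cases $n=0$ and $n=1$ and noting that both sides satisfy the same recursion) that

$P^n=\frac{1}{a+b}\begin{pmatrix}b & a\\ b & a\end{pmatrix}+\frac{(1-a-b)^n}{a+b}\begin{pmatrix}a & -a\\ -b & b\end{pmatrix}.$

When $a+b<2$ the second term vanishes as n grows, so both rows of $P^n$ converge to $\left(\frac{b}{a+b},\frac{a}{a+b}\right)$, which is the stationary distribution. When $a=b=1$ the second eigenvalue is $-1$, the chain alternates deterministically between the two states, and $P^n$ does not converge.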

In summary, there is a strong heuristic for why in general, the transition probabilities should converge, and if they converge, that they should converge to the stationary distribution. In fact, we can prove that for any finite Markov chain, $p_{ij}^{(n)}\rightarrow \pi_j$, provided two conditions hold. The conditions are that the chain is irreducible and aperiodic.
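As a numerical sanity check of this statement, the sketch below raises a hypothetical irreducible, aperiodic matrix to a high power and compares the rows. The matrix and the helpers `mat_mul` and `mat_pow` are my own illustrative choices. For this particular P, solving $\pi P=\pi$ by hand gives $\pi=(8/53,20/53,25/53)$.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def mat_pow(P, n):
    """n-th power of a square matrix by repeated multiplication."""
    k = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


# Hypothetical chain: every state can reach every other (irreducible),
# and the self-loops make it aperiodic.
P = [
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
]

P_200 = mat_pow(P, 200)
pi_by_hand = [8 / 53, 20 / 53, 25 / 53]

# Every row of P^n should approach the same vector, namely pi.
for row in P_200:
    print([round(x, 6) for x in row])
```

The starting state only selects which row of $P^n$ we read, so convergence of $p_{ij}^{(n)}$ to $\pi_j$ for every i is the same as all rows agreeing in the limit.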

In the rest of this post, I want to discuss what might go wrong when these conditions are not satisfied. We begin with irreducibility. A chain is irreducible if it has precisely one communicating class. That means that we can get from any state to any other state, not necessarily in one step, with positive probability. One obvious reason why the statement of the theorem cannot hold without irreducibility is that $\pi$ is not uniquely defined when the chain is not irreducible. Suppose, for example, that we have two closed communicating classes A and B. Then, supported on each of them is an invariant distribution $\pi^A$ and $\pi^B$, so any convex combination of the two, $\lambda \pi^A+(1-\lambda) \pi^B$ with $\lambda\in[0,1]$, will give a stationary distribution for the whole chain.
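The non-uniqueness is easy to check numerically. In the sketch below, the four-state matrix with closed classes A = {0, 1} and B = {2, 3} is a hypothetical example, and the two invariant distributions were found by hand from the balance equations within each class.

```python
def step(dist, P):
    """One time-step for a row-vector distribution: dist -> dist P."""
    k = len(P)
    return [sum(dist[i] * P[i][j] for i in range(k)) for j in range(k)]


def is_stationary(dist, P, tol=1e-12):
    """True if dist P equals dist up to floating-point tolerance."""
    return all(abs(x - y) < tol for x, y in zip(step(dist, P), dist))


# Hypothetical reducible chain: no transitions between A = {0, 1} and B = {2, 3}.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.7],
    [0.0, 0.0, 0.6, 0.4],
]

pi_A = [2 / 7, 5 / 7, 0.0, 0.0]    # invariant distribution supported on A
pi_B = [0.0, 0.0, 6 / 13, 7 / 13]  # invariant distribution supported on B


def mix(lam):
    """Convex combination lam * pi_A + (1 - lam) * pi_B."""
    return [lam * a + (1 - lam) * b for a, b in zip(pi_A, pi_B)]


for lam in [0.0, 0.25, 0.5, 1.0]:
    print(lam, is_stationary(mix(lam), P))
```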

In fact, the solution to this problem is not too demanding. If we are considering $p_{ij}^{(n)}$ for $i\in A$ a closed communicating class, then we know that $p_{ij}^{(n)}=0$ whenever $j\not\in A$. For the remaining j, we can use the theorem in its original form on the Markov chain, with state space reduced to A. Here, it is now irreducible.

The only case left to address is if i is in an open communicating class. In that case, it suffices to work out the hitting probabilities starting from i of each of the closed communicating classes. Provided these classes themselves satisfy the requirements of the theorem, we can write

$p_{ij}^{(n)}\rightarrow h_i^A \pi^A_j,\quad i\not\in A, j\in A.$

To prove this, we need to show that as the number of steps grows to infinity, the probability that we are in closed class A converges to $h_i^A$. Then, we decompose this large number of steps so as to say that not only have we entered A with roughly the given probability, but in fact with roughly the given probability we entered A a long time in the past, and so there has been enough time for the original convergence result to hold in A.
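Here is a small numerical check of the limit $h_i^A\pi_j^A$, under assumptions of my own choosing: state 0 forms an open class, A = {1, 2} is a closed aperiodic class, and B = {3} is absorbing. Since state 0 either stays put or leaves for good, its hitting probability of A is just the chance of leaving towards A divided by the chance of leaving at all.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def mat_pow(P, n):
    """n-th power of a square matrix by repeated multiplication."""
    k = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


# Hypothetical chain: open class {0}, closed classes A = {1, 2} and B = {3}.
P = [
    [0.5, 0.3, 0.0, 0.2],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# From state 0 we leave to A with probability 0.3 and to B with probability 0.2.
h_A = P[0][1] / (1.0 - P[0][0])  # hitting probability of A from state 0
pi_A = [2 / 7, 5 / 7]            # stationary distribution of P restricted to A

predicted = [0.0, h_A * pi_A[0], h_A * pi_A[1], 1.0 - h_A]
observed = mat_pow(P, 200)[0]

print([round(x, 6) for x in predicted])
print([round(x, 6) for x in observed])
```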

Now we turn to periodicity. If a chain has period k, this says that we can split the state space into k classes $A_1,\ldots,A_k$, such that for $i\in A_r$ and $j\in A_s$, $p_{ij}^{(n)}=0$ whenever $n\not\equiv s-r \mod k$. Equivalently, the directed graph describing the possible transitions of the chain is k-partite, with edges only from $A_r$ to $A_{r+1}$ (indices taken mod k). This definition makes it immediately clear that $p_{ij}^{(n)}$ cannot converge to $\pi_j>0$ in this case, since it vanishes for infinitely many n. However, it is possible that $p_{ij}^{(kn)}$ will converge. Indeed, to verify this, we would need to consider the Markov chain with transition matrix $P^k$. Note that this is no longer irreducible, as there are no transitions allowed between classes $A_1,\ldots,A_k$. Indeed, a more formal definition of the period, in terms of the gcd of possible return times, allows us to conclude that there is no finer reducibility structure. That is, $A_1,\ldots,A_k$ genuinely are the closed classes when we consider the chain with matrix $P^k$. And so the Markov chain with transition matrix $P^k$ restricted to any of the $A_i$s satisfies the conditions of the theorem.
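A biased walk on a 4-cycle gives a concrete period-2 example. The chain below is a hypothetical illustration: from each state we step clockwise with probability 0.7 and anticlockwise with probability 0.3, so the classes are {0, 2} and {1, 3}.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def mat_pow(P, n):
    """n-th power of a square matrix by repeated multiplication."""
    k = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


# Hypothetical biased walk on a 4-cycle; period 2.
P = [
    [0.0, 0.7, 0.0, 0.3],
    [0.3, 0.0, 0.7, 0.0],
    [0.0, 0.3, 0.0, 0.7],
    [0.7, 0.0, 0.3, 0.0],
]

# Return probabilities to state 0: zero at odd times, settling down at even times.
for n in range(1, 9):
    print(n, round(mat_pow(P, n)[0][0], 4))

p00_even = mat_pow(P, 100)[0][0]
p00_odd = mat_pow(P, 101)[0][0]
```

The full sequence $p_{00}^{(n)}$ oscillates, but along even times it settles at 1/2, which is the stationary probability of state 0 for the chain $P^2$ restricted to {0, 2}.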

There remains one case which I’ve casually brushed over. When we were discussing the reducible case, I said that if we had more than one communicating class, then we could work out the limiting transition probabilities from a state in an open class to a state in a closed class by calculating the hitting probability of that closed class, then applying the standard version of the theorem to that closed class. This relies on the closed class being aperiodic.

Suppose otherwise that the destination closed class A has period k as before. If it were to be the case that the number of steps required to arrive at A had some fixed value mod k, or modulo a non-trivial divisor of k, then we certainly wouldn’t have convergence, for the same reasons as in the globally periodic case. However, we should ask whether we can ever have convergence.

In fact, the answer is yes. For concreteness, and because it’s easier to write ‘odd’ and ‘even’ than $m \mod k$, let’s assume A has size 2 and period 2. That is, once we arrive in A, thereafter we alternate deterministically between the two states. Anyway, for some large time n, we can write $p_{ca}^{(n)}$ for $a\in A, c\not\in A$ as:

$p_{ca}^{(n)}=h_c^A(n),$

where the latter term is the probability that we arrive in A at a time-step which has the same parity as n. It’s not terribly hard to come up with an example where this probability does not depend on the parity of n, so that we do have convergence. The idea holds in greater generality: if A has period k (and not necessarily just k states), we have to demand that the probability of arriving at a time congruent to r mod k is the same for all r in [0,k-1].
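One such example, sketched in Python under assumptions of my own: A = {a, b} alternates deterministically, and the chain always enters A at a. In the 'balanced' chain we start at c and either jump straight to a, or detour via an extra open state d, each with probability 1/2, so arrival happens at time 1 or time 2 with equal probability. In the 'unbalanced' chain c jumps straight to a every time. The state d and both matrices are hypothetical.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def mat_pow(P, n):
    """n-th power of a square matrix by repeated multiplication."""
    k = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


C, D, A_STATE, B_STATE = 0, 1, 2, 3  # index labels for the states c, d, a, b

balanced = [
    [0.0, 0.5, 0.5, 0.0],  # c: to d or to a, each with probability 1/2
    [0.0, 0.0, 1.0, 0.0],  # d: always to a
    [0.0, 0.0, 0.0, 1.0],  # a: always to b
    [0.0, 0.0, 1.0, 0.0],  # b: always to a
]

unbalanced = [
    [0.0, 0.0, 1.0, 0.0],  # c: always straight to a, so arrival time is fixed
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]


def p_ca(P, n):
    """n-step transition probability from c to a."""
    return mat_pow(P, n)[C][A_STATE]


for n in range(1, 7):
    print(n, p_ca(balanced, n), p_ca(unbalanced, n))
```

In the balanced chain the two parities of arrival time are equally likely, so $p_{ca}^{(n)}$ is constant from the first step onwards. In the unbalanced chain it keeps flipping between 1 and 0.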

Of course, for applications, we don’t normally care much about reducible chains, and we can easily remove periodicity by introducing so-called laziness, whereby on each time-step we flip a coin (biased if necessary) and stay put if it comes up heads, and apply the transition matrix if it comes up tails. Then it’s possible to get from any state to itself in one step, and so we are by construction aperiodic.
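In matrix terms, laziness with heads-probability q replaces P by $qI+(1-q)P$, which has the same stationary distributions as P. A minimal sketch, reusing a hypothetical period-2 walk on a 4-cycle (whose stationary distribution is uniform, since the matrix is doubly stochastic); the function name `lazy` is my own:

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def mat_pow(P, n):
    """n-th power of a square matrix by repeated multiplication."""
    k = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def lazy(P, q=0.5):
    """Lazy version of P: stay put with probability q, else move according to P."""
    k = len(P)
    return [
        [q * (1.0 if i == j else 0.0) + (1.0 - q) * P[i][j] for j in range(k)]
        for i in range(k)
    ]


# Hypothetical biased walk on a 4-cycle; period 2, uniform stationary distribution.
P = [
    [0.0, 0.7, 0.0, 0.3],
    [0.3, 0.0, 0.7, 0.0],
    [0.0, 0.3, 0.0, 0.7],
    [0.7, 0.0, 0.3, 0.0],
]

P_lazy = lazy(P)

print(mat_pow(P, 201)[0])       # still has zeros: the original chain is periodic
print(mat_pow(P_lazy, 201)[0])  # close to uniform: the lazy chain converges
```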