# When is a Markov chain a Markov chain?

I’ve been taking tutorials on the third quarter of the second-year probability course, in which the student have met discrete-time Markov chains for the first time. The hardest aspect of this introduction (apart from the rapid pace – they cover only slightly less material than I did in Cambridge, but in half the time) is, in my opinion, choosing which definition of the Markov property is most appropriate to use in a given setting.

We have the wordy “conditional on the present, the future is independent of the past”, which is probably too vague for any precise application. Then you can ask more formally that the transition probabilities are the same under two types of conditioning, that is conditioning on the whole history, and conditioning on just the current value

$\mathbb{P}(X_{n+1}=i_{n+1} \,\big|\, X_n=i_n,\ldots,X_0=i_0) = \mathbb{P}(X_{n+1}=i_{n+1} \,\big |\, X_n=i_n),$ (*)

and furthermore this must hold for all sets of values $(i_{n+1},\ldots,i_0)$ and if we want time-homogeneity (as is usually assumed at least implicitly when we use the word ‘chain’), then these expressions should be functions of $(i_n,i_{n+1})$ but not n.

Alternatively, one can define everything in terms of the probability of seeing a given path:

$\mathbb{P}(X_0=i_0,\ldots,X_n=i_n)= \lambda_{i_0}p_{i_0,i_1}\ldots p_{i_{n-1}i_n},$

where $\lambda$ is the initial distribution, and the $p_{i,j}$s are the entries of the transition matrix P.

Fortunately, these latter two definitions are equivalent, but it can be hard to know how to proceed when you’re asked to show that a given process is a Markov chain. I think this is partly because this is one of the rare examples of a concept that students meet, then immediately find it hard to think of any examples of similar processes which are not Markov chains. The only similar concept I can think of are vector spaces, which share this property mainly because almost everything in first-year mathematics is linear in some regard.

Non-examples of Markov chains

Anyway, during the tutorials I was asking for some suggestions of discrete-time processes on a countable or finite state space which are not Markov chains. Here are some things we came up with:

• Consider a bag with a finite collection of marbles of various colours. Record the colours of marbles sampled repeatedly without replacement. Then the colour of the next marble depends on the set you’ve already seen, not on the current colour. And of course, the process terminates.
• Non-backtracking random walk. Suppose you are on a graph where every vertex has degree at least 2, and in a step you move to an adjacent vertex, chosen uniformly among the neighbours, apart from the one from which you arrived.
• In a more applied setting, it’s reasonable to assume that if we wanted to know the chance it will rain tomorrow, this will be informed by the weather over the past week (say) rather than just today.

Showing a process is a Markov chain

We often find Markov chains embedded in other processes, for example a sequence of IID random variables $X_1,X_2,\ldots$. Let’s consider the random walk $S_n=\sum_{i=1}^n X_i$, where each $X_i =\pm 1$ with probability p and (1-p). Define the running maximum $M_n=\max_{m\le n}S_m$, and then we are interested in $Y_n:=M_n-S_n$, which we claim is a Markov chain, and we will use this as an example for our recipe to show this in general.

We want to show (*) for the process $Y_n$. We start with the LHS of (*)

$\mathbb{P}(Y_{n+1}=i_{n+1} \,\big|\, Y_n=i_n,\ldots,Y_0=i_0),$

and then we rewrite $Y_{n+1}$ as much as possible in terms of previous and current values of Y, and quantities which might be independent of previous values of Y. At this point it’s helpful to split into the cases $i_n=0$ and $i_n\ne 0$. We’ll treat the latter for now. Then

$Y_{n+1}=Y_n+X_{n+1},$

so we rewrite as

$=\mathbb{P}(X_{n+1}=i_{n+1}-i_n \, \big |\, Y_n=i_n,\ldots, Y_0=i_0),$

noting that we substitute $i_n$ for $Y_n$ since that’s in the conditioning. But this is now ideal, since $X_{n+1}$ is actually independent of everything in the conditioning. So we could get rid of all the conditioning. But we don’t really want to do that, because we want to have conditioning on $Y_n$ left. So let’s get rid of everything except that:

$=\mathbb{P}(X_{n+1}=i_{n+1}-i_n\, \big |\, Y_n=i_n).$

Now we can exactly reverse all of the other steps to get back to

$= \mathbb{P}(Y_{n+1}=i_{n+1} \,\big|\, Y_n=i_n),$

which is exactly what we required.

The key idea is that we stuck to the definition in terms of Y, and held all the conditioning in terms of Y, since that what actually determines the Markov property for Y, rearranging the event until it’s in terms of one of the underlying Xs, at which point it’s easy to use independence.

Showing a process is not a Markov chain

Let’s show that $M_n$ is not a Markov chain. The classic mistake to make here is to talk about possible paths the random walk S could take, which is obviously relevant, but won’t give us a clear reason why M is not Markov. What we should instead do is suggest two paths taken by M, which have the same ‘current’ value, but induce transition probabilities, because they place different restrictions on the possible paths taken by S.

In both diagrams, the red line indicates a possible path taken by $(M_0,M_1,\ldots,M_4)$, and the blue lines show possible paths of S which could induce these.

In the left diagram, clearly there’s only one such path that S could take, and so we know immediately what happens next. Either $X_5=+1$ (with probability p) in which case $M_5=S_5=3$, otherwise it’s -1, in which case $M_5=2$.

In the right diagram, there are two possibilities. In the case that $S_4=0$, clearly there’s no chance of the maximum increasing. So in the absence of other information, for $M_5=3$, we must have $X_4=X_5=+1$, and so the chance of this is $p^2$.

So although the same transitions are possible, they have different probabilities with different information about the history, and so the Markov property does not hold here.