Mixing of the Noisy Voter Model

I’ve been spending a bit of time recently thinking about some interacting particle systems and mixing time problems, so it was interesting to find a short paper due to Harishchandra Ramadas on arXiv a couple of weeks ago combining these two areas. I enjoyed reading it, so I thought I would summarise its arguments here, in preparation for perhaps talking about it at this week’s junior probability seminar.

The voter model is a graph-based interacting particle system. The underlying motivation is to describe the spread of opinions through a population. Each vertex represents an agent who has an opinion which might vary over time. At exponential rates, an agent will replace their opinion with the opinion of one of their neighbours, chosen uniformly at random. Note that, obviously, this might not actually involve a change of opinion.

One problem with this model is that we might be interested in convergence to equilibrium, but this might be hard to specify, as the state where everything is 0 and the state where everything is 1 are absorbing, and similarly any binomial distribution on the set of vertices is invariant. It is also not entirely realistic to assume that future updates to the system are entirely based on the current configuration. In any actual application, we should allow for some, possibly very small, influence from the outside, whether due to random errors in the update rule, or some external factors which haven’t been explicitly accounted for.

So we consider instead the noisy voter model where, in addition to the dynamics described above, each vertex changes state at rate $\delta$, independently of the configuration or the history. [Note that we are now considering the set of opinions {0,1} ]

We are interested in the mixing time of this system. It is not immediately clear that this should have an invariant distribution, but it seems reasonable since the noise, which converts 0 to 1 and 1 to 0 at equal rates if the proportion of 0s and 1s is equal, moves the system towards something like Bin(V(G),1/2), although the correlations need to take account of the local geometry.

Anyway, we assume $\delta>0$ is fixed, and n tends to infinity. Ramadas proves that the dynamics have the property of optimal temporal mixing, which states that given copies of the process X, Y starting from different initial conditions

$||X_t-Y_t||\le C n e^{-ct},$

for suitable choice of constants. Note that it doesn’t matter whether X and Y are coupled or independent. Total variation distance is a function of distribution, not of probability space. I don’t have a great intuition for exactly what this means, except to make the obvious comment that the distance from uniformity grows at most uniformly as the size of the system grows. Note that this immediately implies that the mixing time is O(log n). The rest of the paper proves that the noisy voter model has this property.

The method is to consider the relation:

$||X-Y||\le \mathbb{P}(X\ne Y),$

which says that the total variation distance gives the minimum of the RHS probability over couplings of X and Y. Anyway, as discussed in the previous post on duality in such interacting particle systems, the key step is to consider the process as a deterministic walk through a random environment. Ie, we apply the same moves to configurations started from different initial conditions.

The first step is to break down the noise mechanism to make it easier to couple everything. Instead of saying that a vertex changes state at rate $\delta$, we instead have two noise processes. One sets a vertex’s state to 0 at rate $\delta$, and another sets a vertex’s state to 1 at rate $\delta$ independently. Since at any configuration, for each vertex, precisely one of these processes has no effect.

So for the coupling, we run the same noise processes, and we choose the neighbours from which to update our opinions the same in both processes. The only difference is the initial configuration. With these two processes, X and Y defined on the same probability space, Ramadas defines $Z_t(v)=\mathbf{1}\{X_t(v)\neq Y_t(v)\}$, which counts how many vertices are different between the two processes. Almost surely, eventually all the Zs will be 0, and this state is absorbing because of how we’ve defined this coupling. We are interested to find out how long it takes for this event, which we could think of like a strong stationary time, to happen.

To control this, we track the evolution in the probabilities of $Z_t(v)=0$ jointly. We note that after a noise event at vertex v, we definitely have $Z_t(v)=0$. After an opinion-change event, whether $Z_t(v)$ changes depends on whether $Z_t(u)$ is 1, where u is the uniformly-chosen vertex from which v inherits its opinion in both processes. We obtain the following ODEs:

$\frac{d}{dt}\mathbb{P}(Z_t(v)=1)=\frac{1}{d(v)}\left(\sum_{u:u\sim v}\mathbb{P}(Z_t(v)=0,Z_t(u)=1)\right)$

$-\frac{1}{d(v)}\left(\sum_{u:u\sim v}\mathbb{P}(Z_t(v)=1,Z_t(u)=0)\right)-2\delta \mathbb{P}(Z_t(v)=1).$

We interpret this sum as a negative bit plus two terms which we would hope would cancel each other out. It seems impractical to attempt to formalise this heuristic. However, we can instead choose to consider at time t to consider the ODE for whichever vertex v maximises $Z_t(v)$. It’s then clear that the RHS is less than $-2\delta \mathbb{P}(Z_t(v)=1)$.

Right-differentiabililty of $Z_t(v_{max})$ follows from differentiability of $Z_t(v)$. The number of vertices is finite, and almost surely the vertex attaining this maximum can’t change too often. In particular, from the ODE, we must have $Z_t(v_{max})$ bounded above by $Ce^{-2\delta t}$. From which it follows that

$\mathbb{P}(X\ne Y)\le nCe^{-2\delta t },$

and we’ve shown this optimal temporal mixing property, and hence the mixing time has order log n.

What I’ve discussed so far is taken entirely from Section 3 of Ramadas’s paper. My first thought was whether this bound is actually correct in scale, as we have used only the size of the graph rather than any finer information about its graph structure. The author discusses that it is not possible to use a general result due to Hayes and Sinclair on this topic in this setting, as it is not reversible.

I certainly can’t answer this question, but I will make a comment on the extreme examples. If the graph is the empty graph on n vertices, then the voter model dynamics are null, so we essentially have just a coupon collector problem going on. It doesn’t lose any generality to start from entirely 0s, then at time t, the distribution of the system will be Bin(n,p), with p a function of time, but not n. Now note that

$||\mathrm{Bin}(n,p)-\mathrm{Bin}(n,q)||_{TV}=\Theta(\sqrt{n}|p-q|),$

so in this example, we need to choose a time so that $p=\frac12+\Theta(\sqrt{n})$. This is really a statement about the central limit theorem. It’s clear that Bin(n,1/2) should genuinely be the equilibrium distribution here. The evolution of p(t) is exponential, and hence we get the O(log n) scaling emerging. Note that the presence of $\sqrt{n}$ rather than any other power of n makes no difference, though it would be relevant if we actually wanted to know the constant.

At the other end of the connectivity spectrum, we consider the dynamics on the complete graph. Here, let’s count the number of 1s, and call it W(t) say. Then W goes from k to k+1 at rate $\frac{(n-k)\delta}{2}+\frac{k(n-k)}{n}$, and from k to k-1 at rate $\frac{k\delta}{2}+\frac{k(n-k)}{n}$. In particular, note in expectation, the contributions from the voter model dynamics cancel. This fits with our intuition that it is solely the noise driving the system towards equilibrium.

If we take an ODE approximation, we get:

$\frac{dX}{dt}=\frac{n}{2}-\delta x,$

from which it follows that

$t(x)=\alpha-\frac{1}{\delta}\log(C-\delta x),$

which gives precisely the correct scaling, when we take x=n/2.

I would be very interested to think about or hear about arguments for a general lower bound across all graphs.

REFERENCES

Hayes, Sinclair (2007) – A general lower bound for mixing of single-site dynamics on graphs (arXiv)

Ramadas (2014) – Mixing of the Noisy Voter Model (arXiv link at top of article)