# Minimum Spanning Trees

In my last post, I discussed the Uniform Spanning Tree. To summarise very briefly, given a connected graph on n vertices, a tree is a subgraph, that is a subset of the edges, which is connected, but which contains no cycles. It turns out this requires the tree to have n-1 edges.

We are interested in natural mechanisms for generating randomly chosen spanning trees of a given graph. One way we can always do this is to choose uniformly at random from the set of possible trees. This UST is in some sense canonical, but it is worth knowing about some other measures on trees that might be of interest.

A family of natural problems in operations research concerns an arbitrary complex network, with some weight or cost associated to each connection. The question is how to perform some operation on the network so as to minimise the resulting cost. Perhaps the most famous such problem is that of the Travelling Salesman. The story is that a salesman needs to visit n locations and wants to do the trip as efficiently as possible. This might be thought of as some sort of financial or time cost, but proably the easiest way to set it up is to imagine he is trying to minimise the distance he has to travel. It is not hard to see why this problem might genuinely arise in plenty of real-world situations, where a organisation or agent is trying to be as efficient as possible.

It might be the case that it is not possible to travel between every pair of locations, but we needn’t assume that for now. So if he knows the distance between any pair of cities, he wants to know which of the possible routes gives the shortest overall distance. The problem is that there are n! routes, and this grows roughly like n^n, which is faster than exponential, so for as few as 20 cities it has turned into a comparison which is too large to compute.

There are various algorithms which reduce the number of routes that must be checked, and some approximation methods. But if you want the exact answer, it is not currently possible to calculate this in polynomial time.

Minimal Spanning Trees and Uniqueness

For the travelling salesman, we were looking for the minimal cost spanning path. In the case of the complete graph, this is the same as the minimal cost non-repeating path of length n-1. Such paths are a subset of the set of spanning trees on the underlying graph. So what if we look instead for the minimal cost spanning tree? This exists as after all, there are only finitely many spanning trees.

So far, this has been deterministic, but we were looking for a random spanning tree. We can achieve this by choosing the weights at random. Anything other than assigning the weights as an IID sequence seems likely to be complicated, but there isn’t a canonical choice of the distribution of the weights. Our first question will be whether the distribution of the weights affects the distribution of the induced MST. In fact it will turn out that so long as the distribution is continuous, it has no effect on the distribution of the MST. The continuous condition might seem odd, but it is present only to ensure that the weights almost certainly end up generating a unique MST.

It turns out that there is a straightforward greedy algorithm to find the MST once the weights are known. We will examine some consequences of this algorithm in the random setting. First we check uniqueness. The condition required for uniqueness is that the weights be distinct. Note that this is slightly weaker than the statement that all of sums of (n-1)-tuples be distinct, which immediately implies a unique MST.

We now prove this condition. Suppose we have distinct weights, and an associated MST. If the underlying graph is a tree, then the result is clear. Otherwise, add some extra edge e, with weight w(e). By the definition of a tree, this generates exctly one cycle. Consider the other edges, say $e_1,\ldots,e_k$ in this cycle. If any of $w(e_i)>w(e)$ then we can replace e_i with e to get a spanning tree with smaller weight, a contradiction of the claim that we started with an MST. So by distinctness of weights, we conclude that $w(e)>w(e_i)$ for all i.

Conversely, suppose we remove some edge e which IS in the MST. We end up with exactly two connected components. Consider all the edges in the underlying graph between the two components, and suppose that one of these f satisfies w(f)<w(e). Then if we add in edge f, which is by construction not in the original MST, we end up with a smaller total weight than we started with, a further contradiction.

We can summarise this in a neat form. Given an edge e between x and y, consider the set of all edges in the underlying graph with weight LESS THAN w(e). Then if x and y are in different components, the edge e must be in the MST. Since we have an explicit description of which edges are present, it follows that the MST is unique. The problem is that working out the component structure of the graph with higher weights removed is computationally rather intensive. We want a slightly faster algorithm.

Kruskal’s Algorithm

Several rather similar algorithms were developed roughly simultaneously. Prim’s algorithm is a slight generalisation of what we will discuss. Anyway, for now we consider Kruskal’s algorithm which has the advantage that it can be described without really needing to draw a diagram.

We start by ordering the weights. Without loss of generality, we might as well relabel the edges so that

$w(e_1)< w(e_2)<\ldots< w(e_{|E|}).$

Now, by the condition derived in the argument for uniqueness, we must have e_1 and e_2 in any MST. Now consider e_3. Unless doing so would create a cycle, add e_3. Then, unless doing so would create a cycle, add e_4. Continue. It is clear that the result of this procedure is acyclic. To check it is actually a spanning tree, we show that it is also connected. Suppose not, and two of the components are A and B. Let e be the edge between A and B with minimal weight. According to the algorithm, we should have included e in our MST because at no point would adding it possibly have created a cycle. So we have proved that this greedy algorithm does indeed give the (unique) MST.

A useful consequence of this is that we know the two edges with overall minimum weight are definitely in the MST. In the search for a random measure on spanning trees, what is most important is that we didn’t use the actual values of the weights in this construction, only the order. In other words, we might as well have assumed the weights were a random permutation from $S_{|E|}$. This now answers our original question about how the random weight MST depends in distribution on the underlying edge weight distribution. So long as with probability one the weights are distinct (which holds if the distribution is continuous), then the distribution of the resulting spanning tree is constant.

It’s not too hard to show this isn’t the same as UST: n=4 suffices as a counterexample. But the difference in asymptotic behaviour of properties such as the diameter is of interest, and will be explored in the next post.