Turan’s Theorem

Turan’s theorem gives bounds on the number of edges required in a graph on a fixed number of vertices n to guarantee it contains a complete graph of size r+1. Equivalently, an upper bound on the number of edges in a K_{r+1}-free graph. For some of the applications and proofs, it may be more natural to look instead at the complement graph, for which the theorem becomes a statement about the existence or otherwise of an independent set of size r+1.

Rather than give an expression for the bound immediately, it is more natural to consider the Turan graph T(n,r), the maximal graph on n vertices without a copy of K_{r+1}. This is constructed by dividing the vertices into r classes with ‘as equal size as possible’. That is, some classes have size \lfloor \frac{n}{r}\rfloor and others have size \lfloor \frac{n}{r}\rfloor +1. Then connect any pair of vertices which are not in the same class by an edge. This gives a complete r-partite graph on these classes. Since any collection of r+1 vertices contains at least two in the same class, it can’t contain a K_{r+1}. Note that the complement of the complete r-partite graph is the union of r disjoint complete graphs on the classes.

There are a number of ways to enumerate the edges in T(n,r), and some can get quite complicated quite quickly. After a moderate amount of thought, this is my favourite. Let n=\ell r+k, so T(n,r) has k classes of size (l+1) and (r-k) classes of size l. Pick an ordered pair of vertices uniformly at random. (So picking the same vertices is indeed an option, and is counted twice.) Then the probability they are the same class is

\frac{k}{r}\cdot\frac{\ell+1}{n}+\frac{r-k}{r}\cdot \frac{\ell}{n} = \frac{1}{r}.

So the probability they are in different classes is \frac{r-1}{r}, and we can treat all of the 2n^2 ordered pairs in this way, noting a) that we count everything twice; and b) we know a priori that we don’t have loops, so the fact that we’ve included these in the count doesn’t matter. We end up with the enumeration (1-\frac{1}{r})\frac{n^2}{2} for the edges in T(n,r).

A standard proof

For both proofs, I find it slightly easier to work in the complement graph, where we are aiming for the largest number of edges with an independent set of size (r+1). Suppose we have a graph with the minimal number of vertices such that there’s no independent set of given size. Suppose also that there is an edge joining vertices v and w, such that d(v)> d(w). Then if we change v’s neighbourhood \Gamma(v) so that it becomes the same as \Gamma(w), (that is, we replace v with a copy of w, and maintain the original edge vw), then it is easily checked that we still do not have an independent set of that size, but fewer edges.

Note that by attempting to make the neighbourhoods of connected vertices equal, we are making the graph look more like a union of complete components. We can do a similar trick if we have three vertices u,v,w such that there are edges between u and v and v and w, but not u and w. Then we know the degrees of u,v,w are the same by the previous argument, and so it can again be checked that making \Gamma(u),\Gamma(w) the same as \Gamma(v), and adding the edge uw reduces the number of edges, and maintains the non-existence of the independent set.

The consequence of this is that we’ve shown that the minimum can only be attained when presence of edges is an equivalence relation (ignoring reflexivity). Thus the minimum is only attained for a union of at most r complete graphs. Jensen (or any root-mean-square type inequality) will then confirm that the true minimum is attained when the sizes of the r components are as equal as possible.

A probabilistic proof

The following probabilistic proof is courtesy of Alon and Spencer. The motivation is that in the (equality) case of a union of complete graphs, however we try to build up a maximal independent set, we always succeed. That is, it doesn’t matter how we choose which vertex (unconnected to those we already have) to add next – we will always get a set of size r. This motivates a probabilistic proof, as an argument in expectation will have equality in the equality case, which is always good.

Anyway, we build up an independent set in a graph by choosing uniformly at random a vertex which is not connected to any we have so far, until this set of vertices is empty. It makes sense to settle the randomness at the start, so give the vertices a uniform random labelling on [n], and at each stage, choose the independent vertex with minimal label.

Thus, a vertex v will be chosen for the independent set if, and only if, it has a smaller label than all of its neighbours, that is, with probability \frac{1}{1+d(v)}. So the expected size of the independent set constructed in this fashion is

\sum_{v\in V(G)} \frac{1}{1+d(v)}\ge \frac{V}{1+\bar d} = \frac{V}{1+\frac{2E}{V}}.

One can chase through the expressions to get the bound we want back.

Olympiad example

The reason I was thinking about Turan’s theorem was a problem which the UK IMO squad was discussing. It comes from an American selection test (slightly rephrased): given 100 points in the plane, what is the largest number of pairs of points with \ell_1 distance in (1,2]?

The key step is to think about how large a collection of points can have this property pairwise. It is easy to come up with an example of four points which work, and seemingly impossible to come up with an example with five points. To prove this, I found it easiest to place a point at the origin, then explicitly work with coordinates relative the basis (1,1),(1,-1) for fairly obvious reasons in this metric.

Anyway, once you are convinced that you can’t have five points with this property pairwise, you are ready to convert into a graph-theoretic statement. Vertices correspond to points, and edges link pairs of points whose distance is in (1,2] as required. We know from the previous paragraph that there is no copy of K_5 here, so Turan’s theorem bounds the number of edges, ie the number of suitable pairs.

It also tells us under what sort of circumstances the bound is attained, and from this, it’s natural to split the 100 points into four groups of 25, for example by taking four points which satisfy the condition pairwise (eg a diamond around the origin), and placing each group very near one of the points.

Extensions and other directions

The existence of a complete subgraph is reminiscent of Ramsey theory, which in one case is a symmetric version of Turan’s theorem. In Turan, we are adding enough edges to force a complete subgraph, while in the original form of Ramsey theory, we are asking how large the graph needs to be to ensure that for any edge configuration, either the original graph or the complement graph includes a complete subgraph. It makes a lot more sense to phrase this in terms of colours for the purpose of generalisation.

A natural extension is to ask about finding copies of fixed graphs H other than the complete graph. This is the content of the Erdos-Stone theorem. I’d prefer to say almost nothing rather than be vague, but the key difference is that the bound is asymptotic in the number of vertices rather than exact. Furthermore, the asymptotic proportion of vertices depends on the chromatic number of H, which tells you how many classes r are required to embed H in a (large if necessary) r-partite graph. So it is perhaps unsurprising that the limiting proportions end up matching the proportion of edges in the Turan graphs, namely r-1/r as r varies, which leaves the exact scaling open to further investigation in the case where H is bipartite (hence has chromatic number 2).

Leave a comment