# Gromov-Hausdorff Distance on Trees

This post continues the exposition of Gromov-Hausdorff distance, as introduced in Chapter Four of Steve Evans’ Probability and Real Trees, which we are reading as a group in the Stats Dept in Oxford at the moment. In this post, we consider applications of the Gromov-Hausdorff distance we have just introduced in the context of trees, viewed as metric spaces.

First we consider a direct application of the previous result, which related the Gromov-Hausdorff distance to the infimum of the distortion across the family of correspondences between the two relevant metric spaces. I found this as Corollary 3.7 in notes by Le Gall and Miermont on scaling limits of random trees and maps, which can be found here. I’m not clear whether there is an original source, but the result is simple enough that probably it does not matter hugely.

Proposition – Given f,g excursions above [0,1], and $T_f, T_g$ the real trees associated with these excursions in the standard way, then

$d_{GH}(T_f,T_g)\le 2 ||f-g||.$

Proof: We construct an appropriate correspondence

$\mathcal {R}=\left\{(a,b) \,:\, \exists t\in[0,1]\text{ s.t. } a=p_f(t), b=p_g(t)\right\}.$

In other words, the trees are defined as projections from [0.1] (with some equivalence structure), so taking pairs of projections from [0,1] gives a natural correspondence. Now, suppose $(p_f(s),p_g(s)), (p_f(t),p_g(t))\in \mathcal {R}$. Then

$d_{T_f}(p_f(s),p_f(t)) = f(s) + f(t) - 2\hat f(s,t),$

where $\hat f(s,t):= \min{r\in[s,t]}f(r)$. Obviously, we can replace f by g to obtain

$d_{T_g}(p_g(s),p_g(t)) = g(s) + g(t) - 2\hat g(s,t).$

By thinking slightly carefully about where the functions achieve their minima compared with where the minimum in the sup norm is achieved, we can conclude that

$|d_{T_f}(p_f(s),p_f(t)) - d_{T_g}(p_g(s),p_g(t)) | \le 4||f-g||,$

and so the result follows from the relation between Gromov-Hausdorff distance and the infimum of distortions over the set of correspondences proved at the end of the previous post.

Gromov-Hausdorff Limits of Real Trees

If we are going to consider the Gromov-Hausdorff topology for limits of tree, we want to be sure that the limit of a sequence of real trees is another real tree. In particular, we want to show that this convergence preserves the property of being a geodesic space.

Theorem 4.19 – Given $X_n$ geodesic spaces, and $X$ complete, such that $X_n \stackrel{d_{GH}}\rightarrow X$, then X is geodesic.

Proof: Here, I’ll go in the opposite order to Evans’ book, as I think it’s easier understand why a special case of this implies the whole result once you’ve actually shown that special case.

Anyway, we want to show that given $x,y\in X$, there is a geodesic segment $x\leftrightarrow y$ in X. We will start by showing that there is a well-defined midpoint of the geodesic, that is a point $z \in X$ such that $d(x,z)=d(y,z)=\frac12 d(x,y)$.

Given arbitrary $\epsilon>0$, we can take n such that $d_{GH}(X,X_n) < \frac{\epsilon}{3}$ and then a correspondence $\mathcal{R}$ between $X,X_n$ with $\mathrm{dis}\mathcal {R}<\frac{2\epsilon}{3}$. Now, by definition of correspondence, we have $x',y'\in X_n$ such that $(x,x'),(y,y')\in\mathcal{R}$. But we do have geodesics in $X_n$, so we can take $z'\in X_n$ the midpoint of geodesic $x'\leftrightarrow y'$. Predictably, we now take $z\in X$ such that $(z,z')\in\mathcal {R}$.

We can show that z is ‘almost’ the midpoint of [x,y] in the sense that

$|d(x,z) - \frac12 d(x,y)| \le |d(x',z')-\frac12 d(x',y')+\frac32 \mathrm{dis}\mathcal{R} \le \epsilon.$

Similarly, we have $|d(y,z)-\frac12 d(x,y)|\le \epsilon$.

Perhaps it’s helpful to think of this point z that we’ve constructed as being like a taut, but not quite rigid string. The midpoint of the string has to be fairly near the midpoint of the endpoints, and in particular, as we let $\epsilon\rightarrow 0$, the z’s we deal with form a Cauchy sequence, and thus converge to some point, which (in a case of poor notation planning) we also call $z \in X$, which is the midpoint of $[x,y]$ as described before.

We can now iterate this iteration, to demonstrate that whenever q is a dyadic rational in [0,1], there exists $z_q \in X$ such that

$d(x,z_q)=q d(x,y),\quad d(y,z_q) = (1-q)d(x,y).$

Again then, if we want the above to hold for some general real r in [0,1], we can approximate r arbitrarily well by dyadic rationals q, and the associated points $z_q$ are Cauchy and thus have a well-defined limit with the required properties. We thus have our geodesic segment $x\leftrightarrow y$.

Rooted Gromov-Hausdorff Distance

In the end, the trees we want to compare might be rooted. For example, we talk about finite trees being invariant under random re-rooting, and we might be interested in similar results for real trees, in particular the BCRT. So we need to compare metric spaces as viewed outwards from particular identified points of each space.

An isometry of rooted spaces must map the root to the root, and so we adjust to obtain rooted Gromov-Hausdorff distance. We might try to consider embeddings into a common space so that the roots are shared, but it will be more convenient to maintain the infimum over metrics on the disjoint union as before. But to ensure the roots are in roughly the ‘same’ place in both set embeddings, we minimise the maximum of the Hausdorff distance between the sets, and the distance between the images of the roots in the common covering space.

Similar results apply as in the unrooted case, and normally the proofs are very similar. As we might expect, when defining a correspondence between rooted spaces, we demand that the pair of roots is one of the roots in the correspondence, and then the same equivalence between the minimal distortion and the rooted G-H distance applies.

Evans shows that $\mathbb{T}^{\mathrm{root}}$, the set of compact rooted trees is complete and separable under the rooted G-H topology. Separability is relatively easy to see. Compact trees have finite $\epsilon$-nets, and there is a canonical way to view the net as the vertices of a finite tree with edge lengths. Approximating these edge lengths by rationals and consider the countable family of isomorphism classes of rooted finite trees gives separability.

If we want, we can also define k-pointed Gromov-Hausdorff distance, where we demand k points in each space are held fixed.

Tree $\eta$-erasure

To show that this is a natural topology to consider for the family of trees, Evans devotes a short section to the operation of $\eta$-erasure, where all points within $\eta$ of a leaf are removed from a given tree.

Formally, $R_\eta$ is a map $\mathbb{T}^{\mathrm{root}}\rightarrow \mathbb{T}^{\mathrm{root}}$ (recalling that these are compact real trees), so that $R_\eta(T)$ consists of the tree

$\{\rho\}\cup\{a \in T \, : \, \exists x\in T,\, a\in[\rho, x],d_T(a,x)\ge \eta\},$

rooted again at $\rho$. We claim that the range of $R_\eta$ is the set of compact rooted trees with a finite number of leaves. In this setting, we want a geodesic definition of leaf, for example a leaf is a point that doesn’t lie in the interior of the (unique) geodesic segment between any other point and the root.

Given a tree T with a finite number of leaves, we can glue disjoint segments of length $\eta$ onto every leaf. Taking $R_\eta$ of this deeper tree will give T. Similarly, suppose $R_\eta(T)$ has infinitely many leaves, which we can label $a_1,a_2,\ldots$. Thus we also have $x_1,x_2,\ldots \in T$ such that $a_i\in[\rho,x_i]$, and the segments $\{[a_i,x_i]\}$ are disjoint, and all have length at least $\eta$, hence T cannot be compact, as it has no finite $\frac{\eta}{2}$ net.

It is clear that for fixed T, the family of these maps applied to T is continuous with respect to $\eta$ in the G-H topology. When $\eta$ is changed a small amount, the amount of extra tree removed is locally small, and so approximating correspondences by points in what is left is absolutely fine. Indeed, the operations $T_{\eta_1}, T_{\eta_2}$ commute to give $T_{\eta_1+\eta_2}$.

We want to show that for fixed $\eta$ this map $R_\eta$ is continuous with respect to G-H topology. Suppose two compact rooted trees S,T are covered by a rooted correspondence $\mathcal R$ with distortion $\epsilon \ll \eta$. We can’t immediately restrict $\mathcal R$ to $R_\eta(S)\times R_\eta(T)$, as it won’t necessarily be a surjection under the projection maps any more.

But we can get around this. Note that if $(x,y)\in\mathcal{R}$, then $|d_S(\rho_S,x) - d_T(\rho_T,y)| \le \epsilon$ by assumption. We will construct a correspondence $\mathcal R'$ on the erased trees as follows. Given $(x,y)\in\mathcal{R}$, if $(x,y)\in R_\eta(S)\times R_\eta(T)$, we keep it, and if $x\not\in R_\eta(S), y\not \in R_\eta(y)$, we throw it away. Suppose we have $(s,t) in \mathcal{R}$ with $s\in R_\eta(S), t\not \in R_\eta(T)$. Let $l_s$ be a leaf of S such that $s \in [\rho_S,l_s]$, and let $\bar t$ be the farthest point from the root of the geodesic $[\rho_T,t]$ restricted to $R_\eta(T)$. So the tree above $\bar t$ includes t and has height at most $\eta$.

In words, if a point appears in a pair in the correspondence but is removed by the erasure, we replace it in the pair with the point closest to the original point that was not removed by the erasure.

It remains to prove that this works. My original proof was short but false, and its replacement is long (and I hope true now), but will postpone writing this down either until another post, or indefinitely. The original proof by Evans and co-authors can be found as the main content of Lemma 2.6 in the original paper [1].

REFERENCES

[1] Evans, Pitman, Winter – Rayleigh processes, real trees, and root growth with regrafting.