# SLE Revision 4: The Gaussian Free Field and SLE4

I couldn’t resist breaking the order of my revision notes in order that the title might be self-referential. Anyway, it’s the night before my exam on Conformal Invariance and Randomness, and I’m practising writing this in case of an essay question about the Gaussian Free Field and its relation to the SLE objects discussed in the course.

What is a Gaussian Free Field?

The most natural definition is too technical for this context. Instead, recall that we can very informally consider a Poisson random measure as a collection of Poisson random variables placed at each point of the domain, weighted infinitesimally so that the integral over a region gives a Poisson random variable with mean proportional to the measure of that region, and so that disjoint regions are independent. Here we do a similar thing, only with infinitesimal centred Gaussians, and so we have to specify the covariance structure.

We define the Green’s function on a domain D, which has a resonance with PDE theory, by:

$G_D(x,y)=\lim_{\epsilon\rightarrow 0}\mathbb{E}[\text{time spent in }B(y,\epsilon)\text{ by BM started at }x\text{ stopped at }T_D]$

We want the covariance structure of the hypothetical infinitesimal Gaussians to be given by $\mathbb{E}(g(x)g(y))=G_D(x,y)$. So formally, we define $(\Gamma(A),A\subset D)$ for open A, by declaring $(\Gamma(A_1),\ldots,\Gamma(A_n))$ to be a centred Gaussian vector with covariances $\mathbb{E}(\Gamma(A_i)\Gamma(A_j))=\int_{A_i\times A_j}dxdy\,G_D(x,y)$.
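The "time spent" characterisation of the Green's function has a discrete analogue which is easy to check numerically. Below is a minimal sketch (function names and all parameters are my own, not from the course): the Green's function of simple random walk on $\{0,\ldots,N\}$ killed at the endpoints, estimated by counting visits, against its closed form.

```python
import random

def green_exact(x, y, N):
    # Closed form for the Green's function of SRW on {0,...,N} killed at the
    # endpoints: expected number of visits to y starting from x.
    return 2.0 * min(x, y) * (N - max(x, y)) / N

def green_mc(x, y, N, trials, rng):
    # Monte Carlo version of the "time spent" definition: count visits to y
    # by a walk started at x, absorbed at 0 and N.
    total = 0
    for _ in range(trials):
        pos = x
        while 0 < pos < N:
            if pos == y:
                total += 1
            pos += 1 if rng.random() < 0.5 else -1
    return total / trials

rng = random.Random(0)
N, x, y = 8, 3, 5
est = green_mc(x, y, N, 40000, rng)
exact = green_exact(x, y, N)   # = 2 * 3 * 3 / 8 = 2.25
```

The two agree up to Monte Carlo error, exactly as the continuum definition via occupation time of Brownian motion suggests.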

The good news is that we have the nice expression $G_U(0,x)=\log\frac{1}{|x|}$ on the unit disc U, and the Green’s functions are conformally invariant in the sense that $G_{\phi(D)}(\phi(x),\phi(y))=G_D(x,y)$, following directly from the conformal invariance of Brownian Motion.

The bad news is that existence is not clear. The motivation is the following, though. We have a so-called excursion measure for BMs in a domain D. There isn’t time to discuss this now: it is infinite, and invariant under translations of the boundary (assuming the boundary is $\mathbb{R}\subset\bar{\mathbb{H}}$, which is fine after taking a conformal map). Take a Poisson Point Process on the set of Brownian excursions with this measure. Now fix a function f on the boundary $\partial D$, and define $\Gamma_f(A)$ to be the sum over excursions in the PPP passing through A of f evaluated at the excursion’s starting point, weighted by the time spent in A. We then have a universality relation given by the central limit theorem: if we define h to be (in a pointwise limit) the expected value of this variable, and we take n independent copies, we have:

$\frac{1}{\sqrt{n}}\left[\Gamma_f^1(A)+\ldots+\Gamma_f^n(A)-n\int_Ah(x)dx\right]\rightarrow\Gamma(A)$

where this limiting random variable is Gaussian.

For now though, we assume existence without full proof.

SLE_4

We consider chordal SLE_$\kappa$, which has the form of a curve $\gamma[0,\infty)$ from 0 to $\infty$ in $\mathbb{H}$. With $g_t$ the usual Loewner maps, consider $\tilde{X}_t:=X_t-W_t=g_t(x)-\sqrt{\kappa}\beta_t$ for some fixed x. We are interested in the evolution of $\arg\tilde{X}_t$. Note that conditional on the (almost sure, for $\kappa\leq 4$) event that x does not lie on the curve, $\arg\tilde{X}_t$ will converge either to 0 or $\pi$ almost surely, depending on whether the curve passes to the left or the right (respectively) of x.

By Loewner’s DE for the upper half-plane and Ito’s formula:

$d\tilde{X}_t=\frac{2}{\tilde{X}_t}dt-\sqrt{\kappa}d\beta_t,\quad d\log\tilde{X}_t=\left(2-\frac{\kappa}{2}\right)\frac{dt}{\tilde{X}_t^2}-\frac{\sqrt{\kappa}}{\tilde{X}_t}d\beta_t$
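Spelling out the Itô step behind this (using the convention $W_t=\sqrt{\kappa}\beta_t$ from above; other sign conventions amount to replacing $\beta$ by $-\beta$):

```latex
d\tilde{X}_t = \frac{2}{\tilde{X}_t}\,dt - \sqrt{\kappa}\,d\beta_t,
\qquad d[\tilde{X}]_t = \kappa\,dt,
\qquad\text{so}\qquad
d\log\tilde{X}_t
= \frac{d\tilde{X}_t}{\tilde{X}_t} - \frac{d[\tilde{X}]_t}{2\tilde{X}_t^2}
= \Big(2-\frac{\kappa}{2}\Big)\frac{dt}{\tilde{X}_t^2}
  - \frac{\sqrt{\kappa}}{\tilde{X}_t}\,d\beta_t.
```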

So, when $\kappa=4$, the dt term vanishes, which gives that $\log\tilde{X}_t$ is a local martingale, and so

$d\theta_t=-\Im\left(\frac{2}{\tilde{X}_t}\right)d\beta_t$

is a true martingale since it is bounded. Note that

$\theta_t=\mathbb{E}[\pi 1(x\text{ to the left of }\gamma)|\mathcal{F}_t]$

Note also that

$\mathbb{P}(\text{BM started at }x\text{ hits }\gamma[0,t]\cup\mathbb{R}\text{ to the left of }\gamma(t)|\gamma[0,t])=\frac{\theta_t}{\pi}$

SLE_4 and the Gaussian Free Field on H

We will show that this chordal SLE_4 induces a conformal Markov type of property in Gaussian Free Fields constructed on the slit domains. Precisely, we will show that if $\Gamma_T$ is a GFF on $H_T=\mathbb{H}\backslash\gamma[0,T]$, then $\Gamma_T+ch_T(\cdot)\stackrel{d}{=}\Gamma_0+ch_0(\cdot)$, where c is a constant to be determined, and $h_t(x)=\theta_t(x)$ in keeping with the lecturer’s notation!

It will suffice to check that for every fixed test function p with compact support, $\Gamma_T(p)+c(h_T(p)-h_0(p))$ is a centred Gaussian with variance $\int dxdy\,G_{\mathbb{H}}(x,y)p(x)p(y)$.

First, applying Ito and conformal invariance of the Green’s functions under the maps g_t,

$dG_{H_t}(x,y)=-c\,d[h(x),h(y)]_t$

The details are not particularly illuminating, but they exploit the fact that the Green’s function on $\mathbb{H}$ has the reasonably nice form $\log\left|\frac{x-\bar{y}}{x-y}\right|$. We are also being extremely lax with constants, but we have plenty of freedom there.

After applying Ito and some (for now unjustified) Fubini:

$dh_t(p)=\left(\int c\,p(x)\Im\left(\frac{1}{\tilde{X}_t(x)}\right)dx\right)d\beta_t$

and so, as we would have suspected (since each $h_t(x)$ is), this is a local martingale. We now deploy Dubins-Schwarz:

$h_T(p)-h_0(p)\stackrel{d}{=}B_{\sigma(T)}$ for B an independent BM, where

$\sigma(T)=\int_0^Tdt\left(\int c\,p(x)\Im\left(\frac{1}{\tilde{X}_t(x)}\right)dx\right)^2$

So conditional on $(h_t(p),t\in[0,T])$, we want to make up the difference to $\Gamma_0$. Add to $h_T(p)-h_0(p)$ an independent random variable distributed as $N(0,s-\sigma(T))$, where

$s=\int dxdy\,p(x)p(y)G(x,y)=\mathrm{Var}\,\Gamma_0(p)$

Then

$s-\sigma(T)=\int p(x)p(y)\left[G(x,y)-c\int_0^Tdt\,\Im\left(\frac{1}{\tilde{X}_t}\right)\Im\left(\frac{1}{\tilde{Y}_t}\right)\right]dxdy=\int p(x)p(y)G_{H_T}(x,y)dxdy$ as desired.

Why is this important?

This is important, or at least interesting, because we can use it to reverse engineer the SLE. Informally, we let $T\rightarrow\infty$ in the previous result. This says that taking a GFF in the domain left by removing the whole of the SLE curve (whatever that means), then adding $\pi$ at points on the left of the curve (which is the limit $\lim_T h_T$), gives the same object as a normal GFF on the upper half-plane added to the argument function. It is reasonable to conjecture that a GFF in a disconnected domain has the same structure as independent GFFs in each component, and this gives an interesting invariance condition on GFFs. It can also be shown (Schramm-Sheffield) that SLE_4 arises by reversing the argument: take an appropriately conditioned GFF on $\mathbb{H}$ and look for the interface between where it is ‘large’ and where it is ‘small’ (obviously this is a ludicrous simplification). This interface is then, under a suitable limit, SLE_4.

# SDEs and L-Diffusions

The motivation for the development of differential calculus by Newton et al. was to enable us to deduce concrete properties of, say, particle motion defined implicitly through ODEs. And we proceed similarly for the stochastic differential. Having defined all the terms through Ito’s formula, and concluded that BM is in some sense the canonical stochastic process, we seek to solve so-called stochastic differential equations of the form:

$dX_t=b(X_t)dt+\sigma(X_t)dB_t$

While there is no reason not to consider processes in $\mathbb{R}^d$, it is reasonably interesting even to consider processes in one dimension. As with ordinary ODEs and PDEs, we have the intuitive notion that if we specify some initial conditions, we should be able to set the differential equation up and ‘let it go’, like a functional clockwork mouse. Of course, we are conscious of the potential problems with uniqueness of solutions, stability of solutions, and the general mathematical awkwardness that derives from the fact that we can’t treat all DEs as physical systems, with all the luxuries of definiteness that the physical world automatically affords. To establish some concreteness, we set up some definitions.

• For a solution to the SDE $E(\sigma,b)$, we require a nice filtration $\mathcal{F}$ and a BM adapted to that filtration to drive the process $X_t$, which satisfies $X_t=X_0+\int_0^t\sigma(X_s)dB_s+\int_0^tb(X_s)ds$; and we require this for each $x_0\in\mathbb{R}^d$ s.t. $X_0=x_0$ a.s.
• Uniqueness in law: all solutions to $E(\sigma,b)$ starting from each x have the same law. Obviously, this places no restriction on the underlying probability space and filtration.
• A stronger condition is Pathwise uniqueness: Given the filtration, solutions are almost surely indistinguishable (that is, paths are equal everywhere).
• We have not specified any conditions on the filtration $\mathcal{F}$. It would be natural to consider only the minimal such filtration that works. If we can take $\mathcal{F}=\mathcal{F}^B$, the natural filtration of the driving BM, we say the solution is strong. If every solution is strong, then we have pathwise uniqueness: otherwise we would have a solution where we could choose which path to follow independently of the BM.

The key theorem here is Yamada-Watanabe: if there exist solutions and we have pathwise uniqueness, then we have uniqueness in law. Then, for every filtration $(\mathcal{F}_t)$ and $\mathcal{F}_t$-BM, the unique solution is strong.

Existence of solutions is particularly tractable when $\sigma,b$ are Lipschitz, as this opens the way for implicit constructions as the fixed points of contracting mappings. We make particular use of Gronwall’s lemma, which confirms the intuitive thought that differential inequalities have solutions bounded by solutions to the corresponding ODE. Concretely, for $f\geq 0$ with $f(t)\leq a+b\int_0^tf(s)ds$ for $0\leq t\leq T$, the lemma states that $f(t)\leq a\exp(bt)$. The case a=0 is obviously of particular interest for demonstrating convergence results. We deploy this method to show that when $\sigma,b$ are Lipschitz, the SDE $dX_t=\sigma(X_t)dB_t+b(X_t)dt$ has pathwise uniqueness, and that for any triple of filtration $(\mathcal{F}_t)$, $\mathcal{F}_t$-adapted BM, and starting point x, there is a strong solution. Uniqueness in law then follows by Yamada-Watanabe, but we knew this anyway by composing measurable maps.
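The Gronwall argument has a concrete shadow: two Euler-Maruyama discretisations of an Ornstein-Uhlenbeck-type SDE, driven by the same Brownian increments but started from different points, contract together at an explicit rate. A minimal sketch (all parameter choices are illustrative, not from the course):

```python
import random

def coupled_paths(x0, y0, theta=1.0, sigma=0.5, dt=0.001, steps=1000, seed=42):
    # Two Euler-Maruyama solutions of dX = -theta*X dt + sigma dB, driven by
    # the SAME Brownian increments (the pathwise-uniqueness coupling).
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(steps):
        dB = rng.gauss(0.0, dt ** 0.5)
        x += -theta * x * dt + sigma * dB
        y += -theta * y * dt + sigma * dB
    return x, y

x, y = coupled_paths(1.0, 1.5)
gap = abs(x - y)
# With additive noise the common increments cancel in the difference, so the
# gap contracts deterministically: |x - y| = |x0 - y0| * (1 - theta*dt)^steps.
```

For state-dependent Lipschitz $\sigma$ the cancellation is no longer exact, and it is precisely Gronwall's lemma that supplies the corresponding exponential bound on the gap.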

Now, define L, an operator on $C^2$ functions, by:

$Lf(x)=\frac12\sum_{i,j}a_{i,j}(x)\frac{\partial^2 f}{\partial x^i\partial x^j}+\sum_i b_i(x)\frac{\partial f}{\partial x^i}$

We define X to be an L-diffusion if X’s local behaviour is specified (in distribution) by L. The first sum in the expression for L corresponds to diffusivity, while the second corresponds to (deterministic) drift. Formally, for a, b bounded, $X_t$ is an L-diffusion if for all $f\in C_b^2$:

$M_t^f:=f(X_t)-f(X_0)-\int_0^t Lf(X_s)ds$ is a martingale.

Alternatively, we can relax the boundedness condition, and require $M_t^f\in\mathcal{M}_{c,loc}$. To make the link to SDEs, define $a=\sigma\sigma^T$ (so in one dimension $a=\sigma^2$); then solutions to $dX_t=\sigma(X_t)dB_t+b(X_t)dt$ are L-diffusions if the boundedness conditions are met. (Recall that Lipschitz coefficients guarantee solutions to the SDE.) The result then follows directly from Ito’s formula.

# Dubins-Schwarz Theorem

In developing the stochastic integral, much of our motivation has come from considering integrals with respect to Brownian Motion. In this section, we develop some results which justify that Brownian Motion is the canonical stochastic process with non-zero quadratic variation (which is related, but not directly equivalent to the property of infinite total variation). In particular, we shall observe the Dubins-Schwarz theorem, which shows that martingales with unbounded (as time $\rightarrow\infty$) quadratic variation ARE Brownian Motion, up to a (stochastic) time change.

Recall Levy’s characterisation of a d-dimensional BM, which allows us to avoid considering independent normal increments. Given $X^1,\ldots,X^d\in\mathcal{M}_{c,loc}$ with $X_0^i=0$:

$X=(X^1,\ldots,X^d)$ a BM iff $[X^i,X^j]_t=\delta_{ij}t$

Obviously, one direction has been shown as part of the construction and properties of quadratic variation. For the other direction, because laws are precisely determined by characteristic functions, it suffices to show that

$\mathbb{E}\left[\exp(i\langle \theta,X_t-X_s\rangle)|\mathcal{F}_s\right]=\exp(-\frac12||\theta||^2(t-s))$

We set $Y_t:=\langle\theta,X_t\rangle$, and deduce $[Y]_t=t||\theta||^2$ and $Z=\mathcal{E}(iY)=\exp(iY_t+\frac12[Y]_t)\in\mathcal{M}_{c,loc}$; furthermore Z is bounded on compact intervals [0,t], hence is a true martingale. So $\mathbb{E}\left(\frac{Z_t}{Z_s}|\mathcal{F}_s\right)=1$, which is pretty much what was required.

Now, Dubins-Schwarz states

Theorem: Given $M\in\mathcal{M}_{c,loc}, M_0=0, [M]_\infty=\infty$ almost surely, if we set $\tau_s:=\inf\{t:[M]_t>s\}$, then $B_s:=M_{\tau_s}$ is a $(\mathcal{F}_{\tau_s})$-BM, with $M_t=B_{[M]_t}$.

This final result is clear if $[M]_t$ is almost surely strictly increasing in t: just take $s=[M]_t$ in the definition.
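A quick numerical illustration of the theorem (entirely my own sketch, with an arbitrary deterministic volatility profile): run $M_t=\int\sigma(t)dB_t$, track $[M]_t=\int\sigma^2dt$, stop at $\tau_1=\inf\{t:[M]_t>1\}$, and check that $M_{\tau_1}$ looks like a standard Gaussian.

```python
import math
import random

def dds_sample(rng, dt=0.002, t_max=2.0):
    # Simulate M_t = int sigma(t) dB_t for a deterministic volatility profile
    # (an arbitrary illustrative choice), accumulating [M]_t = int sigma^2 dt,
    # and return B_1 = M_{tau_1} where tau_1 = inf{t : [M]_t > 1}.
    m, qv, t = 0.0, 0.0, 0.0
    while t < t_max:
        sigma = 1.0 + 0.5 * math.sin(6.0 * t)
        m += sigma * rng.gauss(0.0, dt ** 0.5)
        qv += sigma * sigma * dt
        t += dt
        if qv >= 1.0:
            break
    return m

rng = random.Random(1)
samples = [dds_sample(rng) for _ in range(4000)]
mean = sum(samples) / len(samples)
var = sum(x * x for x in samples) / len(samples) - mean * mean
# Dubins-Schwarz: the time-changed value at s = 1 should be close to N(0, 1),
# up to discretisation and Monte Carlo error.
```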

We know B is cadlag; we first show B as defined is almost surely continuous. We need $B_{s-}=B_s\,\forall s>0$, i.e. $M_{\tau_{s-}}=M_{\tau_s}$, noting that $\tau_{s-}=\inf\{t\geq 0:[M]_t=s\}$ (by continuity of [M]) is a stopping time also.

The only interesting case is $\tau_{s-}<\tau_s$, for which we need to show M is constant on $[\tau_{s-},\tau_s]$ (on this interval [M] is constant by definition of the stopping times). This is intuitively obvious, but formally we must appeal to $(M^2-[M])^{\tau_s}$, which is UI since $\mathbb{E}[M^{\tau_s}]_\infty<\infty$. We may now apply OST to obtain $\mathbb{E}[M_{\tau_s}^2-M_{\tau_{s-}}^2|\mathcal{F}_{\tau_{s-}}]=\mathbb{E}[(M_{\tau_s}-M_{\tau_{s-}})^2|\mathcal{F}_{\tau_{s-}}]=0$, which implies M is almost surely constant on $[\tau_{s-},\tau_s]$. We need to lift this to the case where it holds for all s simultaneously almost surely. Note that cadlag almost surely plus almost sure continuity at each fixed point does not imply almost sure continuity everywhere (eg consider $t\mapsto H(t-U)$ with H the Heaviside function and U a uniform distribution on (0,1)). Instead, we record intervals of constancy of both $M_t,[M]_t$. That is, we set

$T_r=\inf\{t>r:M_t\neq M_r\},\quad S_r=\inf\{t>r:[M]_t\neq [M]_r\}$

Then these are cadlag, and by above $T_r=S_r\,\forall r\in\mathbb{Q}^+$ a.s. therefore $T_r=S_r\,\forall r$ almost surely. Thus M, [M] are constant on the same intervals.

We also check B is adapted to $\mathcal{G}_t=\mathcal{F}_{\tau_t}$. It suffices to prove that $X_T1_{\{T<\infty\}}$ is $\mathcal{F}_T$-measurable for X cadlag adapted and T a stopping time. Approximating T discretely from above gives the result, exploiting that the result is clear if T has countable support. Now, $M^{\tau_s}\in\mathcal{M}_c^2$, so $(M_{t\wedge\tau_s})$ is UI by Doob, and so by OST we get $\mathbb{E}[M_{\tau_s}|\mathcal{F}_{\tau_r}]=M_{\tau_r}$ for $r\leq s$, giving that B is a martingale. Then finally:

$\mathbb{E}[B_s^2-s|\mathcal{G}_r]=\mathbb{E}[(M^2-[M])_{\tau_s}|\mathcal{F}_{\tau_r}]=M_{\tau_r}^2-[M]_{\tau_r}=B_r^2-r$

And so we can apply Levy’s characterisation to finish the result.

# Working towards the Ito Isometry

We have defined the stochastic integral $(H\cdot A)_t$ for a finite variation, continuous adapted process A, and H a previsible process. We want to extend this to integrals with respect to continuous local martingales. Analogously to defining the Lebesgue integral by extending linearly from indicator functions, we define integrals of simple processes with respect to martingales. A simple process is a disjointly (time-)supported linear combination of canonical previsible processes $Z1_{(s,t]}$, for Z a bounded $\mathcal{F}_s$-measurable random variable. It will be convenient to demand that the variables of integration are in $\mathcal{M}^2$. Then the space $\mathcal{M}^2$ is preserved under this integral operation, that is $H\cdot M\in\mathcal{M}^2$, and in particular

$\mathbb{E}(H\cdot M)_\infty^2\leq ||H||_\infty^2\mathbb{E}(M_\infty-M_0)^2$

By Doob’s $L^2$ inequality, $X\in\mathcal{M}^2\Rightarrow||\sup_t|X_t|||_2<\infty$, so it makes sense to define $\mathcal{C}^2$ as the set of cadlag adapted processes s.t. $|||X|||:=||\sup_t|X_t|||_2<\infty$. Of particular interest is that $\mathcal{C}^2$ is complete, by finding limits for subsequences and lifting to the whole sequence by Fatou.

We aim to construct integrals over $\mathcal{M}_{c,loc}$ by taking limits of integrals over $\mathcal{M}_c^2$. For this space, we have that $M^2-[M]$ is not just $\in\mathcal{M}_{c,loc}$ but in fact a UI true martingale. Then, as a natural extension of the orthogonality trick (remembering that $M^2-[M]\in\mathcal{M}_{c,loc}$), we obtain

$\mathbb{E}[M]_\infty=\mathbb{E}(M_\infty-M_0)^2$

We then define, for $M\in\mathcal{M}_c^2$, the norm $||\cdot||_M$ by $||H||_M^2:=\mathbb{E}\left(\int_0^\infty H_s^2d[M]_s\right)$, with $L^2(M)$ the space of H such that this is finite. Then for $H\in\mathcal{S}$, we have the Ito isometry:

$I:(L^2(M),||\cdot||_M)\rightarrow(\mathcal{M}_c^2,|||\cdot|||)\quad\text{with }I(H)=H\cdot M$

Now, we also have $\mathcal{S}$ dense in $L^2(M)$. Why? We know linear combinations of $1_A,A\in\mathcal{P}$ are dense, so it suffices to prove $A\in\mathcal{P}\Rightarrow 1_A\in\bar{\mathcal{S}}$. Use a typical Dynkin’s lemma argument on $\mathcal{A}=\{A\in\mathcal{P}:1_A\in\bar{\mathcal{S}}\}$, which contains a $\pi$-system generating $\mathcal{P}$. So the isometry extends to $L^2(M)$ generally. We recover the original motivating result that $[H\cdot M]=H^2\cdot[M]$: take a sequence of stopping times reducing both H and M, to force boundedness of the integrand and $M\in\mathcal{M}_c^2$. Stopping the integral is equivalent to stopping the integrand, and checking limits in stopping times allows us to lift the Ito isometry to this result about quadratic variations.
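For a simple process the isometry can be checked by direct simulation, since no time-discretisation is needed; a sketch with an arbitrary two-step H of my own choosing:

```python
import random

# H = 1 on (0, 1/2] and H = 2 on (1/2, 1]: a simple process, so (H.B)_1 is
# exactly B_{1/2} + 2(B_1 - B_{1/2}), built from two exact Gaussian increments.
rng = random.Random(7)
n = 50000
total = 0.0
for _ in range(n):
    d1 = rng.gauss(0.0, 0.5 ** 0.5)   # B_{1/2}
    d2 = rng.gauss(0.0, 0.5 ** 0.5)   # B_1 - B_{1/2}
    total += (1.0 * d1 + 2.0 * d2) ** 2
second_moment = total / n
# Ito isometry predicts E[(H.B)_1^2] = int_0^1 H_s^2 ds = 0.5*1 + 0.5*4 = 2.5.
```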

# Brownian Motion is not finite variation

There is a natural definition of ‘pathwise’ stochastic integrals of a certain type of ‘simple’ process with respect to cadlag non-decreasing processes. It can be shown that a function is of finite variation iff it can be expressed as the difference of two such functions. Hence these finite variation processes can be used as variables of integration via the obvious linear extension. One direction of this result is obvious; the other is fiddly. To proceed, we show that the total variation process v is cadlag (and, obviously, non-decreasing), and then check that $a'=\frac12(v+a),a''=\frac12(v-a)$ are processes satisfying the conditions of the result.

Our overall aim is to define integrals with respect to Brownian Motion since that is (in a sense to be made precise through the Dubins-Schwarz theorem later) the canonical non-trivial stochastic process with non-zero quadratic variation. The result we demonstrate shows that it is not possible to define the integral with respect to BM through pathwise finite variation integrals.

Theorem: Let $M\in\mathcal{M}_{c,loc}$ with $M_0=0$ a.s. be of finite variation. Then M is indistinguishable from 0.

We will show this for M a bounded martingale with bounded variation. Why does this suffice? In general, set $S_n:=\inf\{t:V_t\geq n\}$, noting that the total variation process V is continuous, adapted and non-decreasing. If $M^{S_n}\equiv 0\,\forall n$, then we are done, as $S_n\uparrow\infty$. But $M^{S_n}$ is a bounded martingale with bounded variation.

To prove this, we make use of the orthogonality relation which is a key trick for this sort of result: If M is a martingale, with $M_s,M_t\in L^2$, for s<t, then just by multiplying out:

$\mathbb{E}[(M_t-M_s)^2|\mathcal{F}_s]=\mathbb{E}[M_t^2-M_s^2|\mathcal{F}_s]$ a.s.

Now, for this particular result, we decompose

$\mathbb{E}[M_t^2]=\mathbb{E}\left[\sum_{k=0}^{2^n-1}(M_{(k+1)2^{-n}t}^2-M_{k2^{-n}t}^2)\right]=\mathbb{E}[\sum (M_{(k+1)2^{-n}t}-M_{k2^{-n}t})^2]$

and then we bound this last term as

$\leq\mathbb{E}\left[\sup_k|M_{(k+1)2^{-n}t}-M_{k2^{-n}t}|\cdot\sum_k|M_{(k+1)2^{-n}t}-M_{k2^{-n}t}|\right]$

Now, as $n\uparrow\infty$, we have $\sum_k|M_{(k+1)2^{-n}t}-M_{k2^{-n}t}|\uparrow V_t\leq N$ by the boundedness assumption. Furthermore, M is almost surely continuous on [0,t] and so is in fact uniformly continuous there, which allows us to conclude that

$\sup_k|M_{(k+1)2^{-n}t}-M_{k2^{-n}t}|\downarrow 0$

By bounded convergence, this limit applies equally under the expectation, and so we conclude that $\mathbb{E}M_t^2=0$ for each time t; thus for each time t the martingale is almost surely equal to 0. In the usual way, we can lift this to all rational times by countability, and then to all times by continuity.
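The dichotomy in this proof, quadratic variation converging while total variation blows up, is easy to see on a single sampled Brownian path (the resolution and seed below are arbitrary choices of mine):

```python
import random

# One Brownian path sampled at dyadic resolution 2^-14 on [0, 1]; compare
# quadratic variation (converges to t = 1) with total variation (blows up)
# as the mesh is refined.
rng = random.Random(3)
n = 2 ** 14
increments = [rng.gauss(0.0, (1.0 / n) ** 0.5) for _ in range(n)]

def variations(level):
    # Aggregate the fine increments into 2^level coarse dyadic increments.
    block = n // (2 ** level)
    coarse = [sum(increments[i:i + block]) for i in range(0, n, block)]
    qv = sum(d * d for d in coarse)
    tv = sum(abs(d) for d in coarse)
    return qv, tv

qv6, tv6 = variations(6)
qv14, tv14 = variations(14)
# qv stays near 1 at every level, while tv grows like 2^(level/2).
```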

# Loss Networks

In Erlang’s telephone line model discussed in the previous post, we considered users competing for a finite set of resources. When insufficient resources are available, the call is lost. A loss network generalises this situation to more complicated resource configurations. We think of links $1,\ldots,J$, each with some integer capacity $c_j$. Each incoming call requires one unit of capacity on each link in some subset of the set of links, and lasts for a time distributed as an exponential random variable with parameter 1, independently of everything else in the model. We call this subset the route, and denote by $A_{jr}$ the incidence matrix of links on routes. Calls arrive as PP($\nu_r$), independently for each route. No queueing occurs: a call is lost if some required link is operating at full capacity. We call the probability of this event $L_r$, the loss probability.

Observe that $(n_r)$, the number of calls on each route r, is a Markov chain on the truncated space $\{An\leq c\}$. By checking the DBEs, it is clear that an ED for this Markov chain is proportional to the ED for the MC without the capacity constraint, with state space restricted to the truncated space. But without capacity constraints, the system is a linear migration process, for which we discovered the form of the ED in the previous section. If we write $H(c)=\mathbb{P}(An\leq c)$ in the linear migration process, we can compute the acceptance probability for the finite capacity system as:

$1-L_r=\frac{H(c-Ae_r)}{H(c)}$

Approximating Blocking Probabilities

We want to calculate $B_j$, the equilibrium blocking probability that a given link j is full. We have two methods: firstly, to find the distribution for $(n_r)$ with maximum probability, for which the blocking probabilities appear as shadow prices; and secondly, to make a reasonable approximation about blocking independence, and solve explicitly. We want to show that these methods give the same answers.
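The acceptance-probability identity $1-L_r=H(c-Ae_r)/H(c)$ can be verified by brute force on a toy network (the network, capacities and rates below are invented purely for illustration):

```python
from math import factorial
from itertools import product

# Toy loss network: 2 links of capacity 3; routes {1}, {2} and {1,2}.
A = [[1, 0, 1],
     [0, 1, 1]]            # A[j][r] = 1 if route r uses link j
c = [3, 3]
nu = [1.0, 1.5, 0.5]
R = range(3)
states = list(product(range(4), repeat=3))

def load(n, j):
    return sum(A[j][r] * n[r] for r in R)

def weight(n):
    # Unnormalised stationary weight prod_r nu_r^{n_r} / n_r!
    w = 1.0
    for r in R:
        w *= nu[r] ** n[r] / factorial(n[r])
    return w

def H(cap):
    return sum(weight(n) for n in states
               if all(load(n, j) <= cap[j] for j in range(2)))

r = 2   # the route using both links
via_H = H([c[j] - A[j][r] for j in range(2)]) / H(c)
direct = sum(weight(n) for n in states
             if all(load(n, j) + A[j][r] <= c[j] for j in range(2))) / H(c)
# Both expressions compute the probability that an arriving call on route r
# can be accepted, so they agree.
```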
To maximise the probability $\pi(n)\propto\prod_r\frac{\nu_r^{n_r}}{n_r!}$ on $\{An\leq c\}$, we take logs and maximise using Stirling’s approximation, which is reasonable as we are implicitly working in a regime where the throughput tends to infinity while preserving ratios. The primal problem is

$\max\quad\sum_r(x_r\log\nu_r-x_r\log x_r+x_r),\quad\text{s.t. }Ax\leq c$

which, introducing slack variables z, has Lagrangian

$L(x,y,z)=\sum_rx_r+\sum_rx_r(\log\nu_r-\log x_r-\sum_jy_jA_{jr})+\sum_jy_jc_j-\sum_jy_jz_j$

We observe that complementary slackness here has the form $y\cdot z=0$, and remember that by Strong Duality, which applies here because everything relevant is convex, this equality holds at the primal optimum. Differentiating the Lagrangian at the optimum allows us to specify the optimal x in terms of y:

$\bar{x}_r=\nu_re^{-\sum_jy_jA_{jr}}$

The dual problem is then

$\min\quad\sum_r\nu_re^{-\sum_jy_jA_{jr}}+\sum_jy_jc_j$

At this point, we make the suggestive substitution $e^{-y_j}=1-B_j$, observing that this gives B non-negative by default since y is non-negative. After further work, we will deduce that these B do indeed have a sensible interpretation as blocking probabilities, but it should be stressed that this is in no way obvious yet. Now complementary slackness asserts:

$\sum_rA_{jr}\nu_r\prod_i(1-B_i)^{A_{ir}}\left\{\begin{array}{ll}=c_j&\quad B_j>0\\ \leq c_j&\quad B_j=0\\ \end{array}\right.$

Note that the primal objective function is strictly concave, so $\bar{x}$ as discussed is the unique optimum. The dual objective is strictly convex in $yA$, so if A has full rank J, this induces a unique optimum in terms of y. We assume A is full rank (since, for example, we can perturb slightly) and that there is no degeneracy in the blocking.
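The complementary slackness equations can be solved numerically by a damped fixed-point iteration; a sketch on an invented symmetric two-link network, chosen so that both links come out saturated ($B_j>0$):

```python
# Two links of capacity 4; routes {1} and {2} each with rate 5, route {1,2}
# with rate 3 (all numbers invented). With both links saturated:
#   nu_single*(1 - B_1) + nu_joint*(1 - B_1)*(1 - B_2) = cap,
# and symmetrically for link 2.
nu_single, nu_joint, cap = 5.0, 3.0, 4.0
B1 = B2 = 0.5
for _ in range(200):
    B1_new = max(0.0, 1.0 - cap / (nu_single + nu_joint * (1.0 - B2)))
    B2_new = max(0.0, 1.0 - cap / (nu_single + nu_joint * (1.0 - B1)))
    B1 = 0.5 * (B1 + B1_new)    # damping keeps the iteration stable
    B2 = 0.5 * (B2 + B2_new)

residual = nu_single * (1 - B1) + nu_joint * (1 - B1) * (1 - B2) - cap
# At the fixed point the saturated-link equation holds, and by symmetry B1 = B2.
```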
Now we consider a sequence of networks with proportionally increasing arrival rates and capacities.

# Elastic Networks and TCP

Now, suppose each route is given a rate $x_r$, and there is a utility function $U_r(x_r)$. Links have capacities as before, so we seek to solve the optimisation problem

$\max\quad\sum_rU_r(x_r)\quad\text{s.t.}\quad Ax\leq c$

This is called the SYSTEM problem. However, the system might not know the utility functions U, or the incidence matrix A. We can alternatively consider the USER problem, where for a given route we assume the cost of the rate is linear, and maximise (utility minus cost). That is:

$\max\quad U_r\left(\frac{w_r}{\lambda_r}\right)-w_r,\quad w_r\geq 0$

where the constant of linearity has been absorbed into the utility function. For the NETWORK problem, we essentially repeat the SYSTEM problem, only under the assumption that the utility functions are $w_r\log x_r$ for fixed w. That is

$\max\quad\sum_rw_r\log x_r,\quad\text{s.t.}\quad Ax\leq c$

These optimisation problems are linked formally by the following theorem.

Theorem: We can find $\lambda,w,x$ that solve USER and NETWORK, for which $w_r=\lambda_rx_r$. Furthermore, x then solves SYSTEM.

To prove this result, write down SYSTEM’s Lagrangian $L(x,z,\mu)=\sum_rU_r(x_r)+\mu^T(c-Ax-z)$. The Lagrangian of NETWORK is similar, and at the optimum of USER, these Lagrangians are in fact the same, with $\mu_j,\lambda_r$ shadow prices for resources and routes respectively.

We discuss two regimes for fairness of allocations:

1) $(x_r)$ is max-min fair if it is a feasible flow, and the smallest flow component is as large as possible. That is, if y is another feasible flow, then $y_r>x_r$ for some r implies $\exists s:y_s<x_s\leq x_r$, ie you can’t increase a flow component without taking away from some other component which is smaller to begin with. Such a vector exists if the feasible region is convex, which in all practical examples it ought to be.
2) $(x_r)$ is proportionally fair if it is feasible, and if for any other feasible vector, the aggregate proportional change is non-positive, that is:

$\sum_r\frac{y_r-x_r}{x_r}\leq 0$

Alternatively, we can think of this as saying that the sum of the percentage changes in each component is non-positive. We say the distribution is weighted proportionally fair with respect to w if

$\sum_rw_r\frac{y_r-x_r}{x_r}\leq 0$

Then it can be seen, by considering behaviour in a neighbourhood of the optimum of the Lagrangian, that x solves NETWORK iff it is weighted proportionally fair.

TCP and Lyapunov functions

Assume rates are controlled by the ODE

$\frac{d}{dt}x_r(t)=k_r\left(w_r-x_r(t)\sum_{j\in r}\mu_j(t)\right)$

where $\mu_j(t)=p_j(\sum_{s\ni j}x_s(t))$. The motivation is that the increase in flow is linear (we increase it whenever it appears possible) and the decrease is multiplicative (that is, after negative feedback, a sign of failure to transmit a packet, the flow is halved). We seek a so-called Lyapunov function: a function of the system which is monotone along trajectories. In this case, a suitable example is

$U(x)=\sum_rw_r\log x_r-\sum_j\int_0^{\sum_{s\ni j}x_s}p_j(y)dy$

Such a function is differentiable, and has a unique maximum. Bounding the derivative away from 0 ensures that the maximum is actually attained in every case.

# Effective Bandwidth

Here, devices have fixed capacity, but packet sizes are random. So we still have a capacity constraint for the links, but we accept that it won’t be possible to ensure that we stay within those limits all the time, and seek instead to minimise the probability that the limits are exceeded, while keeping throughput as high as possible. An important result is Chernoff’s Bound:

$\mathbb{P}(Y\geq 0)\leq\inf_{s\geq 0}\mathbb{E}e^{sY}$

The proof is very straightforward: apply Markov’s inequality to the non-negative random variable $e^{sY}$.
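As a sanity check of Chernoff's bound, we can compare the exact tail of a Binomial sum with the optimised moment bound (the numbers are illustrative, and the infimum is taken over a crude grid):

```python
import math

# Check P(Y >= 0) <= inf_{s >= 0} E[e^{sY}] for Y = X_1 + ... + X_n - a,
# with X_i ~ Bernoulli(p). All parameter values are invented.
n, p, a = 20, 0.3, 10

def binom_tail(a):
    # Exact P(X_1 + ... + X_n >= a).
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(a, n + 1))

def mgf_bound(s):
    # E[e^{sY}] = e^{-sa} * (1 - p + p e^s)^n
    return math.exp(-s * a) * (1 - p + p * math.exp(s)) ** n

exact = binom_tail(a)
bound = min(mgf_bound(i / 100) for i in range(1, 501))  # grid over s in (0, 5]
# The optimised bound dominates the exact tail, and is non-trivial (< 1).
```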
So in particular $\frac{1}{n}\log\mathbb{P}(X_1+\ldots+X_n\geq 0)\leq\inf_sM(s)$, where $M(s)=\log\mathbb{E}e^{sX}$, and Cramer’s Theorem asserts that after taking a limit in n on the LHS, equality holds, provided $\mathbb{E}X<0,\mathbb{P}(X>0)>0$.

We assume that the traffic has the form $S=\sum_{j=1}^J\sum_{i=1}^{n_j}X_{ji}$, where the summands are independent (iid for each fixed j), interpreted as the loads of the $n_j$ sources of class j. We have

$\log\mathbb{P}(S\geq C)\leq\log\mathbb{E}[e^{s(S-C)}]=\sum_{j=1}^Jn_jM_j(s)-sC$

so

$\inf_s\left(\sum_jn_jM_j(s)-sC\right)\leq-\gamma\quad\Rightarrow\quad\mathbb{P}(S\geq C)\leq e^{-\gamma}$

and we want this to hold for large $\gamma$. We might then choose to restrict attention to

$A=\{n:\sum_jn_jM_j(s)-sC\leq-\gamma,\text{ some }s\geq 0\}$

So suppose we are operating near capacity, say with call profile n* on (ie near) the boundary of A, with s* the argmin of the above. Then the tangent plane is $\sum_jn_jM_j(s^*)-s^*C=-\gamma$, and since the complement of A is convex, it suffices to stay on the ‘correct’ side (ie halfspace) of this tangent plane. We can rewrite this as

$\sum_jn_j\frac{M_j(s^*)}{s^*}\leq C-\frac{\gamma}{s^*}$

Note that this is reasonable since s* is fixed, and we call $\alpha_j(s):=\frac{M_j(s)}{s}$ the effective bandwidth. It is with respect to this average that we are bounding probabilities, hence ‘effective’. Observe that $\alpha_j(s)$ is increasing by Jensen, as $(\mathbb{E}e^X)^t\leq\mathbb{E}e^{tX}$ for t>1 implies that for t>s, $(\mathbb{E}e^{sX})^t\leq(\mathbb{E}e^{tX})^s$. In particular, $\mathbb{E}X\leq\alpha_j(s)\leq\text{ess sup}X$.

# Models for Random Access Channels

To describe a situation where, for example, mobile phones compete for the use of a satellite for transmitting their messages, we consider a random access network. Here, we assume packets arrive at (necessarily different) stations at Poisson rate $\nu$, and times of transmissions are discrete.
A station will always attempt to transmit a packet which has just arrived; if that fails, it will retransmit after a delay given by an independent random variable. Transmissions fail if at least two stations attempt to use the same window. For the ALOHA protocol, we assume these delays are geometric with mean $f^{-1}$. We hope that the backlog, that is $N_t$, the number of stations with packets to retransmit at time t, eventually clears. Say $Z_t$ is the number of transmission attempts in $(t,t+1)$, distributed as $Y_t+\text{Bin}(N_t,f)$ where $Y_t$ is the number of arrivals in $(t,t+1)$; then $N_{t+1}=N_t+Y_t-1_{\{Z_t=1\}}$.

Then the drift in N is positive once N gets large. This is, however, not enough to prove that the system is transient. Instead we compute p(n), the probability that the channel unjams (that is, an instance where no transmissions are attempted) before the backlog increases, given that the backlog is currently n. We can calculate that this probability is $\sim n(1-f)^{n-1}$ and so $\sum p(n)<\infty$. Applying Borel-Cantelli to the record process, that is the times at which the backlog increases, it can be shown that almost surely, the channel unjams only finitely many times.
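The drift computation behind the jamming argument is elementary to tabulate. A sketch, with invented values of $\nu$ and f: the drift can be negative for moderate backlogs, but approaches $\nu>0$ once n is large.

```python
import math

def drift(n, nu=0.3, f=0.1):
    # E[N_{t+1} - N_t | N_t = n] = nu - P(Z_t = 1), with Z_t = Y_t + Bin(n, f)
    # and Y_t ~ Poisson(nu). The values of nu and f are illustrative.
    p_one = (nu * math.exp(-nu) * (1 - f) ** n              # one arrival, no retransmissions
             + math.exp(-nu) * n * f * (1 - f) ** (n - 1))  # no arrivals, one retransmission
    return nu - p_one

drifts = {n: drift(n) for n in (0, 5, 20, 100)}
# As n grows, the success probability vanishes, the drift tends to nu > 0,
# and the backlog grows: the channel jams.
```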

To maximise throughput, we would take $f=\frac{1}{n}$, but of course the individual stations don’t know what n is. However, they could guess. Suppose they view the process and keep a counter:

$S_{t+1}=\max\left(1,S_t+a1_{Z_t=0}+b1_{Z_t=1}+c1_{Z_t=*}\right)$

Then transmission happens with probability $f=\frac{1}{S_t}$, with the idea that S tracks N well, especially when N is large. Observe that

$\mathbb{E}[S_{t+1}-S_t|N_t=n,S_t=s]\rightarrow (a-c)e^{-k}+(b-c)ke^{-k}+c$

where $k=\frac{n}{s}$. So (a,b,c)=(2-e,0,1), for example, gives positive S drift when k>1, and negative drift when k<1, which is required to draw S towards N. But S might still diverge asymptotically away from N? We don’t want N to increase too fast when k>1, but

$\mathbb{E}[N_{t+1}-N_t|N_t=n,S_t=s]\rightarrow \nu-ke^{-k}$

so demanding $\nu-ke^{-k}<0$ is what we require when $\nu<e^{-1}$, which is satisfied by the example suggested.

If a station only knows its own transmission attempts, we can use the Ethernet protocol, where after r unsuccessful attempts, we wait for a time $B_r$ chosen uniformly from $\{1,\ldots,2^r\}$. This is called binary exponential backoff. When $\nu<\log 2$, we get infinitely many successful transmissions almost surely. But under positive recurrence, we would have an equilibrium distribution, so by Little’s Law, we could equate arrival and departure rates. Considering the probabilities of 0 and 1 transmissions, we obtain

$\nu=\pi_1e^{-\nu}+\pi_0\nu e^{-\nu}\quad\Rightarrow\quad \nu\leq e^{-\nu}\quad\Rightarrow\quad \nu<0.567$

So can be non-recurrent, but have infinitely many successful transmissions!
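The critical value 0.567 quoted above is the root of $\nu=e^{-\nu}$ (the omega constant), which a few lines of bisection recover:

```python
import math

# Bisection for the root of nu = exp(-nu): the function nu - exp(-nu) is
# increasing, negative at 0 and positive at 1.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if mid < math.exp(-mid):
        lo = mid
    else:
        hi = mid
critical = (lo + hi) / 2   # ~ 0.567143, the omega constant
```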

# Braess’s Paradox and Traffic Networks

For a traffic network, we have routes and links as before, only here we specify the delay on a link as a function of the flow along that link $D_j(y_j)$, where $y_j=\sum_r A_{jr}x_r$ as before. Then a natural notion of an equilibrium is a distribution of flows such that no customer stands to gain by changing route. Formally, take S the set of source-destination pairs, $H_{sr}$ the incidence matrix of pairs s and routes r, and S(r) the routes between the same source-destination pair as r.

Then define a Wardrop equilibrium to be a flow vector $(x_r,r\in R)$ such that for any $x_r>0$,

$\sum_jD_j(y_j)A_{jr}=\min_{r'\in S(r)}\sum_jD_j(y_j)A_{jr'}$

that is, the delay on this route is less than or equal to the delay on any route between that pair of vertices.

A natural assumption would be that $D_j(y_j)$ is increasing in $y_j$ for each j, and if we further assume that each $D_j$ is continuously differentiable, then we can prove that a Wardrop Equilibrium exists.

Why? Consider the optimisation problem

$\min\quad\sum_j \int_0^{y_j}D_j(u)du,\quad x\geq 0,Hx=f,Ax=y$

where f is the flow requirement. The feasible region is convex and the objective function is convex differentiable so the optimum exists. The Lagrangian is

$L(x,y,\lambda,\mu)=\sum_j \int_0^{y_j}D_j(u)du+\lambda\cdot (f-Hx)-\mu\cdot(y-Ax)$

and at the minimum

$0=D_j(y_j)-\mu_j,\quad 0=-\lambda_{S(r)}+\sum_j \mu_jA_{jr}$

So $\mu_j$ is the delay on link j, and then

$\lambda_{S(r)}\left\{\begin{array}{l l}=\sum_j\mu_jA_{jr}& \quad x_r>0\\ \leq \sum_j\mu_jA_{jr} & \quad x_r=0\\ \end{array} \right.$

So can interpret $\lambda_{S(r)}$ as the minimum delay for this route, and so a solution to this problem is a Wardrop Equilibrium. If $D_j$s are strictly increasing, then have uniqueness in $(y_j)$, from the optimisation problem, but not in $(x_r)$. For example, you could have two essentially identical routes from A to B, and from B to C, and it then doesn’t matter how you concatenate them to form a flow from A to C.

Note the counterintuitive Braess’s Paradox, where adding a route increases the delay at the Wardrop Equilibrium. In the canonical example, the addition of the road across causes the delay to increase from 83 to 92. And indeed such effects have been observed in ‘real life’ as well.
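A standard textbook instance of the paradox (with different numbers from the example referenced above) can be computed directly:

```python
# 4000 drivers travel from A to C. Route A-B1-C has delay T/100 + 45 and
# route A-B2-C has delay 45 + T/100, where T is the flow on that route's
# congestible link. All numbers are from the standard textbook example.
total = 4000.0

# Without the extra road, symmetry splits the flow equally:
flow = total / 2
delay_before = flow / 100 + 45                 # 20 + 45 = 65

# A zero-delay road B1 -> B2 makes A-B1-B2-C dominant for every driver
# (T/100 <= 40 < 45 regardless of what the others do), so all 4000 use it:
delay_after = total / 100 + 0 + total / 100    # 40 + 40 = 80
```

Each driver is acting rationally at every stage, yet everyone's journey gets longer: exactly the Wardrop-equilibrium failure described above.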

Note that one explanation for why this seems counter-intuitive is that the objective function isn’t necessarily the most natural one. For a Wardrop equilibrium, the agents are acting entirely selfishly given present information. From the point of view of society, it might make more sense to minimise the average delay $\min \sum_jy_jD_j(y_j)$. In this case the Lagrangian gives

$0=D_j(y_j)+y_jD_j'(y_j)-\mu_j,\quad 0=-\lambda_{S(r)}+\sum_j\mu_jA_{jr}$

So if we add a congestion charge of $y_jD_j'(y_j)$ to the delay function, the Wardrop Equilibrium coincides with the societal optimum. Note that this is NOT the societal optimum for (delay + charge). However, we have seen that a toll enables selfish behaviour to produce a societal optimum. This is an example of mechanism design, which can be considered a branch of game theory, where the aim is to design systems so that selfish agents work in a societally optimal manner.