# Gaussian tail bounds and a word of caution about CLT

The first part is more of an aside. In a variety of contexts, whether for testing Large Deviations Principles or calculating expectations by integrating over the tail, it is useful to know good approximations to the tail of various distributions. In particular, the exact form of the tail of a standard normal distribution is not particularly tractable. The following upper bound is therefore often extremely useful, especially because it is fairly tight, as we will see.

Let $Z\sim N(0,1)$ be a standard normal RV. We are interested in the tail probability $R(x)=\mathbb{P}(Z\geq x)$. The density function of a normal RV decays very rapidly, as the exponential of a quadratic function of x. This means we might expect that conditional on $\{Z\geq x\}$, with high probability Z is in fact quite close to x. This concentration of measure property would suggest that the tail probability decays at a rate comparable to the density function itself. In fact, we can show that:

$\mathbb{P}(Z>x)< \frac{1}{\sqrt{2\pi}}\frac{1}{x}e^{-x^2/2}.$

It is in fact relatively straightforward:

$\mathbb{P}(Z>x)=\frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-u^2/2}du< \frac{1}{\sqrt{2\pi}}\int_x^\infty \frac{u}{x}e^{-u^2/2}du=\frac{1}{\sqrt{2\pi}}\frac{1}{x}e^{-x^2/2}.$

Just by comparing derivatives, we can also show that this bound is fairly tight. In particular:

$\frac{1}{\sqrt{2\pi}}\frac{x}{x^2+1}e^{-x^2/2}<\mathbb{P}(Z>x)< \frac{1}{\sqrt{2\pi}}\frac{1}{x}e^{-x^2/2}.$

—-

Now for the second part about CLT. The following question is why I started thinking about various interpretations of CLT in the previous post. Suppose we are trying to prove the Strong Law of Large Numbers for a random variable with 0 mean and unit variance. Suppose we try to use an argument via Borel-Cantelli:

$\mathbb{P}(\frac{S_n}{n}>\epsilon) = \mathbb{P}(\frac{S_n}{\sqrt{n}}>\epsilon\sqrt{n})\stackrel{\text{CLT}}{\approx}\mathbb{P}(Z>\epsilon\sqrt{n}).$

Now we can use our favourite estimate on the tail of a normal distribution.

$\mathbb{P}(Z>\epsilon\sqrt{n})\leq \frac{1}{\epsilon\sqrt{n}\sqrt{2\pi}}e^{-n/2}$

$\Rightarrow \sum_n \mathbb{P}(Z>\epsilon\sqrt{n})\leq \frac{1}{\sqrt{2\pi}}(e^{-1/2})^n=\frac{1}{\sqrt{2\pi}(e^{1/2}-1)}<\infty.$

By Borel-Cantelli, we conclude that with probability 1, eventually $\frac{S_n}{n}<\epsilon$. This holds for all $\epsilon>0$, and a symmetric result for the negative case. We therefore obtain the Strong Law of Large Numbers.

The question is: was that application of CLT valid? It certainly looks ok, but I claim not. The main problem is that the deviations under discussion fall outside the remit of discussion. CLT gives a limiting expression for deviations on the $\sqrt{n}$ scale.

Let’s explain this another way. Let’s take $\epsilon=10^{-2}$. CLT says that as n becomes very large

$\mathbb{P}(\frac{S_n}{\sqrt{n}}>1000)\approx \mathbb{P}(Z>1000).$

But we don’t know how large n has to be before this approximation is vaguely accurate. If in fact it only becomes accurate for $n=10^{12}$, then it is not relevant for estimating

$\mathbb{P}(\frac{S_n}{\sqrt{n}}>\epsilon\sqrt{n}).$

This looks like an artificial example, but the key is that this problem becomes worse as n grows, (or as we increase the number which currently reads as 1000), and certainly is invalid in the limit. [I find the original explanation about scale of deviation treated by CLT more manageable, but hopefully this further clarifies.]

One solution might be to find some sort of uniform convergence criterion for CLT, ie a (hopefully rapidly decreasing) function $f(n)$ such that

$\sup_{x\in\mathbb{R}}|\mathbb{P}(\frac{S_n}{\sqrt{n}}>x)-\mathbb{P}(Z>x)|\leq f(n).$

This is possible, as given by the Berry-Esseen theorem, but even the most careful refinements in the special case where the third moment is bounded fail to give better bounds than

$f(n)\sim \frac{1}{\sqrt{n}}.$

Adding this error term will certainly destroy any hope we had of the sum being finite. Of course, part of the problem is that the supremum in the above definition is certainly not going to be attained at any point under discussion in these post-$\sqrt{n}$ deviations. We really want to take a supremum over larger-than-usual deviations if this is to work out.

By this stage, however, I hope it is clear what the cautionary note is, even if the argument could potentially be patched. CLT is a theorem about standard deviations. Separate principles are required to deal with the case of large deviations. This feels like a strangely ominous note on which to end, but I don’t think there’s much more to say. Do comment below if you think there’s a quick fix to the argument for SLLN presented above.