Why do we need the Lebesgue integral?

I’m currently lecturing the course Fundamentals of Probability at KCL, where we cover some of the measure theory required to set up probability with a higher level of formality than students have seen in their introductory courses. By this point, we have covered:

  • Sigma-algebras, measures, and probability spaces, including the fact that not all subsets of \mathbb{R} are Borel-measurable.
  • Measurable functions, which in probability spaces are random variables.

More recently, we’ve covered the construction of the Lebesgue integral for measurable functions f:E\to\mathbb{R} for some general measure space (E,\mathcal{E},\mu).

In this post, I’ll briefly summarise this construction, and some key differences in context and usage between this and the more familiar Riemann integral. But I’ll also aim to answer the question: “Why do we need to set up the Lebesgue integral like this? Why can’t we do it directly, or more simply?”

Someone asked essentially this question after the lecture, and I think it deserves an answer. To address it, I’ll look at a handful of plausible alternative directions for defining integration, and see where they go wrong.

Construction of the Lebesgue integral

But first, we have to recap what the construction is. It goes in three steps:

  • Construction for simple functions, which take only finitely many values, all of them non-negative.
  • Construction for measurable, non-negative functions, by considering a monotone limit of simple functions.
  • Construction for (general) measurable functions, by splitting into positive and negative parts.

It’s worth emphasising that for all of these steps, the integral is defined directly over the whole set E. This is an immediate contrast with the (improper) Riemann integral for functions f:\mathbb{R}\to\mathbb{R}, where the integral \int_{-\infty}^\infty has to be defined as a limit of integrals over finite ranges.

Adding a few details, a simple function has the form f=\sum_{i=1}^k a_i \mathbf{1}_{A_i}, where a_i\ge 0 are non-negative coefficients, and A_i\in \mathcal{E} are measurable sets. Note that the sum is finite.

Then the integral of f is defined to be \int_E f \,d\mu=\sum_{i=1}^k a_i \mu(A_i). This matches our intuition if we are thinking of functions f:\mathbb{R}\to\mathbb{R} with an idea of integrals as ‘area under a graph’, but it is just a definition.
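
As a toy illustration of this definition (a hypothetical encoding, not the formal construction): represent a simple function on \mathbb{R} as a finite list of coefficient/interval pairs, with the intervals assumed disjoint, and integrate against Lebesgue measure term by term.

```python
# Sketch: a simple function sum_i a_i * 1_{A_i} with disjoint intervals A_i,
# integrated against Lebesgue measure by the definition sum_i a_i * mu(A_i).

def integrate_simple(terms):
    """terms: list of (a_i, (left, right)) pairs, a_i >= 0, intervals disjoint."""
    return sum(a * (right - left) for a, (left, right) in terms)

# f = 2 * 1_[0,1) + 5 * 1_[1,4)
f = [(2.0, (0.0, 1.0)), (5.0, (1.0, 4.0))]
print(integrate_simple(f))  # 2*1 + 5*3 = 17.0
```

Here the measure of each interval is just its length; in a general measure space (E,\mathcal{E},\mu) the same formula applies with \mu(A_i) in place of the lengths.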

Question: we will see shortly why it’s only relevant to consider non-negative coefficients. But initially, we might ask why the sum needs to be finite. When sums are finite, they don’t need to be defined as limits, and they certainly exist! If we allowed infinite sums in the definition of simple functions, we would then need to exclude pathologies like f=\sum_{n\ge 1} \mathbf{1}_{[-n,n]}, which is infinite everywhere [1]. This will be a recurring theme of this post: an optimal definition only takes a limit when it’s certain that the limit exists.

Anyway, the arguments are sometimes a bit notation-heavy, but in the setting of simple functions, we can prove directly that the integral is a linear operator (and so satisfies \int_E (\alpha f+\beta g)d\mu=\alpha \int_E f\,d\mu + \beta \int_E g\,d\mu) and various other unsurprising-but-important results.

Following this, we define the integral for general measurable functions f: E\to\mathbb{R} taking only non-negative values, via approximation from below by simple functions. Specifically

\int_E f\,d\mu = \sup\Big\{\int_E g\,d\mu : 0\le g\le f\text{ a.e., }g\text{ simple}\Big\}. (*)

While it would be tempting to define \int_E f\,d\mu as the limit of the integrals of a specific sequence of simple functions f_n approximating f (for example, defining f_n by rounding f down to the nearest multiple of 1/2^n), the more abstract definition (*) turns out to be more useful in proofs [2].

Using definition (*), we derive the Monotone Convergence Theorem. Informally, this says that monotone limits of non-negative measurable functions respect integrals. More formally, when f_n\uparrow f a.e., then \int_E f_n\,d\mu\uparrow \int_E f\,d\mu \in[0,\infty]. As a bonus, we immediately recover that the limit of the integrals of the ’rounding-down approximations’ in the previous paragraph is the integral of the original function.
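
To see this in action for one concrete choice (purely a sanity check; the function f(x)=x on [0,1] is my own illustrative example): the dyadic ‘round down to the nearest 1/2^n’ approximations are simple, increase with n, and their integrals increase to the true value 1/2, exactly as the Monotone Convergence Theorem promises.

```python
# f(x) = x on [0,1]: f_n takes the value j/2^n on [j/2^n, (j+1)/2^n),
# an interval of measure 1/2^n, so its integral is a finite sum.

def integral_of_dyadic_approx(n):
    return sum((j / 2**n) * (1 / 2**n) for j in range(2**n))

vals = [integral_of_dyadic_approx(n) for n in range(1, 8)]
# vals is monotone increasing and approaches 1/2 (exactly 1/2 - 1/2^(n+1))
```

The closed form 1/2 - 1/2^{n+1} follows from summing the arithmetic series, and makes the monotone convergence explicit.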

Finally, we extend to general measurable functions f:E\to\mathbb{R}, potentially taking both positive and negative values. We identify the positive and negative parts f^+,f^- of f, and define the integral \int_E f\,d\mu=\int_E f^+\,d\mu - \int_E f^-\,d\mu, provided that at least one of the RHS integrals is finite. The point is that it’s well-defined to write \infty - 4 = \infty or \pi - \infty=-\infty, but it’s not well-defined to study \infty-\infty.

It might seem annoying that a consequence of this is that functions such as f(x)=\mathbf{1}_{x>0}\frac{\sin x}{x} are then not integrable over \mathbb{R} (even though they have a finite improper Riemann integral), but this is a direct consequence of defining the Lebesgue integral over the whole set E at once. Any thoughts about ‘positive and negative contributions cancelling out’ are implicitly or explicitly treating \mathbb{R} as a limit of compact ranges. [3]
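
As a numerical sketch of what goes wrong here (a crude midpoint rule, not a rigorous argument): the truncated signed integrals of \sin x/x settle down near \pi/2, while the truncated integrals of its absolute value keep growing, reflecting that \int f^+ = \int f^- = \infty.

```python
import math

def midpoint_integral(f, a, b, steps=100_000):
    """Crude midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

sinc = lambda x: math.sin(x) / x  # midpoints never hit x = 0

# truncated signed integrals: approach pi/2
signed = [midpoint_integral(sinc, 0, n * math.pi) for n in (10, 100, 1000)]

# truncated integrals of the absolute value: keep growing
# (heuristically like a multiple of the log of the truncation point)
absolute = [midpoint_integral(lambda x: abs(sinc(x)), 0, n * math.pi)
            for n in (10, 100, 1000)]
```

The signed truncations converge only because successive positive and negative humps nearly cancel; once absolute values are taken, the hump over [(k-1)\pi, k\pi] contributes at least 2/(k\pi), and this harmonic-type sum diverges.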

Ideas for alternative constructions

Given the non-integrability of certain functions as above, one might consider whether it’s possible to tweak the definitions to avoid this. Here is a list of reasonable alternatives to the standard order of construction described above. In each case, I’ll draw attention to something that might go wrong (focusing on the \mathbb{R}\to\mathbb{R} case, but the issues mostly generalise).

Truncation of range: We’ve touched on this earlier, but let’s briefly revisit what happens if we define integrals \int_{\mathbb{R}} as a limit of integrals over finite ranges. Leaving aside the question of whether this is possible for more general spaces (see [3]), we already have issues with this approach over \mathbb{R}. Note that once we restrict to finite ranges and continuous functions (or functions with finitely many discontinuities), it doesn’t matter whether we are using the Riemann or the Lebesgue framework. But when studying a function like f(x)=\mathbf{1}_{\mathbb{R}\setminus\{0\}}(x)\frac{1}{x}, we have truncated results like

\int_{-n}^n f(x)\mathrm{d}x = 0,\quad \int_{-n}^{2n}f(x)\mathrm{d}x = \log 2,

and so taking the limit along \mathbb{R}=\bigcup_{n} [-n,n] would ‘define’ the integral of f over \mathbb{R} to be zero, while taking the limit along \mathbb{R}=\bigcup_{n} [-n,2n] would give a different answer. So this certainly wouldn’t work as a general definition.
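
A quick numerical check of the asymmetric truncation (using a crude midpoint rule, and interpreting the contribution near 0 symmetrically, as above): by odd symmetry the [-n,n] part cancels, leaving \int_n^{2n}\frac{dx}{x}=\log 2 for every n.

```python
import math

def midpoint_integral(f, a, b, steps=100_000):
    """Crude midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# the [-n, n] part of the truncation [-n, 2n] cancels by odd symmetry,
# so the truncated integral equals the integral over [n, 2n], which is
# log(2n) - log(n) = log 2, independently of n
for n in (1, 10, 1000):
    approx = midpoint_integral(lambda x: 1 / x, n, 2 * n)
    assert abs(approx - math.log(2)) < 1e-6
```

The point is precisely that the answer log 2 does not shrink as n grows: the asymmetric truncation picks up a fixed discrepancy at every scale.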

In practice, for improper integrals over \mathbb{R}, one generally studies \int_{-\infty}^\infty = \lim\limits_{n\to\infty} \int_0^n + \lim\limits_{n\to\infty} \int_{-n}^0, with the same restriction on taking \infty-\infty as we discussed earlier in this post. However, this split into the two half-lines is very much a feature of the reals. Even just in \mathbb{R}^2, there is no easy analogue of this split.

Approximating all functions from below: What if we try to approximate all measurable functions from below by simple functions, after relaxing the condition that the coefficients of a simple function have to be non-negative? Well, if a function f takes arbitrarily negative values (e.g. all values in (-\infty,0)), then it simply can’t be bounded below by a function which takes only finitely many values.

And if we allow simple functions to have negative coefficients and support on infinitely many sets A_i, this is even worse than the situation discussed in footnote [1]. At this point we would already have ‘simple’ functions whose integral isn’t defined, such as f(x)=1 for x\ge 0 and f(x)=-1 for x<0, where we face \infty-\infty. It’s not going to work very well to study the supremum of a set, some of whose values are not defined!

Jointly approximating from below/above: This is the suggestion that prompted me to write this post. Suppose we relax the non-negativity, but not the finite sum condition for a simple function. Then approximate f:\mathbb{R}\to\mathbb{R} by ‘simple’ functions f_n so that |f_n|\le |f|, and \mathrm{sign}(f_n)=\mathrm{sign}(f) a.e. That is, f_n bounds f from below when f is positive, and from above when f is negative.

The issue here is that it’s not clear how to turn this into a definition. We can’t write

\int_E f\,d\mu = \sup\Big\{\int_E g\, d\mu : g\text{ `simple'},\,\mathrm{sign}(g)=\mathrm{sign}(f),\,|g|\le |f|\Big\},

because the supremum would be approached by ‘simple’ functions g which are arbitrarily close to zero wherever f is negative, so it would essentially just return \int_E f^+\,d\mu. And it’s not well-defined to replace the supremum with a limit – it’s not a sequence, after all.

If you try and split it as a supremum over the range where f\ge 0 and an infimum over the range where f<0, then you are actually back to the original definition of the general Lebesgue integral!

And going for a very concrete sequence doesn’t help either, unfortunately! If we define f_n by rounding f down/up (depending on whether f is positive/negative) to the nearest multiple of 1/2^n, we could try to define \int_E f\,d\mu=\lim\limits_{n\to\infty} \int_E f_n\,d\mu, notwithstanding some of the issues we mentioned earlier about committing to a specific approximating sequence.

However, since the (f_n)_{n\ge 1} are not monotone in n, there is no guarantee that this limit exists. And, indeed, we can construct examples of f where the limit does not exist. Consider, for example, the intervals A_k=[2^{k-1},2^k) and the function f=\sum\limits_{k\ge 1}(-\frac{1}{2})^{k-1} \mathbf{1}_{A_k}. Then the approximation is f_n=\sum\limits_{k=1}^n (-\frac{1}{2})^{k-1} \mathbf{1}_{A_k}, and since each A_k has measure 2^k-2^{k-1}=2^{k-1}, we get \int_{\mathbb{R}} f_n\,dx = \sum\limits_{k=1}^n (-1)^{k-1}, which is 1 when n is odd, and 0 when n is even.
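
Checking the arithmetic directly (a throwaway computation): each A_k has Lebesgue measure 2^{k-1}, so each term of the truncated integral contributes exactly (-1)^{k-1}, and the partial sums alternate between 1 and 0.

```python
# A_k = [2^(k-1), 2^k) has measure 2^(k-1), and the coefficient on A_k is
# (-1/2)^(k-1), so term k of the truncated integral is exactly (-1)^(k-1).

def truncated_integral(n):
    return sum((-0.5) ** (k - 1) * 2 ** (k - 1) for k in range(1, n + 1))

vals = [truncated_integral(n) for n in range(1, 9)]
# vals == [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0] -- no limit exists
```

(The arithmetic is exact in floating point here, since every quantity involved is a power of 2.)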

Under the true definition of the Lebesgue integral, \int_{\mathbb{R}}f\,dx is undefined (both \int f^+ and \int f^- are infinite), which might seem equally disappointing, but in reality it is clearer than ending up in a situation where you get different limits depending on whether you round down to the nearest multiple of 1/2^{2n} versus the nearest multiple of 1/2^{2n+1}.

Footnotes

[1] – It’s not impossible to exclude such pathologies (for example, by demanding that the A_i are disjoint (†)), but adding constraints to the definition of a simple function is far from ideal if we need to work with them in proofs. For example, under (†), it becomes less obvious that the sum of two simple functions is simple.

[2] – if we define f_n to be f rounded down to the nearest multiple of 1/2^n, and similarly g_n for g, then we immediately run into problems when trying to prove that \int_E (f+g)\,d\mu=\int_E f\,d\mu + \int_E g\,d\mu, as the rounding-down operation doesn’t commute with addition.
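
For a concrete instance of this failure (the values 0.3 and 0.4 are arbitrary illustrative choices): rounding down to the nearest multiple of 1/2 sends both 0.3 and 0.4 to 0, but sends their sum 0.7 to 0.5.

```python
import math

# round x down to the nearest multiple of 1/2^n
def round_down(x, n):
    return math.floor(x * 2**n) / 2**n

# with n = 1 (multiples of 1/2):
# round_down(0.3, 1) + round_down(0.4, 1) == 0.0 + 0.0 == 0.0
# round_down(0.3 + 0.4, 1) == 0.5
```

So the approximations of f, g, and f+g built this way are not related in the obvious additive fashion, which is exactly why the abstract supremum definition (*) is preferred.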

[3] – and there are measure spaces which are not sigma-finite, for which E cannot be expressed as a (countable) union of finite-measure sets. But we still want a theory of integration over such spaces.
