In the final class for Applied Probability, we discussed the so-called Inspection Paradox for an arrivals process. We assume that buses, say, arrive as a Poisson process with rate 1, and consider the length of the interval (between buses) containing some fixed time T. The ‘paradox’ is that this interval is longer in expectation than the average time between buses, which is of course 1, since the times between buses are Exp(1) random variables.
As with many paradoxes, it isn’t really that surprising after all. Perhaps what is more surprising is that the difference between the expected observed interval and the expected actual interval time is quite small here. There are several points of interest:
1) The Markov property of the Poisson process is key. In particular, it says that the expectation (and indeed the distribution) of the waiting time for a customer arriving at T does not depend on T, even if T is a random variable (or rather, any member of a class of random variables, the stopping times). So the inspection paradox will certainly hold whenever the process has the Markov property, because the inspected interval contains the inspected waiting time, which has the same distribution as a typical interval between buses.
2) Everything gets slightly complicated by the fact that the Poisson process is defined to begin at 0. In particular, it is not reversible. Under the infinitesimal (or even the independent Poisson increments) definition, we can view the Poisson process not as a random non-decreasing function of time, but rather as a random collection of points on the positive reals. With this setup, it is clearly no problem to define instead a random collection of points on all the reals. [If we consider this instead as a random collection of point masses, then this is one of the simplest examples of a random measure, but that’s not hugely relevant here.]
We don’t need to worry too much about what value the Poisson process takes at any given time if we are only looking at increments, but if it makes you more comfortable, you can still safely assume that it is 0 at time 0. Crucially, the construction IS now reversible. The number of points in the interval [s,t] has a distribution parameterised by t-s, so it doesn’t matter which direction we move along the real line. In this case, A_t, the time since the previous arrival, and E_t, the waiting time until the next arrival, are independent Exp(1) RVs, as the memorylessness property applies in each time direction.
For the original Poisson process, A_t is actually stochastically dominated by an Exp(1) random variable, because it is capped at t: A_t has the distribution of min(X, t) with X ~ Exp(1). So in this case, the expected interval length is $\mathbb{E}[\min(X,t)] + 1 = 2 - e^{-t}$, which lies strictly between 1 and 2 for t > 0. In our process extended to the whole real line, the expected interval length is exactly 2.
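A minimal Monte Carlo sketch of points 1) and 2), assuming rate 1 and written in Python. The function names, the inspection time t = 0.5 and the `horizon` parameter are illustrative choices, not from the post; the two-sided process is approximated by starting the arrivals at -horizon rather than at minus infinity. The one-sided mean should come out close to 2 - e^{-t} and the two-sided mean close to 2.

```python
import math
import random


def inspected_interval(t, rng, two_sided=False, horizon=20.0):
    """Length of the gap between consecutive arrivals that contains time t.

    Arrivals form a rate-1 Poisson process started at 0 (one-sided) or, as an
    approximation to the whole-real-line process, at -horizon (two-sided).
    In the one-sided case, if nothing has arrived by t, the age A_t is t.
    """
    start = -horizon if two_sided else 0.0
    prev = start  # most recent arrival (or the start point) at or before t
    s = start
    while True:
        s += rng.expovariate(1.0)  # Exp(1) holding time
        if s > t:
            age = t - prev      # A_t, time since the previous arrival
            residual = s - t    # E_t, waiting time until the next arrival
            return age + residual
        prev = s


def mean_inspected_interval(t, two_sided, reps=50_000, seed=0):
    """Monte Carlo estimate of the expected length of the inspected interval."""
    rng = random.Random(seed)
    return sum(inspected_interval(t, rng, two_sided) for _ in range(reps)) / reps


t = 0.5
print(mean_inspected_interval(t, two_sided=False), 2 - math.exp(-t))
print(mean_inspected_interval(t, two_sided=True))
```

The horizon of 20 leaves only a probability of about e^{-20.5} that no arrival falls in [-20, t], so the truncation is invisible at this sample size.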
This idea of extending onto the whole real line explains why we mainly consider delayed renewal processes rather than just normal renewal processes. The condition that we start a holding time at 0 is often not general enough, particularly when the holding times are not exponential and so the process is not memoryless.
3) There is a general size-biasing principle in action here. Roughly speaking, we are more likely to arrive in a large interval than in a small interval, and the weighting required is proportional to the length of the interval. Given a density function f(x) of X, we define the size-biased density function to be proportional to xf(x); dividing by the expectation EX is precisely what is needed to normalise this to a probability density, giving xf(x)/EX. Failure to account for whether an observation should have the underlying distribution or the size-biased distribution is a common cause of supposed paradoxes. A common example is ‘on average my friends have more friends than I do’. Obviously, this requires various assumptions about me and my friends, about how we are drawn from the set of people, and about the distribution of the number of friends, which might not be entirely accurate in all situations.
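To see the size-biasing concretely, here is a rough Python sketch (not from the post): it lays i.i.d. holding times end to end, drops uniform inspection times on the resulting stretch of line, and records the length of the interval each one lands in. The helper `covering_lengths` and the choice of Uniform(0, 2) holding times are hypothetical; for them the size-biased mean is $\mathbb{E}[X^2]/\mathbb{E}X = 4/3$ rather than 1, and for Exp(1) holding times it is 2.

```python
import bisect
import itertools
import random
from statistics import fmean


def covering_lengths(draw, n_intervals=200_000, n_probes=100_000, seed=0):
    """Lay n_intervals i.i.d. holding times end to end, drop n_probes uniform
    inspection times on [0, total length], and return the holding times
    together with the lengths of the intervals containing the inspection times."""
    rng = random.Random(seed)
    lengths = [draw(rng) for _ in range(n_intervals)]
    ends = list(itertools.accumulate(lengths))  # arrival times
    total = ends[-1]
    covering = []
    for _ in range(n_probes):
        u = rng.random() * total
        # The first arrival strictly after u closes the interval containing u;
        # min() guards against u landing exactly on the final arrival.
        i = min(bisect.bisect_right(ends, u), n_intervals - 1)
        covering.append(lengths[i])
    return lengths, covering


lengths, covering = covering_lengths(lambda rng: rng.uniform(0.0, 2.0))
size_biased = fmean(x * x for x in lengths) / fmean(lengths)  # E[X^2] / E[X]
print(fmean(lengths), fmean(covering), size_biased)
```

The plain average of the holding times stays near 1, while the average length of the interval you land in matches the size-biased mean.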
In the Poisson process example above, the holding times have density function $f(x) = e^{-x}$ for $x \ge 0$, so the size-biased density function is $x e^{-x}$ (no further normalisation is needed, since $\mathbb{E}X = 1$). This corresponds to a $\Gamma(2,1)$ distribution, which may be written as the sum of two independent Exp(1) RVs as suggested above.
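As a quick check of both claims, writing $f(x) = e^{-x}$ for the Exp(1) density, the normalisation and the convolution of two Exp(1) densities are:

```latex
\int_0^\infty x\, e^{-x}\,\mathrm{d}x = 1 = \mathbb{E}X,
\qquad
(f * f)(x) = \int_0^x e^{-y}\, e^{-(x-y)}\,\mathrm{d}y = e^{-x}\int_0^x \mathrm{d}y = x\, e^{-x}, \quad x \ge 0.
```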
4) A further paradox mentioned on the sheet is the waiting time paradox. This says that the expected waiting time is longer if you arrive at a random time than if you just miss a bus. This is not too surprising: consider the stereotypical complaint about buses in this country arriving in threes, at least roughly. Then if you just miss a bus, there is a 2/3 chance that another will be turning up essentially immediately. On the sheet, we showed that the $\Gamma(\alpha,\lambda)$ distribution has this property also, provided $\alpha < 1$.
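A sketch of the waiting time paradox for Gamma holding times, assuming shape $\alpha$ and scale $\theta$ (Python's `random.gammavariate(alpha, beta)` uses this shape/scale convention). Just missing a bus means waiting a full holding time, with mean $\alpha\theta$; a standard renewal-theory result gives the long-run mean wait from a uniformly random time as $\mathbb{E}[X^2]/(2\,\mathbb{E}X) = (\alpha+1)\theta/2$, which is larger exactly when $\alpha < 1$. The function names and simulation sizes are illustrative, and a uniform time on a long finite run of buses stands in for a genuinely random arrival time.

```python
import bisect
import itertools
import random
from statistics import fmean


def mean_waits(alpha, theta=1.0, n_intervals=200_000, n_probes=100_000, seed=0):
    """Renewal process with Gamma(shape alpha, scale theta) holding times.

    Returns (mean wait if you have just missed a bus,
             mean wait if you arrive at a uniformly random time)."""
    rng = random.Random(seed)
    gaps = [rng.gammavariate(alpha, theta) for _ in range(n_intervals)]
    arrivals = list(itertools.accumulate(gaps))
    total = arrivals[-1]
    waits = []
    for _ in range(n_probes):
        u = rng.random() * total
        # Index of the next bus strictly after time u (clamped at the last bus).
        i = min(bisect.bisect_right(arrivals, u), n_intervals - 1)
        waits.append(arrivals[i] - u)
    # Just missing a bus means waiting one full holding time.
    return fmean(gaps), fmean(waits)


def exact_waits(alpha, theta=1.0):
    """E[X] and the renewal limit E[X^2] / (2 E[X]) = (alpha + 1) * theta / 2."""
    return alpha * theta, (alpha + 1) * theta / 2


for alpha in (0.5, 3.0):
    print(alpha, mean_waits(alpha), exact_waits(alpha))
```

With shape 1/2 the random-time wait exceeds the just-missed wait; with shape 3 the order reverses, and shape 1 (the exponential case) sits exactly on the boundary.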
We can probably do slightly better than this. The memoryless property of the exponential distribution says that, for all $s, t \ge 0$, $\mathbb{P}(X > s+t \mid X > s) = \mathbb{P}(X > t)$.
In general, for the sort of holding times we might see at a bus stop, we might expect that if we have waited a long time already, then we are relatively less likely to have to wait a long time more, that is, $\mathbb{P}(X > s+t \mid X > s) \le \mathbb{P}(X > t)$ for all $s, t \ge 0$,
and furthermore this will be strict if neither s nor t is 0. I see no reason not to make up a definition, and call this property supermemorylessness. However, for the subclass of Gamma distributions described above, we have the opposite property: $\mathbb{P}(X > s+t \mid X > s) \ge \mathbb{P}(X > t)$ for all $s, t \ge 0$.
Accordingly, let’s call this submemorylessness. If this is strict, then it says that we are more likely to have to wait a long time if we have already been waiting a long time. This seems contrary to most real-life distributions, but it certainly explains the paradox. If we arrive at a random time, then the appropriate holding time has been ‘waiting’ for a while, so is more likely to require a longer observed wait than if I had arrived as the previous bus departed.
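The two made-up properties can be checked directly from survival functions. The sketch below (Python, illustrative names) computes $\mathbb{P}(X > s+t \mid X > s)/\mathbb{P}(X > t)$ for three Gamma laws with rate 1 whose survival functions have closed forms: shape 1/2 (which is $Z^2/2$ for standard normal $Z$, so $\mathbb{P}(X > x) = \operatorname{erfc}(\sqrt{x})$), shape 1 (Exp(1)) and shape 2 ($\mathbb{P}(X > x) = (1+x)e^{-x}$). A ratio above 1 means submemoryless, below 1 supermemoryless. Note that the shape 2 case, the size-biased Exp(1) law from 3), is supermemoryless: its ratio simplifies to $(1+s+t)/((1+s)(1+t))$, which is strictly below 1 unless s or t is 0.

```python
import math


# Survival functions P(X > x) for Gamma(shape, rate 1) laws with closed forms.
def sf_shape_half(x):
    """Gamma(1/2, 1): X = Z^2 / 2 for standard normal Z."""
    return math.erfc(math.sqrt(x))


def sf_shape_one(x):
    """Gamma(1, 1) = Exp(1)."""
    return math.exp(-x)


def sf_shape_two(x):
    """Gamma(2, 1): the sum of two independent Exp(1) RVs."""
    return math.exp(-x) * (1.0 + x)


def memory_ratio(sf, s, t):
    """P(X > s+t | X > s) / P(X > t).

    Equal to 1: memoryless; below 1: 'supermemoryless';
    above 1: 'submemoryless' (terms coined in the post)."""
    return sf(s + t) / (sf(s) * sf(t))


for name, sf in [("shape 1/2", sf_shape_half),
                 ("shape 1", sf_shape_one),
                 ("shape 2", sf_shape_two)]:
    print(name, memory_ratio(sf, 1.0, 1.0))
```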
In conclusion, before you think of something as a paradox, think about whether the random variables being compared are being generated in the same way, in particular whether one is size-biased, and whether there are effects due to non-memorylessness.
Inspection paradox: here’s a naive explanation I came up with that takes a different route from the ‘accidentally weighting a distribution by itself’ approach. Is it correct…?
If you turn up at some fixed time T, the amount of time you’ll have to wait for a bus is an exponential random variable, with the same parameter as the exp. r.v. describing the time between buses (due to memorylessness…). Now, the length of the interval is at least as long as the time you’ll have to wait, so the average interval (assuming we can kill equality…) must be bigger than the average gap between buses.
In case you were short of examples, a particular submemoryless distribution springs to mind!
Hi Andy, that’s absolutely right. My intention was that that would be clear from my point 2), in particular what I called E_t, but got a bit carried away in the other details.
An excellent example by the way! I imagine Alastair Cook or Hashim Amla (in the past couple of years at least) may well be similar…
Sorry, yes, you’re right. I skipped reading (2) in any detail because there were a few too many words that I’m not familiar with! And I think almost any batsman will have a similar distribution, though there are some counterexamples!
(and of course, at least empirically based on historically observed data in test matches,

so they can’t be submemoryless for all s,t, but probably are for ‘intermediate values’.