Large deviations theory

In probability theory, the theory of large deviations concerns the asymptotic behaviour of remote tails of sequences of probability distributions. While some basic ideas of the theory can be traced to Laplace, the formalization started with insurance mathematics, namely ruin theory with Cramér and Lundberg. A unified formalization of large deviation theory was developed in 1966, in a paper by Varadhan. Large deviations theory formalizes the heuristic ideas of concentration of measures and widely generalizes the notion of convergence of probability measures.

Roughly speaking, large deviations theory concerns itself with the exponential decline of the probability measures of certain kinds of extreme or tail events.

Consider a sequence of independent tosses of a fair coin. Each toss comes up either heads or tails. Denote the outcome of the $i$-th trial by $X_i$, where heads is encoded as 1 and tails as 0. Let $M_N$ denote the mean value after $N$ trials, namely

$$M_N = \frac{1}{N} \sum_{i=1}^{N} X_i.$$

Then $M_N$ lies between 0 and 1. From the law of large numbers (and also from our experience) we know that as $N$ grows, $M_N$ converges to $0.5 = \operatorname{E}[X]$ (the expected value of a single coin toss), almost surely.
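The concentration of the sample mean around 0.5 can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `mean_deviation_probability` and the tolerance `eps = 1/10` are illustrative assumptions. It computes the exact probability that the mean of `n` fair coin tosses lies more than `eps` away from 0.5 by summing binomial coefficients.

```python
import math
from fractions import Fraction


def mean_deviation_probability(n: int, eps: Fraction) -> float:
    """Exact P(|M_n - 1/2| > eps) for n independent fair coin tosses.

    The number of heads is Binomial(n, 1/2), so each count k has
    probability C(n, k) / 2**n. Fractions avoid rounding at the boundary.
    """
    half = Fraction(1, 2)
    favourable = sum(
        math.comb(n, k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - half) > eps
    )
    return favourable / 2**n


if __name__ == "__main__":
    eps = Fraction(1, 10)  # illustrative tolerance (an assumption)
    for n in (10, 100, 1000):
        print(n, mean_deviation_probability(n, eps))
```

The printed probabilities shrink as `n` grows, which is the behaviour the law of large numbers describes for this example.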
Moreover, by the central limit theorem, $M_N$ is approximately normally distributed for large $N$. The central limit theorem therefore gives more detailed information about the behavior of $M_N$ than the law of large numbers. For example, for a fixed value of $N$ we can approximate the tail probability $P(M_N > x)$ that $M_N$ is greater than $x$. However, the CLT approximation may not be accurate if $x$ is far from $\operatorname{E}[X_i]$ unless $N$ is sufficiently large. Also, it does not provide information about the convergence of the tail probabilities as $N \to \infty$. Large deviation theory can answer such questions.

Let us make this statement more precise. For a given value $0.5 < x < 1$, let us compute the tail probability $P(M_N > x)$. Define

$$I(x) = x \ln x + (1 - x) \ln(1 - x) + \ln 2.$$

Note that the function $I(x)$ is a convex, nonnegative function that is zero at $x = \tfrac{1}{2}$ and increases as $x$ approaches 1. Up to the additive constant $\ln 2$, it is the negative of the Bernoulli entropy evaluated at $x$; equivalently, it is the relative entropy of a Bernoulli($x$) distribution with respect to the fair coin, $p = \tfrac{1}{2}$. That it is appropriate for coin tosses follows from the asymptotic equipartition property applied to a Bernoulli trial. Then by Chernoff's inequality, it can be shown that

$$P(M_N > x) < \exp(-N I(x)).$$

This bound is sharp on the exponential scale, which gives the approximation

$$P(M_N > x) \approx \exp(-N I(x)).$$

The probability $P(M_N > x)$ thus decays exponentially as $N \to \infty$, at a rate depending on $x$. This formula approximates the tail probability of the sample mean of the i.i.d. coin tosses and describes how quickly it converges to zero as the number of samples increases.
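The coin-toss bound can be compared with exact binomial tail probabilities. The sketch below is illustrative only. It assumes the natural-logarithm form of the rate function, $I(x) = x \ln x + (1 - x) \ln(1 - x) + \ln 2$, and the helper names `rate_function` and `exact_tail` and the choice $x = 3/5$ are assumptions made for the example. It prints the exact tail probability, the Chernoff bound $\exp(-N I(x))$, and the empirical decay rate $-\tfrac{1}{N} \ln P(M_N > x)$ next to $I(x)$.

```python
import math
from fractions import Fraction


def rate_function(x: float) -> float:
    """I(x) = x ln x + (1 - x) ln(1 - x) + ln 2 for a fair coin, 0 < x < 1."""
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)


def exact_tail(n: int, x: Fraction) -> float:
    """Exact P(M_n > x) for n fair coin tosses.

    M_n > x means the number of heads k satisfies k > n * x, so the
    smallest qualifying count is floor(n * x) + 1.
    """
    k_min = math.floor(n * x) + 1
    favourable = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return favourable / 2**n


if __name__ == "__main__":
    x = Fraction(3, 5)  # illustrative threshold (an assumption)
    rate = rate_function(float(x))
    for n in (10, 100, 1000):
        tail = exact_tail(n, x)
        bound = math.exp(-n * rate)
        empirical_rate = -math.log(tail) / n
        print(f"N={n}: tail={tail:.3e} bound={bound:.3e} "
              f"-ln(tail)/N={empirical_rate:.5f} I(x)={rate:.5f}")
```

In this sketch the exact tail stays below the bound for each `N`, and the empirical rate moves toward `I(x)` as `N` increases, consistent with exponential decay at rate $I(x)$. The remaining gap reflects subexponential factors that the exponential approximation ignores.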
