
Van Der Vaart Wellner Extended Continuous Mapping Theorem

Theory of probability

In the theory of probability, the Glivenko–Cantelli theorem (sometimes referred to as the Fundamental Theorem of Statistics), named after Valery Ivanovich Glivenko and Francesco Paolo Cantelli, determines the asymptotic behaviour of the empirical distribution function as the number of independent and identically distributed observations grows.[1]


Uniform convergence of more general empirical measures is the defining property of Glivenko–Cantelli classes of functions or sets.[2] Glivenko–Cantelli classes arise in Vapnik–Chervonenkis theory, with applications to machine learning. Further applications arise in econometrics, in the study of M-estimators.

Statement

Assume that $X_1, X_2, \dots$ are independent and identically distributed random variables in $\mathbb{R}$ with common cumulative distribution function $F(x)$. The empirical distribution function for $X_1, \dots, X_n$ is defined by

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{[X_i, \infty)}(x) = \frac{1}{n} \left| \left\{ 1 \le i \le n : X_i \le x \right\} \right|,$$

where $I_C$ is the indicator function of the set $C$. For every fixed $x$, $F_n(x)$ is a sequence of random variables that converges to $F(x)$ almost surely by the strong law of large numbers. Glivenko and Cantelli strengthened this result by proving that $F_n$ converges to $F$ uniformly in $x$.
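To make the definition concrete, here is a minimal Python sketch that evaluates $F_n$ by counting the observations $\le x$ in a sorted copy of the sample. The helper name `ecdf` and the four sample values are illustrative assumptions, not part of any particular library.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function F_n of `sample` as a callable.

    F_n(x) = (1/n) * #{i : X_i <= x}.
    """
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # bisect_right returns how many sorted observations are <= x
        return bisect_right(xs, x) / n

    return F_n


F_n = ecdf([0.3, -1.2, 0.3, 2.5])
print(F_n(-2.0), F_n(0.3), F_n(1.0), F_n(3.0))  # 0.0 0.75 0.75 1.0
```

Because `bisect_right` counts every sorted entry that is $\le x$, tied observations (here 0.3 appears twice) are all included, matching the right-continuous indicator $I_{[X_i, \infty)}(x)$.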

Theorem

$$\|F_n - F\|_\infty = \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \longrightarrow 0$$ almost surely.[3]

This theorem originates with Valery Glivenko[4] and Francesco Cantelli,[5] in 1933.
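The supremum in the theorem runs over all of $\mathbb{R}$, but for a continuous $F$ it can be computed exactly from the order statistics $X_{(1)} \le \dots \le X_{(n)}$: $F_n$ jumps from $(i-1)/n$ to $i/n$ at $X_{(i)}$ and is constant in between, while $F$ is continuous and nondecreasing, so
$$\|F_n - F\|_\infty = \max_{1 \le i \le n} \max\left\{ \tfrac{i}{n} - F(X_{(i)}),\ F(X_{(i)}) - \tfrac{i-1}{n} \right\}.$$
The sketch below uses this formula to watch $\|F_n - F\|_\infty$ shrink as $n$ grows. It assumes uniform samples on $(0, 1)$ and a fixed seed; the names `sup_distance` and `uniform_cdf` are hypothetical.

```python
import random


def sup_distance(sample, F):
    """||F_n - F||_inf for a continuous CDF F, computed from the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        u = F(x)
        # at X_(i), F_n equals i/n; just to its left, F_n equals (i-1)/n
        d = max(d, i / n - u, u - (i - 1) / n)
    return d


def uniform_cdf(x):
    # CDF of Uniform(0, 1), the assumed sampling distribution
    return min(max(x, 0.0), 1.0)


rng = random.Random(0)
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_distance(sample, uniform_cdf), 4))
```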

Remarks

Proof

For simplicity, consider the case of a continuous random variable $X$, so that $F$ is continuous. Fix $-\infty = x_0 < x_1 < \cdots < x_{m-1} < x_m = \infty$ such that $F(x_j) - F(x_{j-1}) = \frac{1}{m}$ for $j = 1, \dots, m$ (such points exist because $F$ is continuous). Now for every $x \in \mathbb{R}$ there exists $j \in \{1, \dots, m\}$ such that $x \in [x_{j-1}, x_j]$. Since $F_n$ and $F$ are both nondecreasing (with $F_n(-\infty) = F(-\infty) = 0$ and $F_n(\infty) = F(\infty) = 1$), it follows that

$$\begin{aligned}
F_n(x) - F(x) &\le F_n(x_j) - F(x_{j-1}) = F_n(x_j) - F(x_j) + 1/m,\\
F_n(x) - F(x) &\ge F_n(x_{j-1}) - F(x_j) = F_n(x_{j-1}) - F(x_{j-1}) - 1/m.
\end{aligned}$$

Therefore,

$$\|F_n - F\|_\infty = \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \le \max_{j \in \{1, \dots, m\}} |F_n(x_j) - F(x_j)| + 1/m.$$

For each fixed $m$, the strong law of large numbers applies at each of the finitely many points $x_1, \dots, x_{m-1}$ (the term at $x_m = \infty$ vanishes identically), so $\max_{j \in \{1, \dots, m\}} |F_n(x_j) - F(x_j)| \to 0$ almost surely. Now fix $\varepsilon > 0$ and an integer $m$ with $1/m < \varepsilon$. For almost every outcome there is an $N$ (which may depend on the outcome) such that $\max_{j \in \{1, \dots, m\}} |F_n(x_j) - F(x_j)| \le \varepsilon - 1/m$ for all $n \ge N$. Combined with the bound above, this gives $\|F_n - F\|_\infty \le \varepsilon$ for all $n \ge N$. Taking $\varepsilon = 1/k$ for $k = 1, 2, \dots$ and intersecting these countably many events of probability one shows that $\|F_n - F\|_\infty \to 0$ almost surely.
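The deterministic bound at the heart of the proof can be checked numerically. The sketch below is an illustration under assumptions of its own: it takes $F$ to be the standard exponential CDF (continuous, with explicit quantiles $x_j = -\ln(1 - j/m)$, so that $F(x_j) = j/m$), and it compares the exact $\|F_n - F\|_\infty$ with $\max_j |F_n(x_j) - F(x_j)| + 1/m$ for a few values of $m$. All function names are hypothetical.

```python
import math
import random
from bisect import bisect_right


def exp_cdf(x):
    # CDF of the standard exponential distribution (an illustrative choice of F)
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def exp_quantile(p):
    # inverse of exp_cdf on [0, 1); places the grid so that F(x_j) = j/m
    return -math.log(1.0 - p)


def grid_bound(sample, m):
    """Right-hand side of the proof's bound: max_j |F_n(x_j) - F(x_j)| + 1/m."""
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    # interior points x_1, ..., x_{m-1}; the endpoints -inf and +inf contribute 0
    for j in range(1, m):
        x_j = exp_quantile(j / m)
        worst = max(worst, abs(bisect_right(xs, x_j) / n - exp_cdf(x_j)))
    return worst + 1.0 / m


def exact_sup(sample):
    """||F_n - F||_inf, attained next to an order statistic since F is continuous."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - exp_cdf(x), exp_cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))


rng = random.Random(42)
sample = [rng.expovariate(1.0) for _ in range(5000)]
for m in (5, 20, 100):
    print(m, round(exact_sup(sample), 4), "<=", round(grid_bound(sample, m), 4))
```

Larger $m$ shrinks the $1/m$ term at the cost of more grid points; the proof only needs each fixed $m$ to involve finitely many points, so that the strong law can be applied at each of them.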

Empirical measures

One can generalize the empirical distribution function by replacing the set $(-\infty, x]$ with an arbitrary set $C$ from a class of sets $\mathcal{C}$, obtaining an empirical measure indexed by sets $C \in \mathcal{C}$:

$$P_n(C) = \frac{1}{n} \sum_{i=1}^{n} I_C(X_i), \quad C \in \mathcal{C},$$

where $I_C(x)$ is the indicator function of the set $C$.
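A minimal sketch of $P_n(C)$, assuming each set $C$ is represented in code by its indicator function (a Python predicate). The half-line case reproduces the empirical distribution function, since $P_n((-\infty, t]) = F_n(t)$. The helpers `empirical_measure`, `half_line` and `interval` and the sample values are illustrative.

```python
from typing import Callable, Sequence

# Assumption of this sketch: a set C is given by its indicator I_C,
# a function returning True exactly when x lies in C.
Indicator = Callable[[float], bool]


def empirical_measure(sample: Sequence[float], indicator: Indicator) -> float:
    """P_n(C) = (1/n) * sum_i I_C(X_i): the fraction of observations in C."""
    return sum(1 for x in sample if indicator(x)) / len(sample)


def half_line(t: float) -> Indicator:
    # the set (-inf, t]; its empirical measure equals F_n(t)
    return lambda x: x <= t


def interval(a: float, b: float) -> Indicator:
    # the set (a, b]
    return lambda x: a < x <= b


sample = [0.1, 0.4, 0.4, 0.9, 1.7]
print(empirical_measure(sample, half_line(0.4)))      # 0.6
print(empirical_measure(sample, interval(0.3, 2.0)))  # 0.8
```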

A further generalization is the map induced by $P_n$ on measurable real-valued functions $f$, given by

$$f \mapsto P_n f = \int_S f \, dP_n = \frac{1}{n} \sum_{i=1}^{n} f(X_i), \quad f \in \mathcal{F}.$$

The key question for these classes is then whether the strong law of large numbers holds uniformly over $\mathcal{F}$ or $\mathcal{C}$.
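For one fixed function $f$, $P_n f$ is just a sample mean and the strong law gives $P_n f \to Pf$ almost surely; the Glivenko–Cantelli question is whether this holds simultaneously, in the supremum over $f \in \mathcal{F}$. The sketch below only illustrates the single-$f$ case, under the assumptions $X \sim \text{Uniform}(0, 1)$ and $f(x) = x^2$, so that $Pf = 1/3$. The helper `empirical_mean` is hypothetical.

```python
import random
from typing import Callable, Sequence


def empirical_mean(sample: Sequence[float], f: Callable[[float], float]) -> float:
    """P_n f = (1/n) * sum_i f(X_i), the integral of f against the empirical measure."""
    return sum(f(x) for x in sample) / len(sample)


# With f equal to the indicator of C = (-inf, 0.5], P_n f reduces to P_n(C)
print(empirical_mean([0.1, 0.4, 0.9], lambda x: 1.0 if x <= 0.5 else 0.0))

# For a single fixed f the strong law gives P_n f -> Pf.
# Assumed setting: X ~ Uniform(0, 1) and f(x) = x**2, so Pf = 1/3.
rng = random.Random(0)
for n in (10, 1000, 100000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(empirical_mean(sample, lambda x: x * x), 4))
```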

Glivenko–Cantelli class

Consider a set $\mathcal{S}$ with a σ-algebra $A$ of Borel subsets and a probability measure $P$. For a class of subsets

$$\mathcal{C} \subset \{C : C \text{ is a measurable subset of } \mathcal{S}\}$$

and a class of functions

$$\mathcal{F} \subset \{f : \mathcal{S} \to \mathbb{R} \mid f \text{ is measurable}\}$$

define random variables

$$\|P_n - P\|_{\mathcal{C}} = \sup_{C \in \mathcal{C}} |P_n(C) - P(C)|,$$
$$\|P_n - P\|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} |P_n f - Pf|,$$

where $P_n(C)$ is the empirical measure, $P_n f$ is the corresponding map, and

$$\mathbb{E} f = \int_{\mathcal{S}} f \, dP = Pf,$$ assuming that it exists.
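For an infinite class the supremum cannot be evaluated by brute force, but for a finite class $\mathcal{C}$ it is a plain maximum. The sketch below assumes $P = \text{Uniform}(0, 1)$ and takes as an illustrative finite class the 55 intervals $(a, b]$ whose endpoints lie on the grid $\{0, 0.1, \dots, 1\}$, so that $P((a, b]) = b - a$; each set is supplied as a pair (indicator, true probability). The names `sup_deviation` and `intervals` are hypothetical.

```python
import random


def sup_deviation(sample, sets):
    """||P_n - P||_C = max over a FINITE class C of |P_n(C) - P(C)|.

    `sets` is a list of (indicator, probability) pairs: each set C is given by
    its indicator I_C together with its true probability P(C).
    """
    n = len(sample)
    return max(abs(sum(1 for x in sample if ind(x)) / n - p) for ind, p in sets)


# Illustrative finite class: intervals (a, b] with endpoints on a grid of step 0.1.
# Under P = Uniform(0, 1), P((a, b]) = b - a.
grid = [k / 10 for k in range(11)]
intervals = [((lambda x, a=a, b=b: a < x <= b), b - a)
             for a in grid for b in grid if a < b]

rng = random.Random(0)
for n in (10, 100, 1000, 10000):
    sample = [rng.random() for _ in range(n)]
    print(n, round(sup_deviation(sample, intervals), 4))
```

The default arguments `a=a, b=b` bind each interval's endpoints at creation time; without them every lambda would see the last loop values.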

Definitions

  • A class $\mathcal{C}$ is called a Glivenko–Cantelli class (or GC class) with respect to a probability measure $P$ if any of the following equivalent statements is true.
1. $\|P_n - P\|_{\mathcal{C}} \to 0$ almost surely as $n \to \infty$.
2. $\|P_n - P\|_{\mathcal{C}} \to 0$ in probability as $n \to \infty$.
3. $\mathbb{E}\|P_n - P\|_{\mathcal{C}} \to 0$ as $n \to \infty$ (convergence in mean).
The Glivenko–Cantelli classes of functions are defined similarly.
  • A class is called a universal Glivenko–Cantelli class if it is a GC class with respect to any probability measure $P$ on $(S, A)$.
  • A class is called uniformly Glivenko–Cantelli if the convergence occurs uniformly over all probability measures $P$ on $(S, A)$:
$$\sup_{P \in \mathcal{P}(S, A)} \mathbb{E}\|P_n - P\|_{\mathcal{C}} \to 0;$$
$$\sup_{P \in \mathcal{P}(S, A)} \mathbb{E}\|P_n - P\|_{\mathcal{F}} \to 0.$$
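Condition 3 (convergence in mean) can be estimated by simulation. The sketch below is a Monte Carlo estimate of $\mathbb{E}\|P_n - P\|_{\mathcal{C}}$ for the class of half-lines $(-\infty, t]$, for which $\|P_n - P\|_{\mathcal{C}} = \|F_n - F\|_\infty$. It assumes two continuous choices of $P$ (uniform and standard exponential), an arbitrary 200 replications, and a fixed seed. Two distributions cannot demonstrate uniformity over all $P$; the printout only shows the mean deviation decreasing with $n$ for each of them. All names are hypothetical.

```python
import math
import random


def ks_distance(sample, F):
    # ||F_n - F||_inf for continuous F, attained next to an order statistic
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - F(x), F(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def mean_sup_deviation(draw, F, n, reps=200, seed=0):
    """Monte Carlo estimate of E||P_n - P||_C for the half-line class C.

    `draw(rng)` returns one observation from P, and F is the CDF of P.
    """
    rng = random.Random(seed)
    total = sum(ks_distance([draw(rng) for _ in range(n)], F) for _ in range(reps))
    return total / reps


def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)


def exp_cdf(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0


UNIFORM = (lambda rng: rng.random(), uniform_cdf)
EXPONENTIAL = (lambda rng: rng.expovariate(1.0), exp_cdf)

for n in (10, 100, 1000):
    print(n,
          round(mean_sup_deviation(*UNIFORM, n), 4),
          round(mean_sup_deviation(*EXPONENTIAL, n), 4))
```

The two columns should nearly agree: for continuous $F$, the distribution of $\|F_n - F\|_\infty$ does not depend on $F$, by the probability integral transform.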

Theorem (Vapnik and Chervonenkis, 1968)[7]

A class of sets $\mathcal{C}$ is uniformly GC if and only if it is a Vapnik–Chervonenkis class.
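For reference, one common formulation: a finite set of points is shattered by $\mathcal{C}$ if every one of its subsets can be written as $C \cap \{x_1, \dots, x_k\}$ for some $C \in \mathcal{C}$, and $\mathcal{C}$ is a Vapnik–Chervonenkis class if its VC dimension (the size of the largest shattered set) is finite. The brute-force sketch below checks shattering on a small pool of points for two classes on $\mathbb{R}$: half-lines $(-\infty, t]$ (VC dimension 1) and intervals $(a, b]$ (VC dimension 2). It is an illustration, not a general algorithm: a finite pool can only exhibit shattered sets. The answers here match the theory because the trace of a half-line on sorted points is always a prefix, and the trace of an interval is always a contiguous run. Function names and the point pool are assumptions.

```python
from itertools import combinations


def traces_half_lines(points):
    """All subsets {x in points : x <= t} cut out by half-lines (-inf, t]."""
    xs = sorted(points)
    # it suffices to take t below every point, or t equal to one of the points
    ts = [xs[0] - 1.0] + xs
    return {frozenset(x for x in xs if x <= t) for t in ts}


def traces_intervals(points):
    """All subsets {x in points : a < x <= b} cut out by intervals (a, b]."""
    xs = sorted(points)
    cuts = [xs[0] - 1.0] + xs  # candidate endpoints a and b
    return {frozenset(x for x in xs if a < x <= b) for a in cuts for b in cuts}


def is_shattered(points, traces):
    # shattered iff every one of the 2^k subsets is realized as a trace
    return len(traces(points)) == 2 ** len(points)


def vc_dimension(traces, pool, max_d=4):
    """Largest d <= max_d such that some d-point subset of `pool` is shattered."""
    best = 0
    for d in range(1, max_d + 1):
        if any(is_shattered(list(s), traces) for s in combinations(pool, d)):
            best = d
    return best


pool = [0.0, 1.0, 2.0, 3.0, 4.0]
print(vc_dimension(traces_half_lines, pool))  # 1
print(vc_dimension(traces_intervals, pool))   # 2
```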

Examples

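For $S = \mathbb{R}$ and the class of half-lines $\mathcal{C} = \{(-\infty, t] : t \in \mathbb{R}\}$, the Glivenko–Cantelli theorem states that $\mathcal{C}$ is a universal GC class. Moreover, $\sup_{P \in \mathcal{P}(S, A)} \|P_n - P\|_{\mathcal{C}} \sim n^{-1/2}$, that is, $\mathcal{C}$ is a uniformly Glivenko–Cantelli class.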
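For the class of half-lines, the $n^{-1/2}$ rate can be checked by simulation, and the Dvoretzky–Kiefer–Wolfowitz inequality (listed under See also) gives a bound that is uniform in $P$. With Massart's constant it reads $\Pr(\|F_n - F\|_\infty > \varepsilon) \le 2e^{-2n\varepsilon^2}$ for every $F$ and every $\varepsilon > 0$; integrating this tail bound over $\varepsilon > 0$ yields
$$\mathbb{E}\|F_n - F\|_\infty \le \int_0^\infty 2e^{-2n\varepsilon^2} \, d\varepsilon = \sqrt{\frac{\pi}{2n}}.$$
The sketch compares a Monte Carlo estimate of $\sqrt{n}\,\mathbb{E}\|F_n - F\|_\infty$ under an assumed uniform distribution with the constant $\sqrt{\pi/2}$ from this bound. Replication counts and seeds are arbitrary, and the function names are hypothetical.

```python
import math
import random


def ks_distance(sample, F):
    # ||F_n - F||_inf for continuous F, attained next to an order statistic
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - F(x), F(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)


def scaled_mean(n, reps=300, seed=1):
    """Monte Carlo estimate of sqrt(n) * E||F_n - F||_inf under Uniform(0, 1)."""
    rng = random.Random(seed)
    total = sum(ks_distance([rng.random() for _ in range(n)], uniform_cdf)
                for _ in range(reps))
    return math.sqrt(n) * total / reps


# sqrt(n) times the DKW-derived bound sqrt(pi / (2n)): the same for every n and every F
DKW_SCALED_BOUND = math.sqrt(math.pi / 2)

for n in (25, 100, 400, 1600):
    print(n, round(scaled_mean(n), 3), "<=", round(DKW_SCALED_BOUND, 3))
```

The scaled estimates should level off as $n$ grows and stay below the DKW constant, which is the $n^{-1/2}$ behaviour stated above.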

See also

  • Donsker's theorem
  • Dvoretzky–Kiefer–Wolfowitz inequality – strengthens the Glivenko–Cantelli theorem by quantifying the rate of convergence.

References

  1. Tucker, Howard G. (1959). "A Generalization of the Glivenko-Cantelli Theorem". The Annals of Mathematical Statistics. 30 (3): 828–830. doi:10.1214/aoms/1177706212. JSTOR 2237422.
  2. van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press. p. 279. ISBN 978-0-521-78450-4.
  3. van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press. p. 265. ISBN 978-0-521-78450-4.
  4. Glivenko, V. (1933). "Sulla determinazione empirica delle leggi di probabilità". Giorn. Ist. Ital. Attuari. 4: 92–99.
  5. Cantelli, F. P. (1933). "Sulla determinazione empirica delle leggi di probabilità". Giorn. Ist. Ital. Attuari. 4: 421–424.
  6. van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press. p. 268. ISBN 978-0-521-78450-4.
  7. Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264–280. doi:10.1137/1116025.

Further reading

  • Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge University Press. ISBN 0-521-46102-2.
  • Pitman, E. J. G. (1979). "The Sample Distribution Function". Some Basic Theory for Statistical Inference. London: Chapman and Hall. pp. 79–97. ISBN 0-470-26554-X.
  • Shorack, G. R.; Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley. ISBN 0-471-86725-X.
  • van der Vaart, A. W.; Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer. ISBN 0-387-94640-3.
  • van der Vaart, A. W.; Wellner, J. A. (1996). Glivenko–Cantelli Theorems. Springer.
  • van der Vaart, A. W.; Wellner, J. A. (2000). Preservation Theorems for Glivenko–Cantelli and Uniform Glivenko–Cantelli Classes. Springer.


Source: https://en.wikipedia.org/wiki/Glivenko%E2%80%93Cantelli_theorem
