1. Contents
2. Prerequisites (to be expanded)
    2.1. Probability
    2.2. Law of Large Numbers
    2.3. Central Limit Theorem
    2.4. Other topics (to be updated)
    2.5. Supplementary notes
3. Probability Distributions (to be expanded)
    3.1. Discrete Distributions
    3.2. Continuous Distributions
4. Sampling Methods
    4.1. The Inversion Method
    4.2. The Grid Method
    4.3. The Rejection Method
    4.4. The Sampling/Importance Resampling Method
    4.5. The Stochastic Representation Method
    4.6. The Conditional Sampling Method
5. Optimization Algorithms in Statistics (to be updated)
    5.1. Basics
    5.2. The Newton-Raphson Algorithm
    5.3. The Fisher Scoring Algorithm
    5.4. The EM Algorithm
    5.5. The ECM Algorithm
    5.6. The MM Algorithms
6. Classical Monte Carlo Integration and the Riemannian Sum Estimator (to be updated)
    6.1. Classical Monte Carlo Integration
    6.2. The Riemannian Sum Estimator
7. Bayesian Statistics (to be updated)
8. Stochastic Processes (to be updated)
    8.1. Markov Chains
    8.2. Stationary Processes
9. Markov Chain Monte Carlo (MCMC) Methods (to be updated)
10. Bootstrap Methods (to be expanded)
    10.1. Parametric Bootstrap
    10.2. Non-Parametric Bootstrap
    10.3. Hypothesis Testing with the Bootstrap

Probability Distributions

Discrete Distributions

Uniform distribution X ~ U(a, b)

A distribution in which finitely many values all have the same probability.

Support: $k \in \{ a, a + 1, a + 2, \ldots, b - 1, b\}$

Probability mass function

$p( X = k ) = \begin{cases} \frac{1}{n} & \text{for } a \leq k \leq b \\ 0 & \text{otherwise} \end{cases} \qquad n = b - a + 1$

Cumulative distribution function

$F( k ) = \begin{cases} 0 & k < a \\ \frac{k - a + 1}{n} & a \leq k < b \\ 1 & k \geq b \end{cases}$

Expectation

$E( X ) = \frac{a + b}{2}$

Variance

$\text{var}( X ) = \frac{n^{2} - 1}{12}$

Bernoulli (two-point) distribution X ~ Bernoulli(p)

A Bernoulli random variable takes the value 1 if the Bernoulli trial succeeds and the value 0 if it fails.

Support: $k \in \{ 0,\ 1\}$

Probability mass function

$p( X = k ) = p^{k}{(1 - p)}^{1 - k} = \begin{cases} 1 - p & k = 0 \\ p & k = 1 \end{cases}$

Cumulative distribution function

$F( k ) = \begin{cases} 0 & k < 0 \\ 1 - p & 0 \leq k < 1 \\ 1 & k \geq 1 \end{cases}$

Expectation

$E( X ) = p$

Variance

$\text{var}( X ) = p(1 - p)$

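Binomial distribution X ~ B(n, p)

The distribution of the number of successes k in n independent Bernoulli trials, each with success probability p.

Support: $k \in \{ 0,\ 1,\ \ldots,\ n\}$

Probability mass function

$p( X = k;n,p ) = {n\choose k} p^k (1 - p)^{n - k}$

${n\choose k} = \frac{n!}{k!( n - k )!}$

Cumulative distribution function

$F( k;n,p ) = \sum_{i = 0}^{k}{ {n\choose i} p^i(1 - p)^{n - i}}$

Expectation

$E( X ) = np$

Variance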

$\text{var}( X ) = np(1 - p)$

Multinomial distribution X ~ Multinomial(n, p1, p2, …, pk)

A generalization of the binomial distribution to k possible outcomes per trial.

Support:
$x_{i} \in \left\{ 0,\ 1,\ \ldots,\ n \right\}\quad\sum_{i=1}^{k}x_{i} = n\quad i \in \{ 1,\ 2,\ \ldots,\ k\}$

Probability mass function

$p( \vec{X} = ( x_1, x_2, \ldots, x_k)^T;n,p_1,\ldots,p_k ) = \begin{cases} \frac{n!}{x_1 !\cdots x_k !} p_1^{x_1}\cdots p_k^{x_k} & \text{when } \sum_{i=1}^{k}x_{i} = n \\ 0 & \text{otherwise} \end{cases}$

The cumulative distribution function is more involved and is omitted here.

Expectation

$E( X_{i} ) = np_{i}$

Variance and covariance

$\text{var}( X_{i} ) = np_{i}(1 - p_{i})$

$\text{Cov}( X_{i},\ X_{j} ) = - np_{i}p_{j}\quad i \neq j$

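Poisson distribution X ~ P(λ)

In the Bernoulli trials underlying the binomial distribution, if the number of trials n is large, the success probability p is small, and the product λ = np is moderate, then the distribution of the number of occurrences can be approximated by a Poisson distribution.

Support: $k \in \{ 0,\ 1,\ \ldots\}$

Probability mass function

$p( X = k ) = \frac{\lambda^k}{k!}e^{- \lambda}$

Cumulative distribution function

$F( k;\lambda ) = e^{- \lambda}\sum_{i = 0}^{k}\frac{\lambda^i}{i!} = \frac{\Gamma( k + 1,\lambda )}{k!}$

where $\Gamma( s,x ) = \int_{x}^{\infty}{t^{s - 1}e^{- t}dt}$ is the upper incomplete gamma function.

Expectation

$E( X ) = \lambda$

Variance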

$\text{var}( X ) = \lambda$
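The Poisson approximation to the binomial described above can be checked numerically. The following is a minimal sketch; the values n = 1000 and p = 0.003 and the helper names `binomial_pmf` and `poisson_pmf` are illustrative assumptions, not part of the original notes.

```python
import math

def binomial_pmf(k, n, p):
    """Binomial pmf: C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson pmf: lam^k e^(-lam) / k!."""
    return lam**k * math.exp(-lam) / math.factorial(k)

n, p = 1000, 0.003            # illustrative values: large n, small p
lam = n * p                   # lambda = np = 3, a moderate value

# Largest pointwise gap between the two pmfs over k = 0..20
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))
print(f"largest pmf difference for k = 0..20: {max_gap:.6f}")
```

The gap shrinks further as n grows with np held fixed.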

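Geometric distribution X ~ G(p)

Describes, in a sequence of Bernoulli trials, either the number of trials needed to obtain the first success or the number of failures before it. The former is often called the shifted geometric distribution and is the form given below:

Support: $k \in \{ 1,\ 2,\ \ldots\}$

Probability mass function

$p( X = k;p ) = p(1 - p)^{k - 1}$

Cumulative distribution function

$F( k;p ) = 1 - (1 - p)^k$

Expectation

$E( X ) = 1/p$

Variance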

$\text{var}( X ) = (1 - p)/p^2$

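Hypergeometric distribution X ~ H(n, K, N)

Describes the number k of objects of a designated type obtained when n objects are drawn without replacement from a finite population of N objects, K of which are of that type.

Support: $k \in \{ \max(0,\ n + K - N),\ \ldots,\ \min(n,\ K)\}$

Probability mass function

$p(X=k;n,K,N)=\frac{ {K\choose k} {N-K\choose n-k} }{ {N\choose n} }$

The cumulative distribution function is more involved and is omitted here.

Expectation

$E( X ) = nK/N$

Variance

$\text{var}( X ) = n\frac{K}{N}\frac{N - K}{N}\frac{N - n}{N - 1}$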

Continuous Distributions

Uniform distribution X ~ U(a, b)

Support: $a \leq x \leq b$

Probability density function

$f( x ) = \begin{cases} \frac{1}{b - a} & \text{for } a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$

Cumulative distribution function

$F( x ) = Pr(X \leq x) = \begin{cases} 0 & x < a \\ \frac{x - a}{b - a} & a \leq x < b \\ 1 & x \geq b \end{cases}$

Expectation

$E(X) = \frac{a + b}{2}$

Variance

$D(X) = \frac{(b - a)^2}{12}$

Exponential distribution X ~ Exp(λ)

The exponential distribution models the time between successive independent random events.

Support: $x \in \lbrack 0,\ + \infty)$

Probability density function

$f( x ) = \lambda e^{- \lambda x}\quad \text{for $x \geq 0$}$

Cumulative distribution function

$F( x ) = Pr(X \leq x) = \begin{cases} 0 & x < 0 \\ 1 - e^{- \lambda x} & x \geq 0 \end{cases}$

Expectation

$E(X) = \frac{1}{\lambda}$

Variance

$D(X) = \frac{1}{\lambda^2}$

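Normal distribution X ~ N(μ, σ²)

The normal distribution is central to statistics and is widely used in the natural and social sciences to model a random variable whose distribution is unknown.

Support: $x \in ( - \infty, + \infty)$

Probability density function

$f( x ) = \frac{1}{\sigma\sqrt{2\pi}}e^{- \frac{(x - \mu)^2}{2\sigma^2}}$

Cumulative distribution function

$F( x ) = Pr(X \leq x) = \frac{1}{2}\lbrack 1 + \text{erf}( \frac{x - \mu}{\sigma\sqrt{2}} )\rbrack$

$\text{erf}( x ) = \frac{1}{\sqrt{\pi}}\int_{- x}^{x}{e^{- t^{2}}dt}$

Expectation

$E(X) = \mu$

Variance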

$D(X) = \sigma^{2}$

Gamma distribution X ~ Γ(α, β)

The gamma distribution is a continuous probability distribution. The parameter α is called the shape parameter and β the scale parameter. For integer α, X can be interpreted as the waiting time until the α-th event occurs.

Support: $x \in \lbrack 0, + \infty)$

Probability density function, with rate $\lambda = 1/\beta$

$f( x ) = \frac{x^{\alpha - 1}\lambda^{\alpha}e^{- \lambda x}}{\Gamma( \alpha )}\quad x > 0$

Properties of the Gamma function

$\begin{cases} \Gamma( \alpha ) = ( \alpha - 1 )! & \text{if } \alpha \in \mathbb{Z}^{+} \\ \Gamma( \alpha ) = ( \alpha - 1 )\Gamma( \alpha - 1 ) & \text{if } \alpha > 1 \\ \Gamma( 1/2 ) = \sqrt{\pi}, \quad \Gamma( 1 ) = 1 & \end{cases}$

The cumulative distribution function is more involved and is omitted here.

Expectation

$E( X ) = \alpha\beta$

Variance

$D( X ) = \beta^{2}\alpha$

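Beta distribution X ~ Be(α, β)

A family of continuous probability distributions defined on the interval (0, 1), with two parameters α, β > 0.

Support: $x \in (0, 1)$

Probability density function

$f( x ) = \frac{x^{\alpha - 1}{(1 - x)}^{\beta - 1}}{B( \alpha,\beta )}\quad 0 < x < 1$

$B( \alpha,\beta ) = \int_{0}^{1}{u^{\alpha - 1}{(1 - u)}^{\beta - 1}}du$

Property of the Beta function

$B( \alpha,\beta ) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$

The cumulative distribution function is more involved and is omitted here.

Expectation

$E( X ) = \frac{\alpha}{\alpha + \beta}$

Variance

$D( X ) = \frac{\alpha\beta}{( \alpha + \beta )^{2}(\alpha + \beta + 1)}$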

Other Distributions

Truncated normal distribution X ~ TN(μ, σ²; a, b)

Dirichlet distribution

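Sampling Methods

The Inversion Method

Discrete case

Suppose a random variable X has probability mass function (pmf)

$p(X=x_i)=p_i\quad \text{where $p_i>0$, $i=1, 2,\ldots, m$, and $\sum_{i=1}^{m}p_i=1$}$

where m may be finite or infinite.

Draw Y uniformly from U(0, 1) and compare it with the cumulative probabilities of X: the interval in which Y falls determines the value of X. This works because Y lands in the i-th interval, which has length $p_i$, with probability exactly $p_i$.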

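**Algorithm**

Step 1. Draw Y from U(0, 1);

Step 2. If $Y < p_1$, set $X = x_1$ and stop;

Else if $Y < p_1 + p_2$, set $X = x_2$ and stop;

…

Else if $Y < \sum_{j=1}^{i} p_j$, set $X = x_i$ and stop;

…

**Note.**

It is usually more efficient to order the values so that those with larger $p_i$ are placed nearest 0 and therefore checked first.
The $p_i$ need not all be computed in advance. For example, when generating Poisson samples with the inversion method, carry out Step 1 first, then compute the $p_i$ one at a time, accumulating the sum and comparing it with Y. Once the condition is met, the remaining $p_i$ never have to be computed.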
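The Poisson case mentioned in the note can be sketched as follows. This is a minimal illustration rather than a tuned implementation; the function name `poisson_inversion`, the rate 3.0, and the seed are assumptions made for the example. Each $p_k$ is obtained from the previous one through the recurrence $p_k = p_{k-1}\lambda/k$, so nothing is computed beyond the value finally returned.

```python
import math
import random

def poisson_inversion(lam, rng):
    """Draw one Poisson(lam) variate by inversion, computing p_k on the fly."""
    y = rng.random()              # Step 1: Y ~ U(0, 1)
    k = 0
    p = math.exp(-lam)            # p_0 = e^{-lam}
    cdf = p                       # running sum p_0 + ... + p_k
    # Step 2: stop at the first k with Y < p_0 + ... + p_k.
    # The p > 0 guard avoids an endless loop if rounding keeps cdf below y.
    while y >= cdf and p > 0.0:
        k += 1
        p *= lam / k              # recurrence p_k = p_{k-1} * lam / k
        cdf += p
    return k

rng = random.Random(0)            # fixed seed so the run is reproducible
draws = [poisson_inversion(3.0, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)   # should be close to lam = 3
```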

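Continuous case

Suppose a random variable X has cumulative distribution function (cdf)

$F(x)=Pr(X \leq x)=\int_{-\infty}^x f(t)dt$

Let $Y \sim U(0, 1)$ with realization y, and solve $y = F(x)$ for $x = F^{-1}(y)$. This is valid because $F(X)$ is uniformly distributed on (0, 1).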

**Algorithm**

Step 1. Obtain the cdf F(x) of X;

Step 2. Derive the inverse $x = F^{-1}(y)$;

Step 3. Draw Y from U(0, 1);

Step 4. Return $X = F^{-1}(Y)$.

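**Note.**

Claim: if the random variable X has a continuous cdf F, then $Y=F(X) \sim U(0, 1)$ (here F(·) is treated as an ordinary function). It suffices to show that the cdf of Y satisfies $F_Y(y)=y$ for $y \in (0,1)$.

$F_Y(y)=Pr(Y \leq y)=Pr(F(X)\leq y)$

For the mapping from Y back to X, define

$x=F^{-1}(y)=\inf\{x:F(x)\geq y\}\quad \text{where $y\in (0,1)$}$

Because F is continuous, $F(F^{-1}(y)) = F(x) = y$, so

$Pr(F(X)\leq y)=Pr(X\leq x)=F(x)=F(F^{-1}(y))=y$

which proves the claim.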
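As a worked instance of the continuous inversion method, consider Exp(λ): from $y = 1 - e^{-\lambda x}$ one gets $F^{-1}(y) = -\ln(1 - y)/\lambda$. The sketch below assumes λ = 2 and a fixed seed purely for illustration; the function name `exponential_inversion` is not from the original notes.

```python
import math
import random

def exponential_inversion(lam, rng):
    """Return X = F^{-1}(Y) for F(x) = 1 - exp(-lam * x)."""
    y = rng.random()                   # Y ~ U(0, 1); random() returns a value in [0, 1)
    return -math.log(1.0 - y) / lam    # 1 - y lies in (0, 1], so the log is finite

rng = random.Random(1)
xs = [exponential_inversion(2.0, rng) for _ in range(20000)]
mean_x = sum(xs) / len(xs)             # theory: E(X) = 1 / lam = 0.5
```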

The Grid Method

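Continuous case only

Suppose a random variable X has probability density function (pdf)

$X \sim f_X(x)\quad \text{where $S_X$ is finite}$

Choose a suitable set of grid points $\{x_i\}$ covering the support $S_X$, evaluate the density at those points to obtain $\{f_X(x_i)\}$, and normalize:

$p_i=\frac{f_X(x_i)}{\sum_{j=1}^d{f_X(x_j)}}\quad i=1, 2, \ldots,d$

This yields a discrete approximation X' of the random variable X: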

$X' \sim \text{FDiscrete}_d(\{x_i\},\{p_i\})$

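**Algorithm**

Step 1. Generate an appropriate set of grid points $\{x_i\}$ that covers $S_X$;

Step 2. Compute the probabilities $\{p_i\}$;

Step 3. Sample from $X' \sim \text{FDiscrete}_d(\{x_i\},\{p_i\})$.

**Note.**

The grid method is mainly intended for cases where the continuous inversion method (IM) is hard to compute: the problem is reduced to a discrete one, which can then be sampled with the discrete IM or another method.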
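The three steps can be sketched in a few lines. The example assumes an equally spaced grid and uses the Beta(2, 2) density $6x(1-x)$ on [0, 1] as an illustrative target; the function name `grid_sample` and all numeric settings are assumptions. The discrete draw in Step 3 uses `random.choices` from the standard library rather than a hand-written inversion loop.

```python
import random

def grid_sample(f, lo, hi, d, m, rng):
    """Approximate m draws from density f on [lo, hi] using d equally spaced grid points."""
    grid = [lo + (hi - lo) * i / (d - 1) for i in range(d)]   # Step 1: grid covering S_X
    dens = [f(x) for x in grid]
    total = sum(dens)
    probs = [v / total for v in dens]                         # Step 2: normalized p_i
    return rng.choices(grid, weights=probs, k=m)              # Step 3: sample the discrete X'

f = lambda x: 6.0 * x * (1.0 - x)      # Beta(2, 2) density, chosen only as an example
rng = random.Random(8)
draws = grid_sample(f, 0.0, 1.0, d=201, m=20000, rng=rng)
grid_mean = sum(draws) / len(draws)    # theory for Beta(2, 2): 0.5
```

A finer grid reduces the discretization error at the cost of more density evaluations.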

The Rejection Method (The AR Method(Acceptance-Rejection))

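Continuous or discrete case

Suppose a random variable X has probability density function (pdf)

$X \sim f_X(x)\quad \text{where $S_X$ is finite}$

and that there exist a constant c ≥ 1 and an envelope density g(x) satisfying

$f(x)\leq cg(x)\quad \forall x\in S_X$

Draw $Y \sim g(\cdot)$ and, independently, $Z \sim U(0, 1)$. Under the condition above, the density of Y given acceptance is

$f_Y\Big(x \,\Big|\, Z\leq \frac{f(Y)}{cg(Y)}\Big)=\frac{Pr\big(Z\leq \frac{f(Y)}{cg(Y)} \,\big|\, Y=x\big)\cdot g(x)}{Pr\big(Z\leq \frac{f(Y)}{cg(Y)}\big)}\\ =\frac{Pr\big(Z\leq \frac{f(Y)}{cg(Y)} \,\big|\, Y=x\big)}{\int_{S_X}{Pr\big(Z\leq \frac{f(Y)}{cg(Y)} \,\big|\, Y=x\big)}g(x)dx}\, g(x)\\ =\frac{\frac{f(x)}{cg(x)} g(x)}{\int_{S_X}{\frac{f(x)}{cg(x)}g(x)dx}}=\frac{\frac{f(x)}{c}}{1/c}=f(x)$

The derivation uses $Pr\big(Z\leq \frac{f(x)}{cg(x)}\big)=\frac{f(x)}{cg(x)}$, which holds only when $\frac{f(x)}{cg(x)}\leq 1$. Where the bound fails, the acceptance probability is capped at 1 and the accepted draws no longer follow f(x). Hence c must satisfy

$c\geq \frac{f(x)}{g(x)}\quad \forall x\in S_X$

Since

$Pr\Big(Z\leq \frac{f(Y)}{cg(Y)}\Big)=\int_{S_X} Pr\Big(Z\leq \frac{f(Y)}{cg(Y)}\,\Big|\,Y=x\Big)g(x)dx\\ =\int_{S_X} \frac{f(x)}{cg(x)}g(x)dx=1/c$

is the probability that a proposal is accepted, i.e. the sampling efficiency, c should be as small as possible. Therefore, for a family of admissible envelope densities $g_{\theta}(x)$, take

$c=\min_{\theta} \max_{x \in S_X} \frac{f(x)}{g_{\theta}(x)}$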

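**Algorithm**

Step 1. Draw Z ~ U(0, 1) and independently draw Y ~ g(·);

Step 2. If Z ≤ f(Y) / [cg(Y)], return X = Y; otherwise, go to Step 1.

**Note.**

The envelope constant c is found in two steps. First, maximize f(x)/g(x) over x, which guarantees the required condition c ≥ f(x)/g(x). Second, minimize over the family of admissible g(x), which makes c as small as possible and hence the acceptance probability as large as possible.
The acceptance probability is 1/c.
When maximizing f(x)/g(x), one can maximize log f(x) − log g(x) instead.
Requirements on g(x): 1) the same support as f(x); 2) greater spread (variance/dispersion) than f(x); 3) easier to sample from than f(x).
Log-concave densities are well suited to piecewise exponential envelopes (a density is log-concave if its logarithm is concave, so piecewise exponential functions can serve as its envelope).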
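A small sketch of the two-step algorithm follows. It assumes, for illustration only, the target density $f(x) = 6x(1-x)$ (Beta(2, 2)) with the envelope g = U(0, 1), for which $c = \max_x f(x)/g(x) = 1.5$ at x = 0.5. The function also counts proposals so that the empirical acceptance rate can be compared with 1/c.

```python
import random

def f(x):
    return 6.0 * x * (1.0 - x)        # target: Beta(2, 2) density on (0, 1)

C = 1.5                               # max of f(x)/g(x) for g = U(0, 1), attained at x = 0.5

def rejection_sample(rng):
    """Return (X, number of proposals used to obtain it)."""
    tries = 0
    while True:
        tries += 1
        z = rng.random()              # Step 1: Z ~ U(0, 1)
        y = rng.random()              #         Y ~ g = U(0, 1), independent of Z
        if z <= f(y) / (C * 1.0):     # Step 2: accept if Z <= f(Y) / [c g(Y)], with g(Y) = 1
            return y, tries

rng = random.Random(2)
results = [rejection_sample(rng) for _ in range(20000)]
xs = [x for x, _ in results]
sample_mean = sum(xs) / len(xs)                             # theory: 0.5
accept_rate = len(results) / sum(t for _, t in results)     # theory: 1/c = 2/3
```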

The Sampling/Importance Resampling Method (The SIR Method)

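Continuous case only

Suppose a random variable X has probability density function (pdf)

$X \sim f_X(x)\quad \text{where $S_X$ is finite}$

and let g(x) be another pdf on the same support:

$Y \sim g(x)$

Starting from the identity

$f(x)=\frac{f(x)}{g(x)} g(x) \quad \forall x \in S_X$

sampling from f(x) can be viewed as two stages: draw Y ~ g(x), then select X among the draws with weight proportional to f(Y)/g(Y). For X to follow f(x) exactly, one would first need an infinitely large set {Y} and then select X according to f(y)/g(y), which only makes the problem harder. An approximation is therefore introduced: draw $X^1, \ldots, X^J$ from g and define

$w(X^j)=\frac{f(X^j)}{g(X^j)}$

$\omega_j=\frac{w(X^j)}{\sum_{j'=1}^J w(X^{j'})}$

This amounts to using the grid method with randomly placed grid points. Resampling then draws m values from $\{X^j\}$ with probabilities $\{\omega_j\}$.
On the quality of the approximation: for a single resampled value $X^*$,

$Pr(X^*\leq x^*) = \sum_{j=1}^J{\omega_j \times I(X^j\leq x^*)}\\ =\frac{\sum_{j=1}^J {[w(X^j)\times I(X^j\leq x^*)]}}{\sum_{j=1}^J w(X^j)}\\ =\frac{\frac{1}{J}\sum_{j=1}^J {[w(X^j)\times I(X^j\leq x^*)]}}{\frac{1}{J} \sum_{j=1}^J w(X^j)}$

where $I(\cdot)$ is the indicator function. By the law of large numbers, as $J \to \infty$,

$\frac{1}{J}\sum_{j=1}^J w(X^j) I(X^j\leq x^*) \to \int w(x) I(x\leq x^*) g(x) dx \qquad \text{and} \qquad \frac{1}{J}\sum_{j=1}^J w(X^j) \to \int w(x)g(x)dx = 1$

Hence

$\lim_{J\to \infty} Pr(X^*\leq x^*)=\frac{\int w(x)\times I(x\leq x^*)\times g(x)dx}{\int w(x)g(x)dx}\\ =\int_{-\infty}^{x^*} f(x)dx$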

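**Algorithm**

Step 1. Generate $X^1, X^2, \ldots, X^J \overset{iid}{\sim} g(\cdot)$ [sampling step];

Step 2. Select a subset $\{X^i\}_{i=1}^m$ from $\{X^j\}_{j=1}^J$ via resampling without replacement from the discrete distribution on $\{X^j\}$ with probabilities $\{\omega_j\}$ [importance resampling step].

**Note.**

The resampled values follow f exactly only in the limit J/m → ∞; in practice J/m ≥ 10, or J/m = 20, is generally effective.
SIR yields only an approximation to the target distribution; the resampling step uses an idea similar to the grid method.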
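The two steps can be sketched as below. The example assumes the target $f(x) = 6x(1-x)$ with proposal g = U(0, 1), J = 4000, and m = 200 (so J/m = 20); these values and the function name `sir` are illustrative. Resampling without replacement is implemented here by sequential weighted draws, removing each selected index from the pool, which is one possible reading of Step 2 rather than the only one.

```python
import random

def sir(f, g_pdf, g_draw, J, m, rng):
    """Sampling/importance resampling: J draws from g, then m resampled without replacement."""
    xs = [g_draw(rng) for _ in range(J)]          # sampling step: X^1..X^J iid from g
    w = [f(x) / g_pdf(x) for x in xs]             # importance weights w(X^j) = f(X^j) / g(X^j)
    pool = list(range(J))                         # indices still available for selection
    chosen = []
    for _ in range(m):                            # importance resampling step
        weights = [w[j] for j in pool]            # choices() normalizes the weights internally
        pos = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(xs[pool.pop(pos)])          # pop the index so it cannot be drawn again
    return chosen

target = lambda x: 6.0 * x * (1.0 - x)            # Beta(2, 2) density (illustrative target)
rng = random.Random(3)
sample = sir(target, lambda x: 1.0, lambda r: r.random(), J=4000, m=200, rng=rng)
sir_mean = sum(sample) / len(sample)              # should be near 0.5
```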

The Stochastic Representation Method (The SR Method)

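Continuous or discrete case

Suppose a random variable X has probability density function (pdf)

$X \sim f_X(x)\quad \text{where $S_X$ is finite}$

If two random variables X and Y have the same distribution, we write $X \overset{d}{=} Y$, meaning

$F_X(t)=Pr(X\leq t) = Pr(Y\leq t)=F_Y(t)\quad \forall t$

For example, if the cdf F(x) of a continuous X is symmetric about the point (0, 0.5), i.e. $F(-x) = 1 - F(x)$, then for Y = -X,

$F_Y(x) =Pr(Y\leq x)=Pr(-X\leq x)\\ =Pr(X\geq -x)=1-Pr(X\leq -x)\\ =1-F(-x)=F(x)$

so $-X \overset{d}{=} X$. More generally, if X has the stochastic representation $X \overset{d}{=} h(Y)$ with h one-to-one and differentiable, then $f_Y(y)dy=f_X(x)dx$, that is,

$f_Y(y)=f_X(x)\Big|\frac{dx}{dy}\Big|=f_X(h(y))\Big|\frac{dh(y)}{dy}\Big|=f_X(h(y))|h'(y)|$

Similarly, when one y corresponds to several x, $X=h_i(Y)$,

$f_Y(y)=\sum_i f_X(h_i(y))|h_i'(y)|$

**Algorithm**

This is the one-to-two case: each y corresponds to two values $x_1 = h_1(y)$ and $x_2 = h_2(y)$, the two roots of $y = G(x)$, where G denotes the forward transformation Y = G(X).

Step 1. Draw Z ∼ U(0, 1) and independently draw Y ∼ $f_Y(\cdot)$;

Step 2. Set $X_1 = h_1(Y)$ and $X_2 = h_2(Y)$;

Step 3. If $Z\leq \Big\{1+\frac{f_X(X_2)}{f_X(X_1)}\Big|\frac{G'(X_1)}{G'(X_2)}\Big|\Big\}^{-1}$, return $X = X_1$; otherwise return $X = X_2$.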

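**Note.**

The inversion method can be viewed as a special case of the SR method in which Step 3 is not needed;
the beta distribution can be generated from gamma distributions;
the inverse gamma distribution from the gamma distribution;
the chi-squared and log-normal distributions from the normal distribution;
Student's t- and F-distributions from the normal and chi-squared distributions;
the Dirichlet distribution from independent gamma distributions;
the multivariate normal distribution from univariate normal distributions;
the multivariate t-distribution from the multivariate normal and chi-squared distributions;
the Wishart distribution from the multivariate normal distribution.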
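Two of the representations listed above can be sketched with the Python standard library. The sketch assumes the well-known facts that $G_1/(G_1+G_2) \sim \text{Be}(a, b)$ for independent $G_1 \sim \Gamma(a, 1)$ and $G_2 \sim \Gamma(b, 1)$, and that a sum of k squared independent N(0, 1) variables is chi-squared with k degrees of freedom. Function names, parameter values, and seeds are illustrative.

```python
import random

def beta_via_gamma(a, b, rng):
    """Beta(a, b) via X = G1 / (G1 + G2) with independent G1 ~ Gamma(a, 1), G2 ~ Gamma(b, 1)."""
    g1 = rng.gammavariate(a, 1.0)     # gammavariate(shape, scale)
    g2 = rng.gammavariate(b, 1.0)
    return g1 / (g1 + g2)

def chi_squared_via_normal(k, rng):
    """Chi-squared with k degrees of freedom as a sum of k squared N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(4)
betas = [beta_via_gamma(2.0, 3.0, rng) for _ in range(20000)]
chis = [chi_squared_via_normal(5, rng) for _ in range(20000)]
beta_mean = sum(betas) / len(betas)   # theory: a / (a + b) = 0.4
chi_mean = sum(chis) / len(chis)      # theory: k = 5
```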

The Conditional Sampling Method (The CS Method)

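Continuous or discrete case

Suppose a random vector $\vec X=(X_1,\ldots,X_d)^T$ has joint probability density function (pdf)

$\vec X \sim f_X(\vec x)=f_X(x_1,x_2,\ldots,x_d)\quad \text{where $S_X$ is finite}$

The joint density can be factorized as $f_X(x_1,x_2,\ldots,x_d)=f_1(x_1)\prod_{i=2}^d f_i(x_i|x_1,x_2,\ldots,x_{i-1})$, and the factors are then sampled one after another.

**Algorithm**

Step 1. Draw $X_1$ from $f_1(x_1)$;

Step 2. Draw $X_2$ from $f_2(x_2|x_1)$;

Step 3. Draw $X_3$ from $f_3(x_3|x_1, x_2)$;

…

Step d. Draw $X_d$ from $f_d(x_d|x_1, x_2, \ldots, x_{d-1})$.

**Note.**

None yet.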
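For d = 2 the conditional sampling steps can be illustrated with a standard bivariate normal with correlation ρ, for which $X_1 \sim N(0, 1)$ and $X_2 | X_1 = x_1 \sim N(\rho x_1, 1 - \rho^2)$. This target, the value ρ = 0.8, and the function name `bivariate_normal_cs` are assumptions chosen for the example.

```python
import math
import random

def bivariate_normal_cs(rho, rng):
    """Draw (X1, X2) from a standard bivariate normal with correlation rho."""
    x1 = rng.gauss(0.0, 1.0)                                  # Step 1: X1 ~ f1 = N(0, 1)
    x2 = rng.gauss(rho * x1, math.sqrt(1.0 - rho * rho))      # Step 2: X2 | X1 = x1 ~ N(rho*x1, 1 - rho^2)
    return x1, x2

rng = random.Random(5)
rho = 0.8
pairs = [bivariate_normal_cs(rho, rng) for _ in range(20000)]

# Sample correlation, to compare with rho
n = len(pairs)
m1 = sum(a for a, _ in pairs) / n
m2 = sum(b for _, b in pairs) / n
cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
v2 = sum((b - m2) ** 2 for _, b in pairs) / n
sample_corr = cov / math.sqrt(v1 * v2)
```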

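Optimization Algorithms in Statistics

Basics

The Newton-Raphson Algorithm

The Fisher Scoring Algorithm

The EM Algorithm

The ECM Algorithm

The MM Algorithms

Classical Monte Carlo Integration and the Riemannian Sum Estimator

Classical Monte Carlo Integration

The Riemannian Sum Estimator

Bayesian Statistics

To be updated.

Stochastic Processes

Markov Chains

Stationary Processes

Markov Chain Monte Carlo (MCMC) Methods

To be updated.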

Bootstrap Methods

**What's the bootstrap?**

The bootstrap is a data-based method for statistical inference. Its introduction into statistics is relatively recent because the method is computationally intensive.

**The purpose of the bootstrap.**

First, the bootstrap approach provides a general method for obtaining the estimated standard error $\widehat{Se}(\hat \theta)$ of an estimator and confidence intervals (CIs) for parameters.

Second, the bootstrap approach can also be applied to hypothesis testing, either to calculate a bootstrap p-value or to provide an upper quantile point of the distribution of a test statistic when its density is not available in closed form.

Parametric Bootstrap

**How can the parametric bootstrap for computing CIs of parameters of interest be described in one sentence?**

Suppose that we have a method to calculate the point estimator $\hat \theta$; then repeatedly computing the point estimator G times, based on G bootstrap samples, yields a CI for θ.
Thus, the key is how to generate a bootstrap sample.

An example: Large-sample CIs for one-sample problem

Let $\{X_i\}_{i=1}^{n} \overset{iid}{\sim} \text{Bernoulli}(\theta)$, where $\theta = Pr(X_1 =1)$ is the unknown population mean.

Let $\vec{X}=(X_1,\cdots,X_n)^T$ and $\vec{x}=(x_1,\cdots,x_n)^T$ . The likelihood function for θ is:

$L(\theta)=\prod_{i=1}^n{\theta^{x_i}(1-\theta)^{1-x_i}},\qquad 0\leq\theta\leq1$

so that the MLE of θ is the sample mean defined by:

$\hat\theta=\frac{1}{n}\sum_{i=1}^n x_i=s(\vec{x})$

Note that

$n\hat\theta=\sum_{i=1}^n{X_i}\sim{Binomial(n,\theta)}$

then

$E(\hat\theta)=\theta \qquad \text{and} \qquad Var(\hat\theta)=\theta(1-\theta)/n$

According to the Central Limit Theorem, we have

$\frac{\hat\theta-\theta}{\sqrt{\theta(1-\theta)/n}}\to Z \sim N(0,1)$

Namely, $[\hat\theta-E(\hat\theta)]/[Var(\hat\theta)]^{1/2}$ converges in distribution to a random variable following N(0,1). Replacing θ in the denominator by its MLE $\hat\theta$, the limiting properties of the MLE give, approximately,

$\frac{\hat\theta-\theta}{\sqrt{\hat\theta(1-\hat\theta)/n}} \sim N(0,1)\qquad \text{as} \qquad n \to \infty$

Let $z_\alpha$ denote the upper $\alpha$ -th quantile of N(0,1) satisfying $Pr(Z \geq z_\alpha)=\alpha$ . Therefore, an asymptotic $100(1-\alpha)\%$ confidence interval (CI) of θ is given by:

$1-\alpha=Pr(-z_{\alpha/2} \leq \frac{\hat\theta-\theta}{\hat\sigma} \leq z_{\alpha/2})=Pr(\hat\theta-z_{\alpha/2}\hat\sigma \leq \theta \leq \hat\theta + z_{\alpha/2}\hat\sigma)$

Thus, $[\hat\theta_l, \hat\theta_u]=[\hat\theta-z_{\alpha/2}\hat\sigma,\hat\theta+z_{\alpha/2}\hat\sigma]$ , where $\hat\sigma=\sqrt{\hat\theta(1-\hat\theta)/n}$

**Two problems with the asymptotic CI**

First, even for a large sample size n, the lower bound may fall below 0 when the true value of θ is close to 0, while the upper bound may exceed 1 when the true value of θ is close to 1. Such an interval is not valid.

Second, for small to moderate sample sizes, the asymptotic CI is not reliable.
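The first problem is easy to reproduce. The sketch below computes the asymptotic interval $[\hat\theta - z_{\alpha/2}\hat\sigma,\ \hat\theta + z_{\alpha/2}\hat\sigma]$ for a made-up sample of n = 50 observations with a single success; the data and the function name `wald_ci` are assumptions for illustration. The quantile is taken from `statistics.NormalDist` in the standard library.

```python
import math
from statistics import NormalDist

def wald_ci(x, alpha=0.05):
    """Asymptotic 100(1 - alpha)% CI for a Bernoulli parameter."""
    n = len(x)
    theta_hat = sum(x) / n                                   # MLE: the sample mean
    sigma_hat = math.sqrt(theta_hat * (1.0 - theta_hat) / n)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)              # upper alpha/2 quantile of N(0, 1)
    return theta_hat - z * sigma_hat, theta_hat + z * sigma_hat

x = [1] + [0] * 49            # made-up data: 1 success in 50 trials, theta_hat = 0.02
lower, upper = wald_ci(x)
print(f"asymptotic 95% CI: [{lower:.4f}, {upper:.4f}]")      # the lower bound is negative here
```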

Parametric Bootstrap World

Suppose the data follow the distribution $x \sim f(x;\theta)$.

**Algorithm**

Step 1. Compute the point estimator $\hat\theta=s(\vec x)$ of the parameter, for example the MLE.

Step 2. Generate an i.i.d. bootstrap sample $\vec x^*=(x_1^*,\cdots,x_n^*)^T$ with $\{x_i^*\}_{i=1}^n \overset{iid}{\sim} f(x;\hat\theta)$ and compute the corresponding bootstrap replication $\hat\theta^*=s(\vec x^*)$.

Step 3. Independently repeat Step 2 G times to obtain G bootstrap replications $\{\hat\theta^*(g)\}_{g=1}^G$.

Step 4. The standard error ${Se}(\hat \theta)$ can then be estimated from the G replications by

$\widehat {Se}^*(\hat \theta)=\sqrt{\frac{1}{G-1}\sum_{g=1}^G[\hat\theta^*(g)-\overline \theta^*]^2} \\ \text{where} \qquad \overline\theta^*=[\hat\theta^*(1)+\cdots+\hat\theta^*(G)]/G$

Step 5. If $\{\hat\theta^*(g)\}_{g=1}^G$ is approximately normally distributed, a $100(1-\alpha)\%$ bootstrap CI of $\theta$ is
$[\hat\theta_l^*, \hat\theta_u^*]=[\overline\theta^*-z_{\alpha/2}\widehat {Se}^*(\hat \theta),\overline\theta^*+z_{\alpha/2}\widehat {Se}^*(\hat \theta)]$

Note: for the standard normal distribution, $z_{\alpha/2} \approx 1.96$ when $\alpha = 0.05$.

Step 6. If the replications are far from normally distributed, or the CI in Step 5 is invalid, a $100(1-\alpha)\%$ bootstrap CI of $\theta$ is $[\hat\theta_L^*, \hat\theta_U^*]$, where $\hat\theta_L^*$ is the $(\alpha/2)G$-th and $\hat\theta_U^*$ the $(1-\alpha/2)G$-th order statistic of $\{\hat\theta^*(g)\}_{g=1}^G$ sorted in increasing order.

For example, when $\alpha=0.05$ and G=1000, $\hat\theta_L^*$ is the 25th order statistic and $\hat\theta_U^*$ is the 975th order statistic of $\hat\theta^*(1),\cdots,\hat\theta^*(1000)$, respectively.
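Steps 1 to 6 can be sketched for the Bernoulli example as follows. The data (10 successes in 50 trials), G = 2000, the seed, and the function name `parametric_bootstrap_bernoulli` are illustrative assumptions; the sketch returns both the normal-based interval of Step 5 and the percentile interval of Step 6.

```python
import math
import random
from statistics import NormalDist

def parametric_bootstrap_bernoulli(x, G, alpha, rng):
    n = len(x)
    theta_hat = sum(x) / n                                    # Step 1: MLE of theta
    reps = []
    for _ in range(G):                                        # Steps 2-3: G bootstrap replications
        x_star = [1 if rng.random() < theta_hat else 0 for _ in range(n)]   # iid Bernoulli(theta_hat)
        reps.append(sum(x_star) / n)
    mean_rep = sum(reps) / G
    se = math.sqrt(sum((r - mean_rep) ** 2 for r in reps) / (G - 1))        # Step 4
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    normal_ci = (mean_rep - z * se, mean_rep + z * se)        # Step 5: normal-based CI
    reps.sort()
    lo = reps[round(alpha / 2.0 * G) - 1]                     # Step 6: (alpha/2)G-th order statistic
    hi = reps[round((1.0 - alpha / 2.0) * G) - 1]             #         (1 - alpha/2)G-th order statistic
    return theta_hat, se, normal_ci, (lo, hi)

x = [1] * 10 + [0] * 40                                       # made-up data: theta_hat = 0.2
theta_hat, se, normal_ci, percentile_ci = parametric_bootstrap_bernoulli(
    x, G=2000, alpha=0.05, rng=random.Random(6))
```

Because every replication is a proportion, the percentile interval always stays inside [0, 1], unlike the asymptotic interval.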

Non-Parametric Bootstrap

**Why do we need the non-parametric bootstrap?**

In many real applications, the form of the density function is unknown.
It is then desirable to use a non-parametric bootstrap method to obtain the estimated standard error of an estimator (e.g., the least squares estimator (LSE)) or the bootstrap CI for a population parameter (e.g., the mean of the population distribution).

The key is how to generate a bootstrap sample from the empirical cdf.

Non-Parametric Bootstrap World

Suppose the distribution is taken to be the empirical cdf $x \sim \hat F_n(x)=\frac{1}{n} \sum_{i=1}^n I_{(x_i \leq x)}$, where we assume $x_1\leq x_2\leq \cdots \leq x_n$. Based on $\hat F_n$, the estimator of $\theta$ is computed as $\hat\theta=T(\hat F_n)=s(\vec x)$.

**Algorithm**

Step 1. Compute the point estimator $\hat\theta=s(\vec x)$ of the parameter.

Step 2. Generate an i.i.d. bootstrap sample $\vec x^*=(x_1^*,\cdots,x_n^*)^T$ with $\{x_i^*\}_{i=1}^n \overset{iid}{\sim} \hat F_n(x)$ and compute the corresponding bootstrap replication $\hat\theta^*=s(\vec x^*)$.

Step 3. Independently repeat Step 2 G times to obtain G bootstrap replications $\{\hat\theta^*(g)\}_{g=1}^G$.

Step 4. The standard error ${Se}(\hat \theta)$ can then be estimated from the G replications by

$\widehat {Se}^*(\hat \theta)=\sqrt{\frac{1}{G-1}\sum_{g=1}^G[\hat\theta^*(g)-\overline \theta^*]^2} \\ \text{where} \qquad \overline\theta^*=[\hat\theta^*(1)+\cdots+\hat\theta^*(G)]/G$

Note: for the standard normal distribution, $z_{\alpha/2} \approx 1.96$ when $\alpha = 0.05$.

**Note.**

The i.i.d. bootstrap sample $\vec x^*=(x_1^*,\cdots,x_n^*)^T$ is drawn from $\hat F_n(x)$. Because drawing each $x_i^*$ amounts to sampling with replacement from the original data, every $x_i^*$ equals one of the original observations, for example $x_1^*=x_7, x_2^*=x_3, \cdots,x_n^*=x_7$. A bootstrap sample therefore consists entirely of original observations, some of which may appear zero times, once, twice, or more.
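Sampling from $\hat F_n$ is just sampling the observed data with replacement, which can be sketched as follows. The data values, G = 2000, the seed, and the function names are made up for illustration; the statistic here is the sample mean, but any function of the sample could be passed as `stat`.

```python
import math
import random

def nonparametric_bootstrap(x, stat, G, rng):
    """Return the G bootstrap replications of stat and their estimated standard error."""
    n = len(x)
    reps = []
    for _ in range(G):
        x_star = rng.choices(x, k=n)        # n draws with replacement, i.e. iid from the empirical cdf
        reps.append(stat(x_star))           # bootstrap replication s(x*)
    mean_rep = sum(reps) / G
    se = math.sqrt(sum((r - mean_rep) ** 2 for r in reps) / (G - 1))
    return reps, se

def mean(v):
    return sum(v) / len(v)

data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 4.7, 3.9, 2.5, 5.0]   # made-up observations
reps, se = nonparametric_bootstrap(data, mean, G=2000, rng=random.Random(7))
```

For the sample mean, the bootstrap standard error should be close to $\sqrt{\frac{1}{n^2}\sum_{i=1}^n (x_i - \bar x)^2}$, which gives a simple sanity check.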

Hypothesis Testing with the Bootstrap

To be organized.
