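Parameter Estimation for the Multivariate Normal Distribution (1)
Source
Applied Multivariate Statistical Analysis (应用多元统计分析), by Gao Huixuan (高惠璇), Peking University Press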
Let the population be $p$-variate normal, $X\sim N_p(\mu,\Sigma)$, and let
$$X_i=(x_{i1},\cdots,x_{ip})',\quad i=1,\cdots,n$$
be a simple random sample from this population. The observed data matrix
$$X=\left[\begin{matrix} x_{11}&\cdots&x_{1p}\\ \vdots&&\vdots\\ x_{n1}&\cdots&x_{np} \end{matrix}\right]$$
whose $i$-th row is $X_i'$, is then a random matrix.
Numerical characteristics
Sample mean vector
$$\overline{X}=\frac{1}{n}\sum^n_{i=1}X_i=(\overline{x}_1,\cdots,\overline{x}_p)'=\frac{1}{n}X'1_n$$
where $1_n$ is the $n$-dimensional vector of ones and
$$\overline{x}_i=\frac{1}{n}\sum^n_{\alpha=1}x_{\alpha i}$$
Sample deviation matrix (also called the cross-product matrix)
$$A=\sum^n_{\alpha=1}(X_\alpha-\overline{X})(X_\alpha-\overline{X})'=X'X-n\overline{X}\,\overline{X}'=(a_{ij})_{p\times p}$$
where
$$a_{ij}=\sum^n_{\alpha=1}(x_{\alpha i}-\overline{x}_i)(x_{\alpha j}-\overline{x}_j)$$
Sample covariance matrix
$$S=\frac{1}{n-1}A$$
Sample correlation matrix
$$R=(r_{ij})_{p\times p}$$
where
$$r_{ij}=\frac{a_{ij}}{\sqrt{a_{ii}}\sqrt{a_{jj}}}$$
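The four statistics above can be computed directly from their matrix formulas. The following is a minimal NumPy sketch, not taken from the book: the function name `sample_statistics` and the simulated data are assumptions made purely for illustration. It cross-checks the results against NumPy's built-in `np.cov` and `np.corrcoef`.

```python
import numpy as np

def sample_statistics(X):
    """Return (mean vector, deviation matrix A, covariance S, correlation R).

    X is an n-by-p data matrix whose rows are the observations.
    The function name is hypothetical, chosen for this example.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    ones = np.ones(n)
    x_bar = X.T @ ones / n                      # (1/n) X' 1_n
    A = X.T @ X - n * np.outer(x_bar, x_bar)    # X'X - n * x_bar x_bar'
    S = A / (n - 1)                             # sample covariance matrix
    d = np.sqrt(np.diag(A))                     # sqrt(a_ii), i = 1..p
    R = A / np.outer(d, d)                      # r_ij = a_ij / (sqrt(a_ii) sqrt(a_jj))
    return x_bar, A, S, R

# Illustrative data only: n = 50 draws from a 3-variate normal population.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, -2.0, 0.5],
                            cov=[[2.0, 0.6, 0.0],
                                 [0.6, 1.0, 0.3],
                                 [0.0, 0.3, 1.5]],
                            size=50)
x_bar, A, S, R = sample_statistics(X)

# Both comparisons should agree up to floating-point rounding.
print(np.allclose(S, np.cov(X, rowvar=False)))
print(np.allclose(R, np.corrcoef(X, rowvar=False)))
```

The formula $X'X-n\overline{X}\,\overline{X}'$ is convenient on paper, but it subtracts two large quantities when the means are far from zero; centering the data first and forming the product of the centered matrix with itself is usually the numerically safer route.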
Maximum likelihood estimation of $\mu,\Sigma$
Stack the random data matrix into the $np$-dimensional long vector $\mathrm{Vec}(X')$. Its joint density, regarded as a function of $\mu,\Sigma$, is called the likelihood function of the sample:
$$\begin{aligned} L(\mu,\Sigma)&=\prod^n_{\alpha=1}\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x_\alpha-\mu)'\Sigma^{-1}(x_\alpha-\mu)\right]\\ &=\frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\exp\left[ -\frac{1}{2}\sum^n_{\alpha=1}(x_\alpha-\mu)'\Sigma^{-1}(x_\alpha-\mu) \right]\\ &=\frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\exp\left[ -\frac{1}{2}\sum^n_{\alpha=1} \text{tr}\left((x_\alpha-\mu)'\Sigma^{-1}(x_\alpha-\mu)\right) \right]\\ &=\frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\exp\left[ -\frac{1}{2}\sum^n_{\alpha=1} \text{tr}\left(\Sigma^{-1}(x_\alpha-\mu)(x_\alpha-\mu)'\right) \right]\\ &=\frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\exp\left[ \text{tr}\left(-\frac{1}{2}\Sigma^{-1}\sum^n_{\alpha=1} (x_\alpha-\mu)(x_\alpha-\mu)'\right) \right] \end{aligned}$$
The third line uses the fact that a scalar equals its own trace, and the fourth uses $\text{tr}(CD)=\text{tr}(DC)$.
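The equality between the sum-of-quadratic-forms line and the final trace line can be checked numerically on the log scale. The sketch below is only an illustration: the function names `loglik_sum` and `loglik_trace` and the test values are assumptions, not part of the source.

```python
import numpy as np

def loglik_sum(X, mu, Sigma):
    """ln L written as a sum of n quadratic forms (second line of the derivation)."""
    n, p = X.shape
    D = X - mu                                   # row alpha is (x_alpha - mu)'
    Sigma_inv = np.linalg.inv(Sigma)
    quad = np.einsum("ai,ij,aj->", D, Sigma_inv, D)
    _, logdet = np.linalg.slogdet(Sigma)         # ln|Sigma|, computed stably
    return -0.5 * n * p * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad

def loglik_trace(X, mu, Sigma):
    """ln L written with the trace form (last line of the derivation)."""
    n, p = X.shape
    D = X - mu
    M = D.T @ D                                  # sum of (x_alpha - mu)(x_alpha - mu)'
    _, logdet = np.linalg.slogdet(Sigma)
    tr = np.trace(np.linalg.solve(Sigma, M))     # tr(Sigma^{-1} M) without forming the inverse
    return -0.5 * n * p * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * tr

# Arbitrary illustrative inputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
mu = np.array([0.1, -0.2, 0.3])
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])
print(np.isclose(loglik_sum(X, mu, Sigma), loglik_trace(X, mu, Sigma)))
```

The trace form needs only the $p\times p$ matrix $\sum_\alpha (x_\alpha-\mu)(x_\alpha-\mu)'$, which is what makes the later decomposition into $A$ and a term in $\overline{X}-\mu$ possible.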
A trace property of positive definite matrices
Let $B$ be a positive definite matrix of order $p$. Then
$$\text{tr}B-\ln|B|\geqslant p$$
with equality if and only if $B=I_p$.
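One way to see why the property holds: if $\lambda_1,\cdots,\lambda_p>0$ are the eigenvalues of $B$, then $\text{tr}B-\ln|B|=\sum_i(\lambda_i-\ln\lambda_i)$, and $\lambda-\ln\lambda\geqslant 1$ for every $\lambda>0$ with equality only at $\lambda=1$. The following sketch checks the inequality on randomly generated positive definite matrices; the helper name `trace_minus_logdet` and the way the matrices are generated are assumptions for illustration, and a numerical check is of course not a proof.

```python
import numpy as np

def trace_minus_logdet(B):
    """Return tr(B) - ln|B| for a matrix B with positive determinant."""
    sign, logdet = np.linalg.slogdet(B)
    if sign <= 0:
        # Necessary (not sufficient) condition for positive definiteness.
        raise ValueError("B must be positive definite")
    return np.trace(B) - logdet

rng = np.random.default_rng(0)
p = 4
gaps = []
for _ in range(1000):
    G = rng.normal(size=(p, p))
    B = G @ G.T + 1e-3 * np.eye(p)     # positive definite by construction
    gaps.append(trace_minus_logdet(B) - p)

print(min(gaps) >= 0)                                  # the inequality on every draw
print(np.isclose(trace_minus_logdet(np.eye(p)), p))    # equality at B = I_p
```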
$\ln L(\mu,\Sigma)$
$$\begin{aligned} \ln L(\mu,\Sigma)=&-\frac{np}{2}\ln 2\pi-\frac{n}{2}\ln|\Sigma|\\ &-\frac{1}{2}\text{tr}\left[\Sigma^{-1}\sum^n_{\alpha=1} (x_\alpha-\mu)(x_\alpha-\mu)'\right] \end{aligned}$$
where
$$\begin{aligned} &\sum^n_{\alpha=1}(x_\alpha-\mu)(x_\alpha-\mu)'\\ &=\sum^n_{\alpha=1}(x_\alpha-\overline{X}+\overline{X}-\mu) (x_\alpha-\overline{X}+\overline{X}-\mu)'\\ &=\sum^n_{\alpha=1}(x_\alpha-\overline{X})(x_\alpha-\overline{X})'+ n(\overline{X}-\mu)(\overline{X}-\mu)'\\ &=A+n(\overline{X}-\mu)(\overline{X}-\mu)' \end{aligned}$$
The cross terms vanish because $\sum^n_{\alpha=1}(x_\alpha-\overline{X})=0$.
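The decomposition of $\sum_\alpha(x_\alpha-\mu)(x_\alpha-\mu)'$ into $A$ plus a rank-one term can be confirmed numerically. This is a small sketch on simulated data; the data and the trial value of `mu` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X = rng.normal(size=(n, p))             # illustrative data, rows are observations
mu = np.array([0.5, -1.0, 2.0])         # an arbitrary trial value of mu

x_bar = X.mean(axis=0)
D_mu = X - mu                           # rows: (x_alpha - mu)'
D_bar = X - x_bar                       # rows: (x_alpha - x_bar)'

lhs = D_mu.T @ D_mu                                  # sum of (x_alpha - mu)(x_alpha - mu)'
A = D_bar.T @ D_bar                                  # sample deviation matrix
rhs = A + n * np.outer(x_bar - mu, x_bar - mu)       # A + n (x_bar - mu)(x_bar - mu)'

# Cross term: sum over alpha of (x_alpha - x_bar)(x_bar - mu)', which should be zero.
cross = np.outer(D_bar.sum(axis=0), x_bar - mu)

print(np.allclose(lhs, rhs))
print(np.allclose(cross, 0.0))
```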
Substituting this decomposition into $\ln L(\mu,\Sigma)$ and writing $C=-\frac{np}{2}\ln 2\pi-\frac{n}{2}\ln|\Sigma|$ for the terms that do not depend on $\mu$ gives
$$\begin{aligned} \ln L(\mu,\Sigma)&=C-\frac{1}{2}\text{tr}\left[ \Sigma^{-1}A+n\Sigma^{-1}(\overline{X}-\mu)(\overline{X}-\mu)' \right]\\ &=C-\frac{1}{2}\text{tr}(\Sigma^{-1}A)-\frac{n}{2} (\overline{X}-\mu)'\Sigma^{-1}(\overline{X}-\mu)\\ &\leqslant C-\frac{1}{2}\text{tr}(\Sigma^{-1}A) \end{aligned}$$
The inequality holds because $\Sigma^{-1}$ is positive definite, and equality holds only when $\mu=\overline{X}$. That is, for fixed $\Sigma$,
$$\ln L(\overline{X},\Sigma)=\max_\mu\ln L(\mu,\Sigma)$$
$\ln L(\overline{X},\Sigma)$
Here $C=-\frac{np}{2}\ln 2\pi$ denotes the constant that depends on neither $\mu$ nor $\Sigma$:
$$\begin{aligned} \ln L(\overline{X},\Sigma)&= -\frac{np}{2}\ln 2\pi-\frac{n}{2}\ln|\Sigma| -\frac{1}{2}\text{tr}(\Sigma^{-1}A)\\ &=C-\frac{n}{2}\left[\ln|\Sigma|+ \text{tr}\left(\Sigma^{-1}\frac{A}{n}\right)\right]\\ &=C-\frac{n}{2}\left[\text{tr}\left(\Sigma^{-1}\frac{A}{n}\right) -\ln\left|\Sigma^{-1}\frac{A}{n}\right|+\ln\left|\frac{A}{n}\right|\right]\\ &\leqslant C-\frac{np}{2}-\frac{n}{2}\ln\left|\frac{A}{n}\right| \end{aligned}$$
The last step applies the trace property above to $B=\Sigma^{-1/2}\frac{A}{n}\Sigma^{-1/2}$, which has the same trace and determinant as $\Sigma^{-1}\frac{A}{n}$; this requires $A$ to be positive definite, which holds with probability 1 when $n>p$. Equality holds only when $\Sigma=\frac{A}{n}$.
The maximum likelihood estimators are therefore
$$\hat{\mu}=\overline{X},\quad\hat{\Sigma}=\frac{1}{n}A$$
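As a sanity check on the result, the sketch below computes $\hat{\mu}$ and $\hat{\Sigma}$ on simulated data, confirms that the log-likelihood at the estimators attains the upper bound $-\frac{np}{2}\ln 2\pi-\frac{np}{2}-\frac{n}{2}\ln\left|\frac{A}{n}\right|$, and compares it with two nearby parameter choices. The function names `mvn_mle` and `loglik` and the simulated population are assumptions made for this example only.

```python
import numpy as np

def mvn_mle(X):
    """Maximum likelihood estimates (mu_hat, Sigma_hat) from an n-by-p data matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu_hat = X.mean(axis=0)
    D = X - mu_hat
    A = D.T @ D                          # sample deviation matrix
    return mu_hat, A / n                 # divisor n, not n - 1

def loglik(X, mu, Sigma):
    """Log-likelihood ln L(mu, Sigma) in trace form."""
    n, p = X.shape
    D = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    tr = np.trace(np.linalg.solve(Sigma, D.T @ D))
    return -0.5 * n * p * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * tr

# Illustrative data: n = 200 draws from a bivariate normal population.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[1.0, 0.5], [0.5, 2.0]],
                            size=200)
n, p = X.shape
mu_hat, Sigma_hat = mvn_mle(X)

best = loglik(X, mu_hat, Sigma_hat)
bound = (-0.5 * n * p * np.log(2 * np.pi) - 0.5 * n * p
         - 0.5 * n * np.linalg.slogdet(Sigma_hat)[1])
S = np.cov(X, rowvar=False)              # unbiased estimator A / (n - 1)

print(np.isclose(best, bound))                      # the bound is attained
print(loglik(X, mu_hat, S) < best)                  # S gives a lower likelihood
print(loglik(X, mu_hat + 0.1, Sigma_hat) < best)    # so does a shifted mean
```

Note that $\hat{\Sigma}=\frac{n-1}{n}S$, so the maximum likelihood estimator differs from the sample covariance matrix $S$ only by the divisor; `np.cov(X, rowvar=False, bias=True)` returns the same matrix as `Sigma_hat`.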