需要安装的包:

install.packages(c("mvtnorm","ICSNP","biotools"))

一元正态分布

密度函数

图形

性质

多元正态分布

密度函数

二元正态的密度图形

性质

进一步的性质

设:

考虑一个n个观测p个变量的数据集:

\[X=[X_{ij}]=\begin{bmatrix} X_{1,1} & X_{1,2}& \cdots & X_{1,p} \\ X_{2,1} & X_{2,2}& \cdots & X_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1} & X_{n,2}& \cdots & X_{n,p} \end{bmatrix}= \begin{bmatrix} X_{1}^{\prime} \\ X_2^{\prime} \\ \vdots \\ X_{n}^{\prime} \end{bmatrix}\]

描述性统计量:

  • 样本均值:\(\overline {X}_k=\frac{1}{n}\sum_{j=1}^n X_{jk},\,\,k=1,\cdots,p\)
  • 矩阵表示: \(\overline{X}=(\bar{X}_1,\bar{X}_2,\cdots,\bar{X}_p)^{\prime}=\frac{1}{n}\sum_{j=1}^n X_j\)

协方差、相关系数

协方差

母体: \[ Cov(X_i,X_j)=E(X_i-\mu_i)(X_j-\mu_j)\]

\[\Sigma= \begin{bmatrix} \sigma^2_1 & \sigma_{12}& \cdots & \sigma_{1q} \\ \sigma_{21} & \sigma^2_{2}& \cdots & \sigma_{2q} \\ \sigma^2_1 & \sigma_{12}& \cdots & \sigma_{3q} \end{bmatrix}\] 样本:

\[S=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})(X_i-\bar{X})^\prime\]

其中: \[X^\prime_i=(x_{i1},x_{i2},\cdots,x_{iq})\] \[\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i\]

R求协方差实例

加载数据:

measure <-structure(list(V1 = 1:20, V2 = c(34L, 37L, 38L, 36L, 38L, 43L, 40L, 38L, 40L, 41L, 36L, 36L, 34L, 33L, 36L, 37L, 34L, 36L, 38L, 35L), V3 = c(30L, 32L, 30L, 33L, 29L, 32L, 33L, 30L, 30L, 32L, 24L, 25L, 24L, 22L, 26L, 26L, 25L, 26L, 28L, 23L), V4 = c(32L,  37L, 36L, 39L, 33L, 38L, 42L, 40L, 37L, 39L, 35L, 37L, 37L, 34L, 38L, 37L, 38L, 37L, 40L, 35L)), .Names = c("V1", "V2", "V3", "V4"), class = "data.frame", row.names = c(NA, -20L))
measure <- measure[,-1]
names(measure) <- c("chest", "waist", "hips")
measure$gender <- gl(2, 10)
levels(measure$gender) <- c("male", "female")

数据描述: measure: 20位个体的三围数据(chest,waist.hips),gender:性别

    cov(measure[, c("chest","waist","hips")])
##          chest     waist     hips
## chest 6.631579  6.368421 3.000000
## waist 6.368421 12.526316 3.578947
## hips  3.000000  3.578947 5.944737

练习:分别对上述数据集的不同性别求三围之间的协方差矩阵。

male:

##           chest     waist     hips
## chest 6.7222222 0.9444444 3.944444
## waist 0.9444444 2.1000000 3.077778
## hips  3.9444444 3.0777778 9.344444

female:

##          chest    waist     hips
## chest 2.277778 2.166667 1.555556
## waist 2.166667 2.988889 2.755556
## hips  1.555556 2.755556 3.066667

相关系数

母体:

\[\mathrm{\rho}=\begin{bmatrix} 1 & \rho_{12}& \cdots & \rho_{1q} \\ \rho_{21} & 1& \cdots & \rho_{2q} \\ \rho^2_1 & \rho_{12}& \cdots & 1 \end{bmatrix}\] 其中: \[\rho_{ij}=\frac{\sigma_{ij}}{\sigma_1 \sigma_2}\] 样本:

\[R=D^{- {1\over 2}}SD^{1\over2}\] 其中: \[D^{1\over2}=diag\{1/s_1,1/s_2,\cdots,1/s_q\}\] \(s_i=\sqrt{s^2_i}\):第i个变量的样本标准差

cor(measure[, c("chest", "waist", "hips")])
##           chest     waist      hips
## chest 1.0000000 0.6987336 0.4778004
## waist 0.6987336 1.0000000 0.4147413
## hips  0.4778004 0.4147413 1.0000000

多元正态总体参数的极大似然估计

似然函数

估计

距离

欧几里德距离

\[d_{ij} =\sqrt{\sum_{k=1}^{q}(x_{ik}-x_{jk})^2}\]

dist(scale(measure[, c("chest", "waist", "hips")],center = FALSE))
##             1          2          3          4          5          6
## 2  0.16799417                                                       
## 3  0.14859277 0.07845035                                            
## 4  0.21743471 0.06812705 0.14038753                                 
## 5  0.11373206 0.14986992 0.08600399 0.21603978                      
## 6  0.29246910 0.15986410 0.15748126 0.18902383 0.21270228           
## 7  0.32331540 0.15697512 0.19574949 0.13135906 0.27872662 0.13578704
## 8  0.23486772 0.10801989 0.10501275 0.11914201 0.18699166 0.15748126
## 9  0.20517809 0.10483938 0.05875596 0.15668953 0.12241049 0.10807648
## 10 0.26906113 0.11751193 0.13112728 0.13587706 0.20438116 0.05875596
## 11 0.22789543 0.28255828 0.21545878 0.32819305 0.18804357 0.34123560
## 12 0.22323578 0.24326530 0.18246286 0.28133330 0.18135211 0.30499777
## 13 0.24535886 0.28741665 0.23390508 0.31969197 0.22786014 0.36473575
## 14 0.28255828 0.36961728 0.31051149 0.40972584 0.27648768 0.44661532
## 15 0.21603978 0.21059490 0.15690070 0.24326216 0.17531781 0.27715949
## 16 0.20626548 0.20729260 0.14310089 0.24886728 0.14986992 0.26177615
## 17 0.23377919 0.25572114 0.20892459 0.28256640 0.21767121 0.33828786
## 18 0.19771610 0.20895210 0.15016704 0.24747559 0.15663099 0.27840009
## 19 0.24482099 0.16121943 0.12570659 0.18246286 0.18699166 0.19779726
## 20 0.25569723 0.31969197 0.25572114 0.36204995 0.22791219 0.38352915
##             7          8          9         10         11         12
## 2                                                                   
## 3                                                                   
## 4                                                                   
## 5                                                                   
## 6                                                                   
## 7                                                                   
## 8  0.12752454                                                       
## 9  0.16725221 0.09468951                                            
## 10 0.08993022 0.10807648 0.09067606                                 
## 11 0.37617469 0.25092629 0.23828402 0.32355547                      
## 12 0.32353421 0.19699374 0.20221910 0.28020194 0.06285330           
## 13 0.37253329 0.24540869 0.26045638 0.33614816 0.07429639 0.06290191
## 14 0.47157774 0.34419869 0.34123560 0.42520702 0.10807648 0.15219215
## 15 0.28384337 0.15690070 0.17561105 0.24683681 0.10477375 0.04339179
## 16 0.28624259 0.16121943 0.15910595 0.23828402 0.09067606 0.04340940
## 17 0.33509176 0.20892459 0.23536560 0.30499777 0.10079543 0.05875596
## 18 0.29456770 0.16752307 0.17363758 0.25099024 0.08678357 0.03454877
## 19 0.18804357 0.06909753 0.11722018 0.16125735 0.19771610 0.14038753
## 20 0.41279892 0.28624259 0.28020194 0.36411259 0.04340940 0.09067606
##            13         14         15         16         17         18
## 2                                                                   
## 3                                                                   
## 4                                                                   
## 5                                                                   
## 6                                                                   
## 7                                                                   
## 8                                                                   
## 9                                                                   
## 10                                                                  
## 11                                                                  
## 12                                                                  
## 13                                                                  
## 14 0.10801989                                                       
## 15 0.09070134 0.19063678                                            
## 16 0.10483938 0.19066484 0.03714819                                 
## 17 0.04339179 0.14986992 0.06290191 0.08999818                      
## 18 0.08681879 0.17753245 0.02625319 0.02628228 0.06816070           
## 19 0.19066484 0.29163583 0.10146143 0.10801989 0.15668953 0.11722018
## 20 0.06812705 0.06816070 0.13280204 0.12752454 0.10801989 0.11912277
##            19
## 2            
## 3            
## 4            
## 5            
## 6            
## 7            
## 8            
## 9            
## 10           
## 11           
## 12           
## 13           
## 14           
## 15           
## 16           
## 17           
## 18           
## 19           
## 20 0.23084195

广义距离

\[d_i^2=(x_i-\bar{x})^\prime S^{-1}(x_i-\bar{x})\]

x <- measure[, c("chest", "waist", "hips")]
cm <- colMeans(x)
S <- cov(x)
d <- apply(x, MARGIN = 1, function(x) t(x - cm) %*% solve(S) %*% (x - cm))
d
##  [1] 9.0287361 2.5481132 0.8979442 6.3739058 4.6528524 6.4608693 4.5631305
##  [8] 1.5795960 1.8592906 2.4199926 1.9477228 0.8861995 2.0050976 3.3637016
## [15] 0.8285416 0.6302038 2.5524763 0.3783315 1.7955249 2.2277698

多元随机向量随机数的抽取

例:要从总体均值为(2,3),协方差阵为: \[ \begin{pmatrix} 2&1\\ 1&2 \end{pmatrix} \] 抽取1000个随机向量。

A=matrix(c(2,1,1,2),nrow=2,ncol=2);A
##      [,1] [,2]
## [1,]    2    1
## [2,]    1    2
library(mvtnorm)
set.seed(123)
X<-rmvnorm(1000, c(2, 3),A)
head(X)
##          [,1]     [,2]
## [1,] 1.150125 2.480423
## [2,] 4.155043 3.666843
## [3,] 2.804368 5.390145
## [4,] 2.166579 1.440601
## [5,] 0.898618 2.139809
## [6,] 3.803828 3.939560
cov(X)
##           [,1]      [,2]
## [1,] 1.8490873 0.8629169
## [2,] 0.8629169 2.0184171
cor(X)
##          [,1]     [,2]
## [1,] 1.000000 0.446668
## [2,] 0.446668 1.000000
summary(X)
##        V1               V2        
##  Min.   :-1.999   Min.   :-0.762  
##  1st Qu.: 1.124   1st Qu.: 2.083  
##  Median : 2.144   Median : 3.061  
##  Mean   : 2.060   Mean   : 3.041  
##  3rd Qu.: 2.909   3rd Qu.: 4.042  
##  Max.   : 6.858   Max.   : 8.328

多元正态性检验

例:检验上述例子是否来自正态总体。

library(ICS)
mvnorm.kur.test(X, method = "satt")
## 
##  Multivariate Normality Test Based on Kurtosis
## 
## data:  X
## W = 1.094, w1 = 1.5, df1 = 2.0, w2 = 2.0, df2 = 1.0, p-value =
## 0.8799

返回课程主页