需要安装的包:
install.packages(c("mvtnorm","ICSNP","biotools"))
设:
考虑一个n个观测p个变量的数据集:
\[X=[X_{ij}]=\begin{bmatrix} X_{1,1} & X_{1,2}& \cdots & X_{1,p} \\ X_{2,1} & X_{2,2}& \cdots & X_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1} & X_{n,2}& \cdots & X_{n,p} \end{bmatrix}= \begin{bmatrix} X_{1}^{\prime} \\ X_2^{\prime} \\ \vdots \\ X_{n}^{\prime} \end{bmatrix}\]
描述性统计量:
母体: \[ Cov(X_i,X_j)=E(X_i-\mu_i)(X_j-\mu_j)\]
\[\Sigma= \begin{bmatrix} \sigma^2_1 & \sigma_{12}& \cdots & \sigma_{1q} \\ \sigma_{21} & \sigma^2_{2}& \cdots & \sigma_{2q} \\ \sigma^2_1 & \sigma_{12}& \cdots & \sigma_{3q} \end{bmatrix}\] 样本:
\[S=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})(X_i-\bar{X})^\prime\]
其中: \[X^\prime_i=(x_{i1},x_{i2},\cdots,x_{iq})\] \[\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i\]
加载数据:
measure <-structure(list(V1 = 1:20, V2 = c(34L, 37L, 38L, 36L, 38L, 43L, 40L, 38L, 40L, 41L, 36L, 36L, 34L, 33L, 36L, 37L, 34L, 36L, 38L, 35L), V3 = c(30L, 32L, 30L, 33L, 29L, 32L, 33L, 30L, 30L, 32L, 24L, 25L, 24L, 22L, 26L, 26L, 25L, 26L, 28L, 23L), V4 = c(32L, 37L, 36L, 39L, 33L, 38L, 42L, 40L, 37L, 39L, 35L, 37L, 37L, 34L, 38L, 37L, 38L, 37L, 40L, 35L)), .Names = c("V1", "V2", "V3", "V4"), class = "data.frame", row.names = c(NA, -20L))
measure <- measure[,-1]
names(measure) <- c("chest", "waist", "hips")
measure$gender <- gl(2, 10)
levels(measure$gender) <- c("male", "female")
数据描述: measure: 20位个体的三围数据(chest,waist.hips),gender:性别
cov(measure[, c("chest","waist","hips")])
## chest waist hips
## chest 6.631579 6.368421 3.000000
## waist 6.368421 12.526316 3.578947
## hips 3.000000 3.578947 5.944737
male:
## chest waist hips
## chest 6.7222222 0.9444444 3.944444
## waist 0.9444444 2.1000000 3.077778
## hips 3.9444444 3.0777778 9.344444
female:
## chest waist hips
## chest 2.277778 2.166667 1.555556
## waist 2.166667 2.988889 2.755556
## hips 1.555556 2.755556 3.066667
母体:
\[\mathrm{\rho}=\begin{bmatrix} 1 & \rho_{12}& \cdots & \rho_{1q} \\ \rho_{21} & 1& \cdots & \rho_{2q} \\ \rho^2_1 & \rho_{12}& \cdots & 1 \end{bmatrix}\] 其中: \[\rho_{ij}=\frac{\sigma_{ij}}{\sigma_1 \sigma_2}\] 样本:
\[R=D^{- {1\over 2}}SD^{1\over2}\] 其中: \[D^{1\over2}=diag\{1/s_1,1/s_2,\cdots,1/s_q\}\] \(s_i=\sqrt{s^2_i}\):第i个变量的样本标准差
cor(measure[, c("chest", "waist", "hips")])
## chest waist hips
## chest 1.0000000 0.6987336 0.4778004
## waist 0.6987336 1.0000000 0.4147413
## hips 0.4778004 0.4147413 1.0000000
\[d_{ij} =\sqrt{\sum_{k=1}^{q}(x_{ik}-x_{jk})^2}\]
dist(scale(measure[, c("chest", "waist", "hips")],center = FALSE))
## 1 2 3 4 5 6
## 2 0.16799417
## 3 0.14859277 0.07845035
## 4 0.21743471 0.06812705 0.14038753
## 5 0.11373206 0.14986992 0.08600399 0.21603978
## 6 0.29246910 0.15986410 0.15748126 0.18902383 0.21270228
## 7 0.32331540 0.15697512 0.19574949 0.13135906 0.27872662 0.13578704
## 8 0.23486772 0.10801989 0.10501275 0.11914201 0.18699166 0.15748126
## 9 0.20517809 0.10483938 0.05875596 0.15668953 0.12241049 0.10807648
## 10 0.26906113 0.11751193 0.13112728 0.13587706 0.20438116 0.05875596
## 11 0.22789543 0.28255828 0.21545878 0.32819305 0.18804357 0.34123560
## 12 0.22323578 0.24326530 0.18246286 0.28133330 0.18135211 0.30499777
## 13 0.24535886 0.28741665 0.23390508 0.31969197 0.22786014 0.36473575
## 14 0.28255828 0.36961728 0.31051149 0.40972584 0.27648768 0.44661532
## 15 0.21603978 0.21059490 0.15690070 0.24326216 0.17531781 0.27715949
## 16 0.20626548 0.20729260 0.14310089 0.24886728 0.14986992 0.26177615
## 17 0.23377919 0.25572114 0.20892459 0.28256640 0.21767121 0.33828786
## 18 0.19771610 0.20895210 0.15016704 0.24747559 0.15663099 0.27840009
## 19 0.24482099 0.16121943 0.12570659 0.18246286 0.18699166 0.19779726
## 20 0.25569723 0.31969197 0.25572114 0.36204995 0.22791219 0.38352915
## 7 8 9 10 11 12
## 2
## 3
## 4
## 5
## 6
## 7
## 8 0.12752454
## 9 0.16725221 0.09468951
## 10 0.08993022 0.10807648 0.09067606
## 11 0.37617469 0.25092629 0.23828402 0.32355547
## 12 0.32353421 0.19699374 0.20221910 0.28020194 0.06285330
## 13 0.37253329 0.24540869 0.26045638 0.33614816 0.07429639 0.06290191
## 14 0.47157774 0.34419869 0.34123560 0.42520702 0.10807648 0.15219215
## 15 0.28384337 0.15690070 0.17561105 0.24683681 0.10477375 0.04339179
## 16 0.28624259 0.16121943 0.15910595 0.23828402 0.09067606 0.04340940
## 17 0.33509176 0.20892459 0.23536560 0.30499777 0.10079543 0.05875596
## 18 0.29456770 0.16752307 0.17363758 0.25099024 0.08678357 0.03454877
## 19 0.18804357 0.06909753 0.11722018 0.16125735 0.19771610 0.14038753
## 20 0.41279892 0.28624259 0.28020194 0.36411259 0.04340940 0.09067606
## 13 14 15 16 17 18
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## 12
## 13
## 14 0.10801989
## 15 0.09070134 0.19063678
## 16 0.10483938 0.19066484 0.03714819
## 17 0.04339179 0.14986992 0.06290191 0.08999818
## 18 0.08681879 0.17753245 0.02625319 0.02628228 0.06816070
## 19 0.19066484 0.29163583 0.10146143 0.10801989 0.15668953 0.11722018
## 20 0.06812705 0.06816070 0.13280204 0.12752454 0.10801989 0.11912277
## 19
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16
## 17
## 18
## 19
## 20 0.23084195
\[d_i^2=(x_i-\bar{x})^\prime S^{-1}(x_i-\bar{x})\]
x <- measure[, c("chest", "waist", "hips")]
cm <- colMeans(x)
S <- cov(x)
d <- apply(x, MARGIN = 1, function(x) t(x - cm) %*% solve(S) %*% (x - cm))
d
## [1] 9.0287361 2.5481132 0.8979442 6.3739058 4.6528524 6.4608693 4.5631305
## [8] 1.5795960 1.8592906 2.4199926 1.9477228 0.8861995 2.0050976 3.3637016
## [15] 0.8285416 0.6302038 2.5524763 0.3783315 1.7955249 2.2277698
例:要从总体均值为(2,3),协方差阵为: \[ \begin{pmatrix} 2&1\\ 1&2 \end{pmatrix} \] 抽取1000个随机向量。
A=matrix(c(2,1,1,2),nrow=2,ncol=2);A
## [,1] [,2]
## [1,] 2 1
## [2,] 1 2
library(mvtnorm)
set.seed(123)
X<-rmvnorm(1000, c(2, 3),A)
head(X)
## [,1] [,2]
## [1,] 1.150125 2.480423
## [2,] 4.155043 3.666843
## [3,] 2.804368 5.390145
## [4,] 2.166579 1.440601
## [5,] 0.898618 2.139809
## [6,] 3.803828 3.939560
cov(X)
## [,1] [,2]
## [1,] 1.8490873 0.8629169
## [2,] 0.8629169 2.0184171
cor(X)
## [,1] [,2]
## [1,] 1.000000 0.446668
## [2,] 0.446668 1.000000
summary(X)
## V1 V2
## Min. :-1.999 Min. :-0.762
## 1st Qu.: 1.124 1st Qu.: 2.083
## Median : 2.144 Median : 3.061
## Mean : 2.060 Mean : 3.041
## 3rd Qu.: 2.909 3rd Qu.: 4.042
## Max. : 6.858 Max. : 8.328
例:检验上述例子是否来自正态总体。
library(ICS)
mvnorm.kur.test(X, method = "satt")
##
## Multivariate Normality Test Based on Kurtosis
##
## data: X
## W = 1.094, w1 = 1.5, df1 = 2.0, w2 = 2.0, df2 = 1.0, p-value =
## 0.8799
返回课程主页。