例如为了了解家庭的特征与其消费模式之间的关系。调查了若干个家庭的下面两组变量:
现在要研究两组变量的关系,如果是两个变量可以计算相关系数,但是两组变量如何研究呢?
如果我们记两组变量的第一对线性组合为: \[\begin{align*} X=&(X_1,\cdots,X_p)\\ Y=&(Y_1,\cdots,Y_q)\\ u_1=&\alpha_1^{'}X\\ v_1=&\beta_1^{'}Y \end{align*}\] 其中: \(\alpha_1=(\alpha_{11},\alpha_{21},\cdots,\alpha_{p1})^{'}\) \(\beta_1=(\beta_{11},\beta_{21},\cdots,\beta_{q1})^{'}\) \(\Sigma\)是协方差矩阵 考虑: \[\begin{align*} Var(u_1)=& \alpha_1^{'}Var(X)\alpha_1=\alpha_1^{'}\Sigma_{11}\alpha=1\\ Var(v_1)=& \beta_1^{'}Var(Y)\beta_1=\beta_1^{'}\Sigma_{22}\beta=1\\ \rho_{u_1,v_1}=&Cov(u_1,v_1)=\alpha_1^{'}Cov(X,Y)\beta_1=\alpha_1^{'}\Sigma_{12}\beta_1 \end{align*}\] 典型相关分析就是求\(\alpha_1,\beta_1\),使两者的相关系数\(\rho\) 达到最大。
典型相关分析是否恰当,应该取决于两组原变量之间是否相关,如果两组变量之间毫无相关性而言,则不应该作典型相关分析。用样本来估计总体的典型相关系数是否有误,需要进行检验。
\(H_0\): \(\rho_1=\cdots=\rho_r=0\) \(H_1\): \(\rho_i\) 中,至少\(\rho_1\) 不为 \(0\)
检验统计量: 不放设:\(r=min(p,q)=p\) \[\Lambda_0=\frac{|S|}{|S_{XX}||S_{YY}|}=\prod_{i=1}^p(1-\rho_i^2)\] \(\Lambda_0\) 越小,越支持备则假设。 \[Q_0=-[(n-1)-\frac{1}{2}(p+q+1)]ln \Lambda_0\] 在原假设成立的条件下,\(Q_0\) 服从自由度为\(p\times q\) 的卡方分布。
康复俱乐部对20名中年人测量了三个生理指标:体重(x1),腰围(x2),脉搏(x3);三个训练指标:引体向上次数(y1),起坐次数(y2),跳跃次数(y3)。分析生理指标与训练指标的相关性。
ex1=read.table("http://statstudy.github.io/data/9-1.txt",head=T)
ex1
## x1 x2 x3 y1 y2 y3
## 1 191 36 50 5 162 60
## 2 189 37 52 2 110 60
## 3 193 38 58 12 101 101
## 4 162 35 62 12 105 37
## 5 189 35 46 13 155 58
## 6 182 36 56 4 101 42
## 7 211 38 56 8 101 38
## 8 167 34 60 6 125 40
## 9 176 31 74 15 200 40
## 10 154 33 56 17 251 250
## 11 169 34 50 17 120 38
## 12 166 33 52 13 210 115
## 13 154 34 64 14 215 105
## 14 247 46 50 1 50 50
## 15 193 36 46 6 70 31
## 16 202 37 62 12 210 120
## 17 176 37 54 4 60 25
## 18 157 32 52 11 230 80
## 19 156 33 54 15 225 73
## 20 138 33 68 2 110 43
scale(ex1)
## x1 x2 x3 y1 y2 y3
## [1,] 0.5022173 0.1873845 -0.84600343 -0.8418021 0.2629199 -0.2008679
## [2,] 0.4212145 0.4996919 -0.56862526 -1.4093091 -0.5681948 -0.2008679
## [3,] 0.5832201 0.8119993 0.26350927 0.4823810 -0.7120415 0.5987035
## [4,] -0.6723232 -0.1249230 0.81826561 0.4823810 -0.6481096 -0.6494080
## [5,] 0.4212145 -0.1249230 -1.40075978 0.6715500 0.1510391 -0.2398714
## [6,] 0.1377048 0.1873845 -0.01386891 -1.0309711 -0.7120415 -0.5518993
## [7,] 1.3122453 0.8119993 -0.01386891 -0.2742951 -0.7120415 -0.6299063
## [8,] -0.4698162 -0.4372304 0.54088744 -0.6526331 -0.3284501 -0.5909028
## [9,] -0.1053036 -1.3741527 2.48253466 1.0498880 0.8702730 -0.5909028
## [10,] -0.9963344 -0.7495378 -0.01386891 1.4282260 1.6854047 3.5044631
## [11,] -0.3888134 -0.4372304 -0.84600343 1.4282260 -0.4083650 -0.6299063
## [12,] -0.5103176 -0.7495378 -0.56862526 0.6715500 1.0301027 0.8717279
## [13,] -0.9963344 -0.4372304 1.09564379 0.8607190 1.1100176 0.6767105
## [14,] 2.7702957 3.3104588 -0.84600343 -1.5984781 -1.5271733 -0.3958854
## [15,] 0.5832201 0.1873845 -1.40075978 -0.6526331 -1.2075138 -0.7664185
## [16,] 0.9477327 0.4996919 0.81826561 0.4823810 1.0301027 0.9692366
## [17,] -0.1053036 0.4996919 -0.29124708 -1.0309711 -1.3673435 -0.8834289
## [18,] -0.8748302 -1.0618453 -0.56862526 0.2932120 1.3497622 0.1891669
## [19,] -0.9153316 -0.7495378 -0.29124708 1.0498880 1.2698474 0.0526547
## [20,] -1.6443568 -0.7495378 1.65040014 -1.4093091 -0.5681948 -0.5323976
## attr(,"scaled:center")
## x1 x2 x3 y1 y2 y3
## 178.60 35.40 56.10 9.45 145.55 70.30
## attr(,"scaled:scale")
## x1 x2 x3 y1 y2 y3
## 24.690505 3.201973 7.210373 5.286278 62.566575 51.277470
x=ex1[,1:3]
y=ex1[,4:6]
s11=cor(x);s11
## x1 x2 x3
## x1 1.0000000 0.8702435 -0.3657620
## x2 0.8702435 1.0000000 -0.3528921
## x3 -0.3657620 -0.3528921 1.0000000
s22=cor(y);s22
## y1 y2 y3
## y1 1.0000000 0.6957274 0.4957602
## y2 0.6957274 1.0000000 0.6692061
## y3 0.4957602 0.6692061 1.0000000
cor(ex1)
## x1 x2 x3 y1 y2 y3
## x1 1.0000000 0.8702435 -0.36576203 -0.3896937 -0.4930836 -0.22629556
## x2 0.8702435 1.0000000 -0.35289213 -0.5522321 -0.6455980 -0.19149937
## x3 -0.3657620 -0.3528921 1.00000000 0.1506480 0.2250381 0.03493306
## y1 -0.3896937 -0.5522321 0.15064802 1.0000000 0.6957274 0.49576018
## y2 -0.4930836 -0.6455980 0.22503808 0.6957274 1.0000000 0.66920608
## y3 -0.2262956 -0.1914994 0.03493306 0.4957602 0.6692061 1.00000000
s12=cor(ex1)[1:3,4:6];s12
## y1 y2 y3
## x1 -0.3896937 -0.4930836 -0.22629556
## x2 -0.5522321 -0.6455980 -0.19149937
## x3 0.1506480 0.2250381 0.03493306
s21=cor(ex1)[4:6,1:3];s21
## x1 x2 x3
## y1 -0.3896937 -0.5522321 0.15064802
## y2 -0.4930836 -0.6455980 0.22503808
## y3 -0.2262956 -0.1914994 0.03493306
A=solve(s11)%*%s12%*%solve(s22)%*%s21
A
## x1 x2 x3
## x1 -0.24594544 -0.42556193 0.15927694
## x2 0.58342555 0.90714323 -0.32827241
## x3 -0.01679293 -0.03129267 0.01728371
eigen(A)$vectors
## [,1] [,2] [,3]
## [1,] 0.4404622 -0.8428694 -0.1616014
## [2,] -0.8971428 0.5280957 0.4281731
## [3,] 0.0335830 -0.1033731 0.8891304
sqrt(eigen(A)$values)
## [1] 0.79560815 0.20055604 0.07257029
u1=scale(ex1)[,1:3]%*%eigen(A)$vectors[,1];u1
## [,1]
## [1,] 0.024685782
## [2,] -0.281862072
## [3,] -0.462743543
## [4,] -0.156579379
## [5,] 0.250561111
## [6,] -0.107922651
## [7,] -0.150950724
## [8,] 0.203486477
## [9,] 1.269799947
## [10,] 0.233129134
## [11,] 0.192589192
## [12,] 0.428570759
## [13,] -0.009794482
## [14,] -1.778155296
## [15,] 0.041734069
## [16,] -0.003374780
## [17,] -0.504458222
## [18,] 0.548201132
## [19,] 0.259492611
## [20,] 0.003590935
k1=c(0.031,-0.493,0.008);k1
## [1] 0.031 -0.493 0.008
u11=scale(ex1)[,1:3]%*%t(t(k1))
u11
## [,1]
## [1,] -0.08357983
## [2,] -0.23783946
## [3,] -0.38012777
## [4,] 0.04729113
## [5,] 0.06343860
## [6,] -0.08822264
## [7,] -0.35974702
## [8,] 0.20531739
## [9,] 0.69405315
## [10,] 0.33852484
## [11,] 0.19673335
## [12,] 0.34915331
## [13,] 0.19343338
## [14,] -1.55294506
## [15,] -0.08550679
## [16,] -0.21042227
## [17,] -0.25194249
## [18,] 0.49182098
## [19,] 0.33881690
## [20,] 0.33175030
B=solve(s22)%*%s21%*%solve(s11)%*%s12
B
## y1 y2 y3
## y1 0.1617883 0.1718776 0.02299820
## y2 0.4824417 0.5487737 0.11144827
## y3 -0.3184295 -0.3464725 -0.03208051
eigen(B)
## $values
## [1] 0.632992335 0.040222726 0.005266446
##
## $vectors
## [,1] [,2] [,3]
## [1,] -0.2644705 -0.3313572 -0.7046067
## [2,] -0.7975886 0.1089606 0.6721095
## [3,] 0.5421327 0.9371926 -0.2275921
sqrt(eigen(B)$values)
## [1] 0.79560815 0.20055604 0.07257029
v1=scale(ex1)[,4:6]%*%eigen(B)$vectors[,1];v1
## [,1]
## [1,] -0.09596723
## [2,] 0.71700920
## [3,] 0.76491740
## [4,] 0.03728404
## [5,] -0.42811434
## [6,] 0.54137496
## [7,] 0.29896638
## [8,] 0.11422255
## [9,] -1.29213186
## [10,] 0.17790073
## [11,] -0.39350907
## [12,] -0.52661117
## [13,] -0.74610528
## [14,] 1.42618381
## [15,] 0.72020088
## [16,] -0.42371890
## [17,] 0.88430330
## [18,] -1.05154730
## [19,] -1.26193428
## [20,] 0.53727617
k2=c(0.066,0.017,-0.014)
k2
## [1] 0.066 0.017 -0.014
t(t(k2))
## [,1]
## [1,] 0.066
## [2,] 0.017
## [3,] -0.014
v11=scale(ex1)[,4:6]%*%t(t(k2))
cor(u1,v1)
## [,1]
## [1,] -0.7956082
cor(u11,v11)
## [,1]
## [1,] 0.6287931
A01=prod(1-eigen(A)$values)
A01
## [1] 0.3503905
Q01=-(20-1-0.5*(3+3+1))*log(A01)
Q01
## [1] 16.25496
pr1=1-pchisq(Q01,9)
pr1
## [1] 0.06174456
##################
A02=(1-eigen(A)$values[2])*(1-eigen(A)$values[3])
A02
## [1] 0.9547227
Q02=-(20-1-0.5*(3+3+1))*log(A02)
Q02
## [1] 0.7181831
pr2=1-pchisq(Q02,4)
pr2
## [1] 0.9490678
corcoef.test<-function(r, n, p, q, alpha=0.1){
#r为相关系数 n为样本个数 且n>p+q
m<-length(r); Q<-rep(0, m); lambda <- 1
for (k in m:1){
lambda<-lambda*(1-r[k]^2); #检验统计量
Q[k]<- -log(lambda) #检验统计量取对数
}
s<-0; i<-m
for (k in 1:m){
Q[k]<- (n-k+1-1/2*(p+q+3)+s)*Q[k] #统计量
chi<-1-pchisq(Q[k], (p-k+1)*(q-k+1))
if (chi>alpha){
i<-k-1; break
}
s<-s+1/r[k]^2
}
i #显示输出结果 选用第几对典型变量
}
corcoef.test(cancor(x,y)$cor,n=20,p=3,q=3,alpha=0.1)
## [1] 1