Required package:

install.packages(c("psych"))
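  • Principal component analysis (PCA) is a dimensionality-reduction technique that transforms a large number of correlated variables into a much smaller set of uncorrelated variables, called principal components. For example, PCA can reduce 30 correlated (and quite possibly redundant) environmental variables to 5 uncorrelated component variables while retaining as much of the information in the original dataset as possible.

  • Exploratory factor analysis (EFA) is a family of methods for uncovering the latent structure underlying a set of variables. It explains the relationships among observed, manifest variables by looking for a smaller set of latent, or hidden, structures. For example, Harman74.cor contains the correlations among 24 psychological tests given to 145 seventh- and eighth-grade students. Applying EFA to these data suggests that the 276 pairwise correlations among the 24 tests can be explained by four latent factors of student ability (verbal ability, processing speed, deduction, and memory).

Useful factor analysis functions in the psych package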

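Principal Component Analysis

Introduction

  • Principal component analysis (PCA) is a multivariate statistical method that applies a linear transformation to many variables in order to obtain a smaller number of important composite variables.

  • PCA was first introduced by Karl Pearson for non-random variables and was later extended to random vectors by Harold Hotelling. The amount of information is usually measured by the sum of squared deviations or by the variance.

  • A major benefit of PCA is dimensionality reduction. The newly derived principal components can be ranked by importance; keeping the most important leading ones and dropping the rest reduces the dimension, which simplifies the model or compresses the data while preserving as much of the information in the original data as possible.

  • The idea of PCA is to map n-dimensional features onto k dimensions (k<n) that are entirely new, mutually orthogonal features. These k features, called principal components, are newly constructed; they are not obtained by simply discarding n-k of the original n features.

Principles of PCA

Consider a dataset with n observations on p variables: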

\[X=[X_{ij}]=\begin{bmatrix} X_{1,1} & X_{1,2}& \cdots & X_{1,p} \\ X_{2,1} & X_{2,2}& \cdots & X_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ X_{n,1} & X_{n,2}& \cdots & X_{n,p} \end{bmatrix}= \begin{bmatrix} X_{1}^{\prime} \\ X_2^{\prime} \\ \vdots \\ X_{n}^{\prime} \end{bmatrix}= \begin{bmatrix} X_{1} & X_2 & \cdots & X_{p} \end{bmatrix}\]

Let \(\Sigma\) be the covariance matrix of \(X\), with eigenvalues \[\lambda_1\ge \lambda_2\ge \cdots \ge \lambda_p \ge 0 \]

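For a coefficient vector \(\mathbf{a}_i\), consider the linear combination \(Y_i\) of the variables and its variance: \begin{align*}Y_i=\mathbf{a}_i^{\prime}\mathbf{X}&=a_{i1}X_1+a_{i2}X_2+\cdots+a_{ip}X_p\end{align*} \begin{align*}Var(Y_i)&=Var(\mathbf{a}_i^{\prime}\mathbf{X})=\mathbf{a}_i^{\prime}\Sigma \mathbf{a}_i\end{align*}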
  • The first principal component \(Y_1\) is the linear combination \(\mathbf{a}_1^{\prime}\mathbf{X}\) that maximizes \[Var(Y_1)=Var(\mathbf{a}_1^{\prime}\mathbf{X})=\mathbf{a}_1^{\prime}\Sigma \mathbf{a}_1\] subject to \(\mathbf{a}_1^{\prime}\mathbf{a}_1=1\).

  • The second principal component \(Y_2\) is the linear combination \(\mathbf{a}_2^{\prime}\mathbf{X}\) that maximizes \[Var(Y_2)=Var(\mathbf{a}_2^{\prime}\mathbf{X})=\mathbf{a}_2^{\prime}\Sigma \mathbf{a}_2\] subject to \(\mathbf{a}_2^{\prime}\mathbf{a}_2=1\) and \(Cov(\mathbf{a}_2^{\prime}\mathbf{X},\mathbf{a}_1^{\prime}\mathbf{X})=0\).

  • The k-th principal component \(Y_k\) is the linear combination \(\mathbf{a}_k^{\prime}\mathbf{X}\) that maximizes \[Var(Y_k)=Var(\mathbf{a}_k^{\prime}\mathbf{X})=\mathbf{a}_k^{\prime}\Sigma \mathbf{a}_k\] among all combinations uncorrelated with the earlier components, that is, subject to \(\mathbf{a}_k^{\prime}\mathbf{a}_k=1\) and \(Cov(\mathbf{a}_k^{\prime}\mathbf{X},\mathbf{a}_j^{\prime}\mathbf{X})=0\quad \forall j<k\).
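
A short Lagrange-multiplier argument shows why the solution to this maximization is an eigenvector of \(\Sigma\). The following is a sketch for the first component only; the multiplier \(\lambda\) is introduced here for illustration and turns out to be an eigenvalue of \(\Sigma\):

\begin{align*}
L(\mathbf{a}_1,\lambda) &= \mathbf{a}_1^{\prime}\Sigma \mathbf{a}_1-\lambda(\mathbf{a}_1^{\prime}\mathbf{a}_1-1)\\
\frac{\partial L}{\partial \mathbf{a}_1} &= 2\Sigma \mathbf{a}_1-2\lambda \mathbf{a}_1=\mathbf{0}
\quad\Longrightarrow\quad \Sigma \mathbf{a}_1=\lambda \mathbf{a}_1\\
Var(Y_1) &= \mathbf{a}_1^{\prime}\Sigma \mathbf{a}_1=\lambda\, \mathbf{a}_1^{\prime}\mathbf{a}_1=\lambda
\end{align*}

Every stationary point is therefore a unit eigenvector of \(\Sigma\), and the variance it attains equals the corresponding eigenvalue, so the maximum is reached at the eigenvector belonging to the largest eigenvalue \(\lambda_1\).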

Result 1: Let \((\lambda_i,\mathbf{e}_i)\) be the eigenvalue-eigenvector pairs of \(\Sigma\), ordered from largest to smallest eigenvalue so that \(\lambda_1\ge \lambda_2\ge \cdots \ge \lambda_p\ge 0\). Then the i-th principal component is \[Y_i=e_{i1}X_1+\cdots+e_{ip}X_p \] and it satisfies \begin{align*} Var(Y_i)&=\mathbf{e}_i^{\prime}\Sigma \mathbf{e}_i=\lambda_i\\ Cov(Y_i,Y_j)&= \mathbf{e}_i^{\prime}\Sigma \mathbf{e}_j=0 \quad \forall j\ne i \end{align*}

Result 2: The principal components above have the following property: \[\sum_{i=1}^{p} Var(X_i)=\sum_{i=1}^{p}\sigma_{ii}=\sum_{i=1}^{p} Var(Y_i)=\sum_{i=1}^{p} \lambda_i\]

By Result 2, the proportion of the total variance explained by the i-th principal component is \[\frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j}\]
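
The two results can be checked numerically. The sketch below is one way to do it, assuming the built-in USArrests data purely as a convenient example (it is not part of the original notes); the variable names are illustrative.

```r
# Numerical check of Results 1 and 2 (USArrests is an assumed example dataset)
X <- as.matrix(USArrests)
Sigma <- cov(X)
eg <- eigen(Sigma, symmetric = TRUE)

# Principal component scores: centered data times the eigenvectors
Y <- scale(X, center = TRUE, scale = FALSE) %*% eg$vectors

# Result 1: Var(Y_i) = lambda_i and Cov(Y_i, Y_j) = 0 for i != j
cov_Y <- cov(Y)
max_offdiag <- max(abs(cov_Y[upper.tri(cov_Y)]))

# Result 2: total variance of the X's equals the sum of the eigenvalues
total_var_X <- sum(diag(Sigma))
total_var_Y <- sum(eg$values)

# Proportion of variance explained by each component
prop_var <- eg$values / sum(eg$values)
round(prop_var, 4)
```

Here `cov_Y` should be diagonal up to rounding error, with the eigenvalues on its diagonal, and `total_var_X` should agree with `total_var_Y`.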

Worked Example

We use a dataset that ships with R: USJudgeRatings, lawyers' ratings of state judges in the US Superior Court.

Variable description

library(graphics)
pairs(USJudgeRatings, main = "USJudgeRatings data")

summary(USJudgeRatings)
##       CONT             INTG            DMNR            DILG      
##  Min.   : 5.700   Min.   :5.900   Min.   :4.300   Min.   :5.100  
##  1st Qu.: 6.850   1st Qu.:7.550   1st Qu.:6.900   1st Qu.:7.150  
##  Median : 7.300   Median :8.100   Median :7.700   Median :7.800  
##  Mean   : 7.437   Mean   :8.021   Mean   :7.516   Mean   :7.693  
##  3rd Qu.: 7.900   3rd Qu.:8.550   3rd Qu.:8.350   3rd Qu.:8.450  
##  Max.   :10.600   Max.   :9.200   Max.   :9.000   Max.   :9.000  
##       CFMG            DECI            PREP            FAMI      
##  Min.   :5.400   Min.   :5.700   Min.   :4.800   Min.   :5.100  
##  1st Qu.:7.000   1st Qu.:7.100   1st Qu.:6.900   1st Qu.:6.950  
##  Median :7.600   Median :7.700   Median :7.700   Median :7.600  
##  Mean   :7.479   Mean   :7.565   Mean   :7.467   Mean   :7.488  
##  3rd Qu.:8.050   3rd Qu.:8.150   3rd Qu.:8.200   3rd Qu.:8.250  
##  Max.   :8.700   Max.   :8.800   Max.   :9.100   Max.   :9.100  
##       ORAL            WRIT            PHYS            RTEN      
##  Min.   :4.700   Min.   :4.900   Min.   :4.700   Min.   :4.800  
##  1st Qu.:6.850   1st Qu.:6.900   1st Qu.:7.700   1st Qu.:7.150  
##  Median :7.500   Median :7.600   Median :8.100   Median :7.800  
##  Mean   :7.293   Mean   :7.384   Mean   :7.935   Mean   :7.602  
##  3rd Qu.:8.000   3rd Qu.:8.050   3rd Qu.:8.500   3rd Qu.:8.250  
##  Max.   :8.900   Max.   :9.000   Max.   :9.100   Max.   :9.200
head(USJudgeRatings)
##                CONT INTG DMNR DILG CFMG DECI PREP FAMI ORAL WRIT PHYS RTEN
## AARONSON,L.H.   5.7  7.9  7.7  7.3  7.1  7.4  7.1  7.1  7.1  7.0  8.3  7.8
## ALEXANDER,J.M.  6.8  8.9  8.8  8.5  7.8  8.1  8.0  8.0  7.8  7.9  8.5  8.7
## ARMENTANO,A.J.  7.2  8.1  7.8  7.8  7.5  7.6  7.5  7.5  7.3  7.4  7.9  7.8
## BERDON,R.I.     6.8  8.8  8.5  8.8  8.3  8.5  8.7  8.7  8.4  8.5  8.8  8.7
## BRACKEN,J.J.    7.3  6.4  4.3  6.5  6.0  6.2  5.7  5.7  5.1  5.3  5.5  4.8
## BURNS,E.B.      6.2  8.8  8.7  8.5  7.9  8.0  8.1  8.0  8.0  8.0  8.6  8.6
boxplot(USJudgeRatings)

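Goal: find linear combinations of the variables that summarize the ratings of the judges well.

S-type analysis

An analysis based on the covariance matrix is called an S-type analysis.

# Column 1 (CONT, the number of contacts of lawyer with judge) is dropped; cor = F uses the covariance matrix
pc1<-princomp(USJudgeRatings[,-1],cor = F)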
screeplot(pc1,main = "scree plot1")

summary(pc1)
## Importance of components:
##                           Comp.1    Comp.2     Comp.3      Comp.4
## Standard deviation     2.9944327 0.6120146 0.47007234 0.272428814
## Proportion of Variance 0.9230246 0.0385574 0.02274645 0.007639945
## Cumulative Proportion  0.9230246 0.9615820 0.98432843 0.991968375
##                           Comp.5      Comp.6      Comp.7       Comp.8
## Standard deviation     0.1721164 0.128908444 0.115736813 0.0897566807
## Proportion of Variance 0.0030495 0.001710594 0.001378882 0.0008293116
## Cumulative Proportion  0.9950179 0.996728469 0.998107351 0.9989366627
##                              Comp.9      Comp.10      Comp.11
## Standard deviation     0.0726040584 0.0561008908 0.0437152302
## Proportion of Variance 0.0005426327 0.0003239841 0.0001967205
## Cumulative Proportion  0.9994792954 0.9998032795 1.0000000000
pc1$loadings[,1]
##       INTG       DMNR       DILG       CFMG       DECI       PREP 
## -0.2347066 -0.3476493 -0.2868055 -0.2721066 -0.2533598 -0.3091311 
##       FAMI       ORAL       WRIT       PHYS       RTEN 
## -0.3051024 -0.3319622 -0.3139564 -0.2775534 -0.3593179
pc1
## Call:
## princomp(x = USJudgeRatings[, -1], cor = F)
## 
## Standard deviations:
##     Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6 
## 2.99443271 0.61201461 0.47007234 0.27242881 0.17211639 0.12890844 
##     Comp.7     Comp.8     Comp.9    Comp.10    Comp.11 
## 0.11573681 0.08975668 0.07260406 0.05610089 0.04371523 
## 
##  11  variables and  43 observations.

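Computing it by hand

# Eigendecomposition of the sample covariance matrix
cov1<-cov(USJudgeRatings[,-1])
eg1<-eigen(cov1,symmetric=T)
# First eigenvector: matches the Comp.1 loadings from princomp()
eg1$vectors[,1]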
##  [1] -0.2347066 -0.3476493 -0.2868055 -0.2721066 -0.2533598 -0.3091311
##  [7] -0.3051024 -0.3319622 -0.3139564 -0.2775534 -0.3593179
eg1$values[1]/sum(eg1$values)
## [1] 0.9230246
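
The eigenvectors and the variance proportions agree with `princomp()`, but the variances themselves differ by a constant factor, because `princomp()` uses divisor n for the covariance matrix while `cov()` uses n - 1. A minimal sketch of that relationship (the variable names are illustrative):

```r
X <- USJudgeRatings[, -1]   # the same 11 rating variables as above
n <- nrow(X)

pc <- princomp(X, cor = FALSE)
eg <- eigen(cov(X), symmetric = TRUE)

# princomp() divides by n, cov() divides by n - 1
var_princomp <- unname(pc$sdev^2)
var_rescaled <- eg$values * (n - 1) / n
all.equal(var_princomp, var_rescaled)

# The constant factor cancels in the variance proportions
prop_princomp <- var_princomp / sum(var_princomp)
prop_eigen <- eg$values / sum(eg$values)
all.equal(prop_princomp, prop_eigen)
```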

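R-type analysis

An analysis based on the correlation matrix is called an R-type analysis.

# Same variables as before, now with cor = T so the correlation matrix is used
pc2<-princomp(USJudgeRatings[,-1],cor = T)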
screeplot(pc2,main = "scree plot2")

summary(pc2)
## Importance of components:
##                           Comp.1     Comp.2     Comp.3      Comp.4
## Standard deviation     3.1833029 0.65163561 0.50525195 0.302163952
## Proportion of Variance 0.9212198 0.03860263 0.02320723 0.008300278
## Cumulative Proportion  0.9212198 0.95982240 0.98302963 0.991329907
##                             Comp.5      Comp.6    Comp.7       Comp.8
## Standard deviation     0.193132512 0.140575118 0.1359209 0.0910572348
## Proportion of Variance 0.003390924 0.001796488 0.0016795 0.0007537655
## Cumulative Proportion  0.994720832 0.996517319 0.9981968 0.9989505847
##                              Comp.9      Comp.10      Comp.11
## Standard deviation     0.0780507762 0.0579730055 0.0457250007
## Proportion of Variance 0.0005538112 0.0003055336 0.0001900705
## Cumulative Proportion  0.9995043959 0.9998099295 1.0000000000
pc2$loadings[,1]
##       INTG       DMNR       DILG       CFMG       DECI       PREP 
## -0.2885122 -0.2868395 -0.3043623 -0.3026194 -0.3019234 -0.3094144 
##       FAMI       ORAL       WRIT       PHYS       RTEN 
## -0.3066761 -0.3127088 -0.3110520 -0.2807447 -0.3097836
pc2
## Call:
## princomp(x = USJudgeRatings[, -1], cor = T)
## 
## Standard deviations:
##     Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6 
## 3.18330291 0.65163561 0.50525195 0.30216395 0.19313251 0.14057512 
##     Comp.7     Comp.8     Comp.9    Comp.10    Comp.11 
## 0.13592093 0.09105723 0.07805078 0.05797301 0.04572500 
## 
##  11  variables and  43 observations.

Computing it by hand

# Despite its name, cov2 holds the correlation matrix
cov2<-cor(USJudgeRatings[,-1])
eg2<-eigen(cov2,symmetric=T)
eg2
## $values
##  [1] 10.133417426  0.424628969  0.255279532  0.091303054  0.037300167
##  [6]  0.019761364  0.018474499  0.008291420  0.006091924  0.003360869
## [11]  0.002090776
## 
## $vectors
##             [,1]          [,2]         [,3]        [,4]        [,5]
##  [1,] -0.2885122  0.5744682517  0.117763148  0.08380834  0.37493974
##  [2,] -0.2868395  0.5763568072 -0.176986952  0.23977262 -0.39860809
##  [3,] -0.3043623 -0.1385605824  0.334740068  0.26555601  0.59149417
##  [4,] -0.3026194 -0.3100115588  0.019545609  0.47773553 -0.08202695
##  [5,] -0.3019234 -0.3364674872  0.054443551  0.38036525 -0.39888902
##  [6,] -0.3094144 -0.1252540296  0.229233996 -0.20132809  0.08469611
##  [7,] -0.3066761 -0.1228593988  0.227525865 -0.52405105 -0.09943784
##  [8,] -0.3127088  0.0052082558 -0.005507203 -0.22936834 -0.14642044
##  [9,] -0.3110520 -0.0002999784  0.148245297 -0.31656247 -0.23702291
## [10,] -0.2807447 -0.2347983520 -0.820161360 -0.15475146  0.29791670
## [11,] -0.3097836  0.1527808928 -0.201053522  0.01114254  0.03729716
##              [,6]         [,7]         [,8]         [,9]       [,10]
##  [1,] -0.50952871 -0.229705308 -0.284903977  0.145484887 -0.10273495
##  [2,]  0.51407811  0.167067325 -0.169286228 -0.005467441  0.10539158
##  [3,]  0.29806148  0.367529033  0.004789352 -0.354685540  0.02389188
##  [4,]  0.10089374 -0.722336184  0.035844452 -0.026425045  0.20704699
##  [5,] -0.44826185  0.452351620 -0.199576677  0.150276288 -0.13826398
##  [6,]  0.33583565 -0.006823921  0.068955312  0.717150217 -0.25188457
##  [7,] -0.03818923 -0.002372688 -0.222092249  0.060538415  0.54400573
##  [8,]  0.01945629 -0.163555968  0.274475348 -0.252450059 -0.66684780
##  [9,] -0.07288963 -0.060729628 -0.099198510 -0.492809116 -0.01152927
## [10,]  0.03755338  0.042123360 -0.272363503 -0.001096901 -0.03061736
## [11,] -0.23409315  0.159967574  0.797241835  0.071824676  0.33262222
##               [,11]
##  [1,] -0.0006869163
##  [2,] -0.0764809505
##  [3,] -0.0735829555
##  [4,] -0.0131126895
##  [5,] -0.0422633237
##  [6,]  0.3049299442
##  [7,] -0.4518559528
##  [8,] -0.4660731103
##  [9,]  0.6804727629
## [10,]  0.0487848868
## [11,]  0.0835119540
eg2$values[1]/sum(eg2$values)
## [1] 0.9212198

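Questions for discussion

  • Do the covariance matrix and the correlation matrix give the same principal components?

Exercise

Try a principal component analysis of the iris data.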

help(iris)
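
One possible starting point for the exercise is sketched below. It uses `prcomp()` with scaling, which corresponds to an R-type (correlation matrix) analysis; that choice, and the variable names, are assumptions made here rather than part of the exercise statement.

```r
# Starting point: PCA on the four numeric iris measurements
iris_num <- iris[, 1:4]                      # drop the Species factor
pc_iris <- prcomp(iris_num, center = TRUE, scale. = TRUE)
summary(pc_iris)

# Proportion of variance explained by each component
prop_iris <- pc_iris$sdev^2 / sum(pc_iris$sdev^2)
round(prop_iris, 3)
```

Repeating the analysis with `scale. = FALSE` gives the S-type version for comparison.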

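Factor Analysis

Overview

  • Factor analysis (FA) is a statistical technique for extracting common factors from a group of variables.
  • It was first proposed by the British psychologist Charles E. Spearman. He observed that students' marks in different subjects are correlated: a student who does well in one subject tends to do well in the others. This led him to ask whether some latent common factor, or general intelligence, influences students' academic performance.
  • Factor analysis can uncover hidden, representative factors among many variables. Grouping variables of the same nature into a single factor reduces the number of variables and also allows hypotheses about the relationships among variables to be tested.

Factor analysis is usually divided into:

  • Exploratory factor analysis (EFA)
  • Confirmatory factor analysis (CFA)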

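Comparing Factor Analysis and Principal Component Analysis

Principal component analysis versus factor analysis. The factor model is as follows: [equation image] where \(F_i\) is the \(i\)-th common factor, \(\epsilon_i\) is the \(i\)-th specific factor, and \(l_{ij}\) is the loading of the \(i\)-th variable on the \(j\)-th common factor.

In matrix form: [equation image]

Now consider: [equation image]

Taking expectations on both sides: [equation image]

Model assumptions: [equation image]

Covariance structure: [equation image]

Factor analysis is thus a method for analyzing the variance-covariance structure.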
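
For reference, the steps listed above match the standard orthogonal factor model. The block below is a sketch in conventional textbook notation, not a copy of the original figures; the number of common factors \(m\), the mean vector \(\boldsymbol{\mu}\), the loading matrix \(\mathbf{L}=[l_{ij}]\) and the diagonal matrix \(\Psi\) of specific variances are symbols assumed here:

\begin{align*}
X_i-\mu_i &= l_{i1}F_1+l_{i2}F_2+\cdots+l_{im}F_m+\epsilon_i,\quad i=1,\dots,p\\
\mathbf{X}-\boldsymbol{\mu} &= \mathbf{L}\mathbf{F}+\boldsymbol{\epsilon}\\
(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^{\prime} &= \mathbf{L}\mathbf{F}\mathbf{F}^{\prime}\mathbf{L}^{\prime}+\boldsymbol{\epsilon}\mathbf{F}^{\prime}\mathbf{L}^{\prime}+\mathbf{L}\mathbf{F}\boldsymbol{\epsilon}^{\prime}+\boldsymbol{\epsilon}\boldsymbol{\epsilon}^{\prime}\\
E(\mathbf{F})&=\mathbf{0},\quad Cov(\mathbf{F})=\mathbf{I},\quad E(\boldsymbol{\epsilon})=\mathbf{0},\quad Cov(\boldsymbol{\epsilon})=\Psi,\quad Cov(\boldsymbol{\epsilon},\mathbf{F})=\mathbf{0}\\
\Sigma &= E\left[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^{\prime}\right]=\mathbf{L}\mathbf{L}^{\prime}+\Psi
\end{align*}

Under these assumptions the two cross terms vanish in expectation, which is why the covariance structure reduces to \(\mathbf{L}\mathbf{L}^{\prime}+\Psi\).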

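Goals:

  • Estimate the common factors
  • Estimate the factor loadings

The factor loadings described above are not unique: [equation image] Multiplying the loading matrix by any orthogonal matrix leaves the covariance structure unchanged: [equation image]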
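
The non-uniqueness can be illustrated numerically. The sketch below uses a made-up 4 x 2 loading matrix and a 30-degree rotation; all numbers and names (`L`, `Psi`, `Tmat`) are hypothetical and chosen only for illustration.

```r
# Hypothetical loading matrix (4 variables, 2 factors) and specific variances
L <- matrix(c(0.8, 0.7, 0.2, 0.1,
              0.1, 0.3, 0.7, 0.6), nrow = 4)
Psi <- diag(1 - rowSums(L^2))

# An orthogonal matrix: rotation of the factor axes by 30 degrees
theta <- pi / 6
Tmat <- matrix(c(cos(theta), sin(theta),
                 -sin(theta), cos(theta)), nrow = 2)

# Rotated loadings differ from the original ones ...
L_star <- L %*% Tmat

# ... but imply exactly the same covariance structure
Sigma <- L %*% t(L) + Psi
Sigma_star <- L_star %*% t(L_star) + Psi
all.equal(Sigma, Sigma_star)
```

Because `Tmat %*% t(Tmat)` is the identity, the rotation cancels inside `L_star %*% t(L_star)`, so the data alone cannot distinguish `L` from `L_star`.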

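Choosing the number of common factors:

  • Scree plot and parallel analysis
  • Subjective judgment

Methods for extracting common factors:

  • Principal component method
  • Maximum likelihood
  • Minimum residual (minres)
  • Weighted least squares
  • R provides six different methods (one in stats, five more in the psych package)

Rotation methods for the factor loadings:

  • Varimax rotation: varimax
  • Oblique rotation: promax
  • Quartimax rotation: quartimax
  • bentlerT
  • R (the psych package) provides 15 different rotation methods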
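
To see how extraction and rotation fit together without any extra package, the sketch below uses `factanal()` from stats (maximum likelihood extraction) on the `ability.cov` data bundled with R. It is an illustration of the stats route only, not of the psych functions, and the object names are assumptions.

```r
# Maximum likelihood extraction with three rotation choices
fit_none <- factanal(covmat = ability.cov, factors = 2, rotation = "none")
fit_varimax <- factanal(covmat = ability.cov, factors = 2, rotation = "varimax")
fit_promax <- factanal(covmat = ability.cov, factors = 2, rotation = "promax")

# The loadings change with the rotation ...
print(loadings(fit_varimax), cutoff = 0)
print(loadings(fit_promax), cutoff = 0)

# ... while the uniquenesses (specific variances) do not
uniq <- cbind(none = fit_none$uniquenesses,
              varimax = fit_varimax$uniquenesses,
              promax = fit_promax$uniquenesses)
round(uniq, 3)
```

Rotation only redistributes the explained variance among the factors, which is why the three uniqueness columns coincide.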

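Factor Analysis Example

112 people took six tests: a non-verbal measure of general intelligence (general), a picture-completion test (picture), a block design test (blocks), a maze test (maze), a reading comprehension test (reading), and a vocabulary test (vocab). How can the participants' test scores be explained by a smaller set of latent psychological factors?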

help(ability.cov)
#install.packages("psych")
library(psych)
covariances <- ability.cov$cov
# convert covariances to correlations
correlations <- cov2cor(covariances)
correlations
##           general   picture    blocks      maze   reading     vocab
## general 1.0000000 0.4662649 0.5516632 0.3403250 0.5764799 0.5144058
## picture 0.4662649 1.0000000 0.5724364 0.1930992 0.2629229 0.2392766
## blocks  0.5516632 0.5724364 1.0000000 0.4450901 0.3540252 0.3564715
## maze    0.3403250 0.1930992 0.4450901 1.0000000 0.1839645 0.2188370
## reading 0.5764799 0.2629229 0.3540252 0.1839645 1.0000000 0.7913779
## vocab   0.5144058 0.2392766 0.3564715 0.2188370 0.7913779 1.0000000
  • Scree plot
  • Parallel analysis
scree(correlations,factors = F)

#win.graph(width = 12,height = 9,pointsize = 8)
fa.parallel(correlations, n.obs = 112, fa = "both", main = "Scree plots with parallel analysis")

## Parallel analysis suggests that the number of factors =  2  and the number of components =  1

Parallel analysis suggests retaining two common factors.
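
The idea behind parallel analysis is to compare the observed eigenvalues with those obtained from pure-noise data of the same size. The base-R sketch below is a simplified, component-based version of that idea; it is not the algorithm used by `fa.parallel()`, and the seed, the number of simulations `n_sim`, and the retention rule are assumptions made for illustration.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
R_obs <- cov2cor(ability.cov$cov)    # observed correlation matrix
n <- ability.cov$n.obs               # number of participants
p <- ncol(R_obs)

obs_eig <- eigen(R_obs, symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues of correlation matrices of uncorrelated normal data (same n and p)
n_sim <- 200
sim_eig <- replicate(n_sim, {
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(Z), symmetric = TRUE, only.values = TRUE)$values
})
ref_eig <- rowMeans(sim_eig)         # mean random eigenvalue at each position

round(cbind(observed = obs_eig, random = ref_eig), 2)

# Keep the leading components whose eigenvalue exceeds the random reference
n_keep <- sum(cumprod(obs_eig > ref_eig))
n_keep
```

An observed eigenvalue that does not exceed its random counterpart carries no more variance than noise would, which is the rationale for discarding it.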

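Factor analysis without rotation, using principal axis factoring (fm = "pa"):

# Stored as fa.none so the result does not mask the fa() function
fa.none <- fa(correlations, nfactors = 2, rotate = "none", fm = "pa")
fa.none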
## Factor Analysis using method =  pa
## Call: fa(r = correlations, nfactors = 2, rotate = "none", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##          PA1   PA2   h2    u2 com
## general 0.75  0.07 0.57 0.432 1.0
## picture 0.52  0.32 0.38 0.623 1.7
## blocks  0.75  0.52 0.83 0.166 1.8
## maze    0.39  0.22 0.20 0.798 1.6
## reading 0.81 -0.51 0.91 0.089 1.7
## vocab   0.73 -0.39 0.69 0.313 1.5
## 
##                        PA1  PA2
## SS loadings           2.75 0.83
## Proportion Var        0.46 0.14
## Cumulative Var        0.46 0.60
## Proportion Explained  0.77 0.23
## Cumulative Proportion 0.77 1.00
## 
## Mean item complexity =  1.5
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  15  and the objective function was  2.48
## The degrees of freedom for the model are 4  and the objective function was  0.07 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.06 
## 
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                 PA1  PA2
## Correlation of scores with factors             0.96 0.92
## Multiple R square of scores with factors       0.93 0.84
## Minimum correlation of possible factor scores  0.86 0.68

Varimax rotation:

fa.varimax <- fa(correlations, nfactors = 2, rotate = "varimax", fm = "pa")
fa.varimax
## Factor Analysis using method =  pa
## Call: fa(r = correlations, nfactors = 2, rotate = "varimax", fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
##          PA1  PA2   h2    u2 com
## general 0.49 0.57 0.57 0.432 2.0
## picture 0.16 0.59 0.38 0.623 1.1
## blocks  0.18 0.89 0.83 0.166 1.1
## maze    0.13 0.43 0.20 0.798 1.2
## reading 0.93 0.20 0.91 0.089 1.1
## vocab   0.80 0.23 0.69 0.313 1.2
## 
##                        PA1  PA2
## SS loadings           1.83 1.75
## Proportion Var        0.30 0.29
## Cumulative Var        0.30 0.60
## Proportion Explained  0.51 0.49
## Cumulative Proportion 0.51 1.00
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  15  and the objective function was  2.48
## The degrees of freedom for the model are 4  and the objective function was  0.07 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.06 
## 
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                 PA1  PA2
## Correlation of scores with factors             0.96 0.92
## Multiple R square of scores with factors       0.91 0.85
## Minimum correlation of possible factor scores  0.82 0.71
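
The h2 column (communality) is the row sum of squared loadings, and an orthogonal rotation such as varimax leaves it unchanged. The sketch below checks this using the two-decimal loadings copied from the unrotated and varimax outputs, so agreement is only expected up to rounding; the object names are illustrative.

```r
vars <- c("general", "picture", "blocks", "maze", "reading", "vocab")

# Loadings copied (rounded to two decimals) from the printed outputs
L_none <- matrix(c(0.75, 0.52, 0.75, 0.39, 0.81, 0.73,
                   0.07, 0.32, 0.52, 0.22, -0.51, -0.39),
                 ncol = 2, dimnames = list(vars, c("PA1", "PA2")))
L_varimax <- matrix(c(0.49, 0.16, 0.18, 0.13, 0.93, 0.80,
                      0.57, 0.59, 0.89, 0.43, 0.20, 0.23),
                    ncol = 2, dimnames = list(vars, c("PA1", "PA2")))
h2_printed <- c(0.57, 0.38, 0.83, 0.20, 0.91, 0.69)

# Communality = sum of squared loadings in each row
h2_none <- rowSums(L_none^2)
h2_varimax <- rowSums(L_varimax^2)
round(cbind(h2_none, h2_varimax, h2_printed), 2)
```

The rotation changes how the explained variance is split between PA1 and PA2, but not how much of each variable is explained in total.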

Extracting factors with an oblique rotation (promax):

fa.promax <- fa(correlations, nfactors = 2, rotate = "promax", fm = "pa")
## Warning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate =
## rotate, : A Heywood case was detected. Examine the loadings carefully.
fa.promax
## Factor Analysis using method =  pa
## Call: fa(r = correlations, nfactors = 2, rotate = "promax", fm = "pa")
## 
##  Warning: A Heywood case was detected. 
## Standardized loadings (pattern matrix) based upon correlation matrix
##           PA1   PA2   h2    u2 com
## general  0.36  0.49 0.57 0.432 1.8
## picture -0.04  0.64 0.38 0.623 1.0
## blocks  -0.12  0.98 0.83 0.166 1.0
## maze    -0.01  0.45 0.20 0.798 1.0
## reading  1.01 -0.11 0.91 0.089 1.0
## vocab    0.84 -0.02 0.69 0.313 1.0
## 
##                        PA1  PA2
## SS loadings           1.82 1.76
## Proportion Var        0.30 0.29
## Cumulative Var        0.30 0.60
## Proportion Explained  0.51 0.49
## Cumulative Proportion 0.51 1.00
## 
##  With factor correlations of 
##      PA1  PA2
## PA1 1.00 0.57
## PA2 0.57 1.00
## 
## Mean item complexity =  1.2
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  15  and the objective function was  2.48
## The degrees of freedom for the model are 4  and the objective function was  0.07 
## 
## The root mean square of the residuals (RMSR) is  0.03 
## The df corrected root mean square of the residuals is  0.06 
## 
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                 PA1  PA2
## Correlation of scores with factors             0.97 0.94
## Multiple R square of scores with factors       0.93 0.89
## Minimum correlation of possible factor scores  0.86 0.77
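# Compute the factor structure matrix (pattern loadings times factor correlations)

fsm <- function(oblique) {
  # Phi (the factor correlation matrix) exists only for oblique solutions
  if (!inherits(oblique, "fa") || is.null(oblique$Phi)) {
    warning("Object doesn't look like oblique EFA")
    return(invisible(NULL))
  }
  P <- unclass(oblique$loadings)   # pattern matrix
  S <- P %*% oblique$Phi           # structure matrix
  colnames(S) <- colnames(P)
  S
}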

fsm(fa.promax)
##               PA1       PA2
## general 0.6398556 0.6927493
## picture 0.3250348 0.6133638
## blocks  0.4365629 0.9075015
## maze    0.2525385 0.4496097
## reading 0.9503302 0.4720320
## vocab   0.8285707 0.4586943
factor.plot(fa.promax, labels =rownames(fa.promax$loadings))

fa.diagram(fa.promax, simple = FALSE)

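Similarities and Differences Between PCA and Factor Analysis

  • In factor analysis, each variable is expressed as a linear combination of the factors, whereas in PCA each principal component is expressed as a linear combination of the variables.

  • PCA focuses on explaining the total variance of the variables, whereas factor analysis focuses on explaining the covariances among the variables.

  • PCA requires no assumptions, whereas factor analysis does. Its assumptions include: the common factors are uncorrelated with one another, the specific factors are uncorrelated with one another, and the common factors are uncorrelated with the specific factors.

  • In PCA, when the eigenvalues of the given covariance or correlation matrix are distinct, the principal components are generally unique (up to sign). In factor analysis the factors are not unique, and rotation yields different factors.

  • In factor analysis the number of factors must be specified by the analyst, and different choices give different results. In PCA the number of components is fixed: in general there are as many principal components as variables. Compared with PCA, factor analysis has an advantage in interpretation because rotation can be used to make the factors easier to interpret. Broadly speaking, when the aim is to find latent factors and interpret them, factor analysis with rotation is preferable; when the aim is to replace the existing variables with a few new ones that carry almost all of the original information for use in later analysis, PCA is appropriate.

Factor Analysis Exercise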

d <- read.csv("http://statstudy.github.io/data/simCog.csv")
head(d)
##    Knowledge OralExpression  Deduction MentalRotation Visualization
## 1 -0.1937224     -0.3194100 -0.8310500     -0.2354839   -0.00874325
## 2 -0.2468147      0.8653481  0.6461765      1.5121300    0.35422918
## 3 -0.7808090      0.1591759 -1.4743426     -0.8502143   -1.27522722
## 4 -1.4049924     -0.7473637 -0.5223564      0.2194446   -1.04352583
## 5 -0.4138702     -0.7118456  0.4500109     -0.2184520    0.03457733
## 6 -0.4649210      0.7466816 -0.5005259      0.5053426    0.32853127
##    Vocabulary   Analogies Quantitative PatternRecognition
## 1  0.03621763 -0.82876798  -0.03681655        -0.87668865
## 2  0.39317066  0.43736748  -0.02804369         0.99727512
## 3 -0.90394664  0.58931766   0.25639006        -0.86685970
## 4 -1.11491005 -0.52233132  -1.51637252        -0.95695461
## 5  0.52532648  0.74736073  -0.67334222        -0.07362392
## 6 -1.70224923  0.02511076   1.81908129         0.01534438

 
