Wednesday, November 9, 2011

PCA vs. EFA

A principal component is a weighted linear combination of the observed variables. Principal components are uncorrelated with one another and orthogonal. 

Principal component analysis minimizes the sum of the squared perpendicular distances from the data points to the axis of the principal component, while least squares regression minimizes the sum of the squared vertical distances, measured parallel to the y axis rather than perpendicular to the fitted line (Truxillo, 2003). Principal component scores are computed exactly from the observed variables, not estimated. 
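The distinction between the two fitting criteria can be sketched on toy data (values invented for illustration). The least squares slope minimizes vertical distances; the first principal component, found here from the closed-form largest eigenvalue of the 2x2 covariance matrix, minimizes perpendicular distances, so the two fitted directions differ:

```python
import math

# Toy 2D data (hypothetical values for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Centered sums of squares and cross-products
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

# Least squares slope: minimizes vertical (y-direction) distances
ols_slope = sxy / sxx

# First principal component: minimizes perpendicular distances.
# For a 2x2 covariance matrix the largest eigenvalue has a closed form.
cxx, cyy, cxy = sxx / (n - 1), syy / (n - 1), sxy / (n - 1)
trace, det = cxx + cyy, cxx * cyy - cxy ** 2
lam1 = (trace + math.sqrt(trace ** 2 - 4 * det)) / 2
pca_slope = (lam1 - cxx) / cxy  # slope of the first eigenvector

print(ols_slope, pca_slope)  # the two slopes are close but not equal
```

When both variables carry noise, the principal component (total least squares) slope is at least as steep in magnitude as the ordinary least squares slope.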

Principal Component Analysis (PCA) 

- Is a variable reduction technique  
- Is used when variables are highly correlated  
- Reduces the number of observed variables to a smaller number of principal components which account for most of the variance of the observed variables 
- Is a large sample procedure

Eigenvectors are the weights in a linear transformation when computing principal component scores. 

Eigenvalues indicate the amount of variance explained by each principal component or each factor. 
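Both statements can be checked numerically on a small invented example: the unit eigenvector supplies the weights for the component scores, and the variance of those scores reproduces the eigenvalue:

```python
import math

# Hypothetical scores on two correlated observed variables
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sample covariance matrix [[cxx, cxy], [cxy, cyy]]
cxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
cyy = sum((yi - my) ** 2 for yi in y) / (n - 1)
cxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

# Largest eigenvalue of the 2x2 covariance matrix (closed form)
trace, det = cxx + cyy, cxx * cyy - cxy ** 2
lam1 = (trace + math.sqrt(trace ** 2 - 4 * det)) / 2

# Corresponding eigenvector, normalized to unit length: these are the weights
w1, w2 = cxy, lam1 - cxx
norm = math.hypot(w1, w2)
w1, w2 = w1 / norm, w2 / norm

# Principal component scores = weighted combination of the centered variables
scores = [w1 * (xi - mx) + w2 * (yi - my) for xi, yi in zip(x, y)]

# The variance of the scores equals the eigenvalue
var_scores = sum(s ** 2 for s in scores) / (n - 1)
print(var_scores, lam1)
```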

Exploratory Factor Analysis (EFA) 

A latent construct can be measured indirectly by determining its influence on responses to measured variables. A latent construct is also referred to as a factor, underlying construct, or unobserved variable. Unique factors refer to unreliability due to measurement error and variation in the data. Factor scores are estimates of underlying latent constructs. 

An observed variable “loads” on a factor if it is highly correlated with that factor, that is, if it has an eigenvector coefficient of greater magnitude on that factor. 

Communality is the variance in an observed variable accounted for by the common factors. Communality is more relevant to EFA than to PCA (Hatcher, 1994). 

- Is a variable reduction technique which identifies the number of latent constructs and the underlying factor structure of a set of variables  
- Hypothesizes an underlying construct, a variable not measured directly 
- Estimates factors which influence responses on observed variables  
- Allows you to describe and identify the number of latent constructs (factors) 
- Includes unique factors, error due to unreliability in measurement 
- Traditionally has been used to explore the possible underlying factor structure of a set of measured variables without imposing any preconceived structure on the outcome.
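The communality and unique-factor ideas above can be sketched with a hypothetical loading matrix (values invented for illustration, and assuming orthogonal factors, where communality is the row sum of squared loadings):

```python
# Hypothetical standardized loadings for 4 observed variables on 2 common factors
# (rows = variables, columns = factors; values invented for illustration)
loadings = {
    "item1": (0.80, 0.10),
    "item2": (0.75, 0.05),
    "item3": (0.15, 0.70),
    "item4": (0.10, 0.65),
}

for var, row in loadings.items():
    # Communality: variance in the variable explained by the common factors
    h2 = sum(l ** 2 for l in row)
    # Uniqueness: the remaining variance, attributed to the unique factor
    # (measurement error plus variable-specific variation)
    u2 = 1.0 - h2
    print(f"{var}: communality={h2:.3f}, uniqueness={u2:.3f}")
```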

Feature selection approaches try to find a subset of the original variables (also called features or attributes). There are three strategies:

- filter (e.g., rank features by information gain)
- wrapper (e.g., search feature subsets guided by model accuracy)
- embedded (features are added or removed while building the model, based on prediction error)
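The filter strategy can be sketched with the information gain criterion on a tiny invented dataset: the feature that best separates the class labels gets the highest gain and would be kept by the filter.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Reduction in label entropy from splitting on the feature's values."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# Hypothetical toy data: two candidate features and a class label
outlook = ["sunny", "sunny", "rain", "rain", "overcast", "overcast"]
windy   = ["yes", "no", "yes", "no", "yes", "no"]
play    = ["no", "no", "yes", "yes", "yes", "yes"]

# outlook separates the classes perfectly; windy carries no information
print(information_gain(outlook, play), information_gain(windy, play))
```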

The main linear technique for dimensionality reduction, principal component analysis, performs a linear mapping of the data to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized. In practice, the covariance (and sometimes the correlation) matrix of the data is constructed and the eigenvectors of this matrix are computed. The eigenvectors that correspond to the largest eigenvalues (the principal components) can now be used to reconstruct a large fraction of the variance of the original data. Moreover, the first few eigenvectors can often be interpreted in terms of the large-scale physical behavior of the system. The original space (with dimension equal to the number of observed variables) has been reduced (with data loss, but hopefully retaining the most important variance) to the space spanned by a few eigenvectors.
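The pipeline described above can be sketched end to end, assuming NumPy is available (the data here are synthetic, generated with a low-rank structure so that two components suffice):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 100 samples of 5 variables with rank-2 structure plus noise
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5))
X += 0.01 * rng.normal(size=X.shape)

Xc = X - X.mean(axis=0)           # center the data
C = np.cov(Xc, rowvar=False)      # 5x5 covariance matrix
vals, vecs = np.linalg.eigh(C)    # eigendecomposition (ascending eigenvalues)
order = np.argsort(vals)[::-1]    # sort largest-first
vals, vecs = vals[order], vecs[:, order]

k = 2                             # keep the two largest components
scores = Xc @ vecs[:, :k]         # low-dimensional representation
X_hat = scores @ vecs[:, :k].T    # reconstruction from k components

# Fraction of total variance retained by the first k eigenvalues
explained = vals[:k].sum() / vals.sum()
print(f"variance retained by {k} components: {explained:.4f}")
```

Because the synthetic data are essentially rank 2, the reconstruction from two components recovers the centered data up to the small added noise.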

