Principal Component Analysis Of PCA


Principal Component Analysis
Principal Component Analysis (PCA) is a multivariate technique used to reduce the dimensionality of a multivariate data set so that the shape or pattern of the data can be recognized. In other words, PCA is a powerful technique for pattern recognition that attempts to explain the variance of a large set of inter-correlated variables. It exploits the association between variables to summarize them with a smaller set of components, thereby reducing the dimensionality of the data set (Helena et al., 2000; Wunderlin et al., 2001; Singh et al., 2004).
Principal component analysis seeks to transform the original variables into a new set of variables that are (1) linear combinations of the variables in the data set, (2) uncorrelated with each other, and (3) ordered according to the amount of variation of the original variables that they explain (Everitt and Hothorn 2011).
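The three properties above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small synthetic data set; the function name `pca` and the variable names are illustrative choices, not part of the cited sources.

```python
import numpy as np

def pca(X):
    """Eigen-decomposition PCA on the covariance matrix of X (rows = observations)."""
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # (1) linear combinations of the variables
    return eigvals, eigvecs, scores

# Synthetic data (an assumption for illustration): two correlated variables plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2.0 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

eigvals, eigvecs, scores = pca(X)
score_cov = np.cov(scores, rowvar=False)    # (2) off-diagonal entries are ~0
explained = eigvals / eigvals.sum()         # (3) share of variance, largest first
print(np.round(score_cov, 6))
print(np.round(explained, 3))
```

The covariance matrix of the component scores comes out diagonal, with the eigenvalues on the diagonal, which is what "uncorrelated" and "ordered by explained variation" mean in practice.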
The Assumptions of PCA:
Linearity: The reduced dimensions are linear combinations of the original variables.
The importance of mean and covariance: PCA summarizes the data through its mean and covariance alone, and there is no guarantee that the directions of maximum variance will contain good features for discrimination.
Large variances have important dynamics: PCA assumes that components with larger variances correspond to interesting dynamics and that those with smaller variances correspond to noise.

Important Terminologies for PCA:
Dimension:
In Principal Component Analysis, each random variable is treated as an individual dimension.

Standard Deviation:
Standard deviation is a measure of how spread out the numbers are. It describes the dispersion of a data set around its mean: the farther the values lie from the mean, the higher the standard deviation. It is denoted by the Greek letter sigma (σ).
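As a small worked example, the standard deviation can be computed directly from its definition. This is a sketch assuming NumPy and a made-up set of eight numbers; the distinction between the population form (divide by n) and the sample form (divide by n - 1) is added here for completeness and is not drawn from the text above.

```python
import numpy as np

# Illustrative data (an assumption, not from the essay).
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = data.mean()                          # 5.0
squared_dev = (data - mean) ** 2            # squared distance of each value from the mean

# Population standard deviation: divide by n.
pop_sd = np.sqrt(squared_dev.sum() / len(data))

# Sample standard deviation: divide by n - 1.
sample_sd = np.sqrt(squared_dev.sum() / (len(data) - 1))

print(pop_sd)      # 2.0
print(sample_sd)
```

The same values are returned by `np.std(data)` and `np.std(data, ddof=1)` respectively.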

...

... middle of paper ...

...ferred because it produces meaningful information about each data point and where it falls within its normal distribution, plus provides a crude indicator of outliers. (Ben Etzkorn 2011).
If we do not standardize the data before Principal Component Analysis, the result gives more weight to the variables with larger variances, so the analysis depends on the units in which the variables are measured. The choice of matrix matters as well: when the analysis is based on the covariance matrix and the variables are measured on different scales, the data should be standardized first, whereas when the correlation matrix is used, the raw data can be analyzed directly. The two approaches agree because the covariance matrix of the standardized data is equal to the correlation matrix of the non-standardized data.
(https://onlinecourses.science.psu.edu/stat505/node/55)
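The relationship between the two matrices can be verified numerically. The sketch below assumes NumPy and synthetic data with deliberately mismatched scales; all variable names are illustrative.

```python
import numpy as np

# Synthetic raw data (an assumption): three variables on very different scales.
rng = np.random.default_rng(42)
raw = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])
raw[:, 1] += 5.0 * raw[:, 0]                # make two of the variables correlated

# Standardize: subtract the mean and divide by the sample standard deviation.
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

cov_of_standardized = np.cov(standardized, rowvar=False)
corr_of_raw = np.corrcoef(raw, rowvar=False)
print(np.allclose(cov_of_standardized, corr_of_raw))

# Without standardization, the first component is dominated by the
# variable with the largest variance (the third column here).
eigvals, eigvecs = np.linalg.eigh(np.cov(raw, rowvar=False))
first_pc = eigvecs[:, np.argmax(eigvals)]
print(np.round(np.abs(first_pc), 3))
```

The first check shows that the two matrices coincide; the second shows how an unstandardized covariance-based analysis is driven almost entirely by the variable measured on the largest scale.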

Working Procedure:
Same date data we used.......
