Moving beyond principal components analysis

  • nick.dale.burns

    SSCrazy

    Points: 2226

    Comments posted to this topic are about the item Moving beyond principal components analysis

  • SQLBoar

    SSC Veteran

    Points: 278

    Erm, that's because PCA is NOT really meant to be used for visualisation. Or even for direct analysis of data; in other words, as the sole analysis output. It is usually used during the feature engineering stage of a much longer analysis, to eliminate the irrelevant or trivially important features. This may even allow you to skip a lot of unnecessary data cleaning (a benefit not to be under-estimated) and certainly can cut down on the amount of data that you have to try to generate a model from.

    So, essentially, if it is applied in the usual way you almost always "move beyond" PCA.

    This DBA says - "It depends".

  • nick.dale.burns

    SSCrazy

    Points: 2226

    SQLBoar - Thursday, September 21, 2017 5:56 AM

    Erm, that's because PCA is NOT really meant to be used for visualisation. Or even for direct analysis of data; in other words, as the sole analysis output. It is usually used during the feature engineering stage of a much longer analysis, to eliminate the irrelevant or trivially important features. This may even allow you to skip a lot of unnecessary data cleaning (a benefit not to be under-estimated) and certainly can cut down on the amount of data that you have to try to generate a model from.

    So, essentially, if it is applied in the usual way you almost always "move beyond" PCA.

    At a superficial level, I agree with you that PCA can be used purely as a feature engineering process. Arguably, this is the approach taken in modern machine learning methodologies and it is the application most promoted by tools like Python, Julia and AzureML. In contrast, have you ever wondered why statistical tools (R, SPSS and SAS) place such a large emphasis on the eigenvalues, eigenvectors, communality and tests of sphericity and significance? This is because these are the outputs required to analyse and interpret principal components.
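
    As a rough illustration of those outputs (not taken from the article), here is a minimal sketch in plain Python. The correlation matrix `R`, the sample size `n_obs` and the helper names `jacobi_eigen` and `communalities` are all assumptions made up for this example; in practice you would use the routines in R, SPSS, SAS or a numerical library rather than a hand-rolled eigen-solver.

```python
import math

def jacobi_eigen(matrix, tol=1e-24, max_sweeps=50):
    """Eigen-decomposition of a symmetric matrix via cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors) sorted by descending eigenvalue;
    eigenvectors[k] is the unit vector paired with eigenvalues[k].
    """
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q], kept within [-pi/4, pi/4].
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of A and of V
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p and q of A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda k: a[k][k], reverse=True)
    eigenvalues = [a[k][k] for k in order]
    eigenvectors = [[v[i][k] for i in range(n)] for k in order]
    return eigenvalues, eigenvectors

def communalities(loadings, n_components):
    """Share of each variable's variance reproduced by the first n_components."""
    n_vars = len(loadings[0])
    return [sum(loadings[k][i] ** 2 for k in range(n_components))
            for i in range(n_vars)]

# Made-up 3x3 correlation matrix and sample size, for illustration only.
R = [[1.0, 0.8, 0.3],
     [0.8, 1.0, 0.4],
     [0.3, 0.4, 1.0]]
n_obs = 100
p = len(R)

eigenvalues, eigenvectors = jacobi_eigen(R)

# Eigenvalues of a correlation matrix sum to p, so this is variance explained.
explained = [lam / p for lam in eigenvalues]

# Loadings: eigenvector entries scaled by sqrt(eigenvalue).
# loadings[k][i] is the loading of variable i on component k.
loadings = [[math.sqrt(lam) * x for x in vec]
            for lam, vec in zip(eigenvalues, eigenvectors)]

# Bartlett's test of sphericity: H0 is that R is an identity matrix.
log_det = sum(math.log(lam) for lam in eigenvalues)  # ln|R|
bartlett_chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * log_det
bartlett_df = p * (p - 1) // 2

print("eigenvalues:", [round(lam, 4) for lam in eigenvalues])
print("variance explained:", [round(e, 4) for e in explained])
print("communalities (1 component):", [round(h, 4) for h in communalities(loadings, 1)])
print("Bartlett chi-square:", round(bartlett_chi2, 2), "df:", bartlett_df)
```

    The eigenvalues say how much variance each component carries, the loadings and communalities say how well each original variable is represented by the components you keep, and Bartlett's statistic is compared against a chi-square distribution with `bartlett_df` degrees of freedom (for 3 degrees of freedom the 5% critical value is roughly 7.81) to check whether the correlations are strong enough for PCA to be worthwhile at all.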

    To lend some weight to this - here is a smattering of the (literally) thousands of papers that have been written on this subject, dating back to 1933:

    • Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6), 417.
    • Jackson (2005) has a great book on applied principal components, with an entire chapter on inference. Jackson, J. E. (2005). A user's guide to principal components (Vol. 587). John Wiley & Sons.
    • Raskin, R., & Terry, H. (1988). A principal-components analysis of the Narcissistic Personality Inventory and further evidence of its construct validity. Journal of Personality and Social Psychology, 54(5), 890.

    And for completeness, here are some recent papers:

    • Ballard, E., Luckenbaugh, D., Yarrington, J., Lally, N., Lener, M., Machado-Vieira, R., ... & Zarate, C. (2017). 330-A Principal Components Analysis of Depression and Anhedonia Scales: Illustrating the Heterogeneity of Depression. Biological Psychiatry, 81(10), S135.
    • Mazeliauskas, A., & Teaney, D. (2016). Fluctuations of harmonic and radial flow in heavy ion collisions with principal components. Physical Review C, 93(2), 024913.
