#StackBounty: #clustering #pca Interpretation of PCA in relation to Clustering Analysis

Bounty: 50

I have a dataset with hundreds of customers that have about 30 characteristics. One technique used to reduce dimensionality is PCA. I understand the underlying premise but I am unsure how to interpret the results for my clustering analysis (e.g. K-means algorithm).

To make my question clearer, I will break it into smaller parts covering what confuses me about how PCA and clustering analysis can be used together for customer segmentation.

  • Assumption 1: With 30 characteristics, I can have 30 principal components? After transforming my dataset and applying the elbow method, I find that the first 4 components account for ~90% of my dataset’s variance.

  • Q1: What do the values under each column mean? What is PC1? Is it the equivalent of Column1 of my dataset (i.e. the first feature/variable)?

  • Assumption 2: When I apply a type of cluster algorithm (e.g. K-Means) over the 4 PCs, I can see about 3 different clusters. Great.

  • Q2: What do these clusters represent? Which characteristics were used to segment them? What do the
    x and y axes represent? How can I use the final cluster result
    concretely by applying it to new data, for example: New customer X
    is part of cluster 2 (e.g. Valuable customer) based on this and that
    data.
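
For reference, the workflow described in the bullets above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic (300 simulated customers with 30 features, drawn from 3 hypothetical segments), PCA is done with NumPy's SVD on standardized features, and a hand-rolled Lloyd-style K-means stands in for whatever library implementation is actually used. The names `assign`, `kmeans`, and `new_customer` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the customer table (an assumption, not real data):
# 300 customers x 30 features, simulated from 3 hypothetical segments.
n_per_group, n_features = 100, 30
segment_means = rng.normal(scale=3.0, size=(3, n_features))
X = np.vstack([m + rng.normal(size=(n_per_group, n_features)) for m in segment_means])

# Standardize each feature, since PCA is sensitive to scale.
mean, std = X.mean(axis=0), X.std(axis=0)
Z = (X - mean) / std

# PCA via SVD. With 30 features (and more than 30 rows) there are 30 components.
_, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)

n_keep = 4
components = Vt[:n_keep]      # (4, 30): each row holds one PC's weights on all 30 features
scores = Z @ components.T     # (300, 4): each customer's coordinates on PC1..PC4


def assign(points, centroids):
    """Return the index of the nearest centroid for each row of points."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    init = np.random.default_rng(seed).choice(len(points), size=k, replace=False)
    centroids = points[init]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(points, centroids), centroids


labels, centroids = kmeans(scores, k=3)

# A new customer goes through the same scaling and projection, then is
# assigned to the nearest centroid in PC space.
new_customer = X[0] + 0.1   # hypothetical new row with the same 30 features
new_score = ((new_customer - mean) / std) @ components.T
new_label = int(assign(new_score[None, :], centroids)[0])

print("variance explained by first 4 PCs:", explained_ratio[:n_keep].sum())
print("PC1 weights on the first 5 features:", components[0, :5])
print("new customer assigned to cluster:", new_label)
```

In this sketch each row of `components` has one weight per original feature, so PC1 is a weighted combination of all 30 features rather than a copy of the first column. A scatter plot of the first two columns of `scores` would have PC1 and PC2 as its x and y axes.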

Essentially, what I am trying to do is properly explain to a layperson how I went from this dataset to justifying that there are 3 clusters, and how those clusters can be used in a real-world application, for example, in marketing.
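
One possible way to put a cluster into layperson terms is to map its centroid from PC space back to the original feature units and report which features sit furthest from the overall average. The sketch below is hypothetical throughout: the three feature names, their means and standard deviations, the two-component loading matrix, and the helper `describe_centroid` are all invented for illustration. Because only a few components are kept, the back-projection is an approximation of the cluster's average customer, not an exact value.

```python
import numpy as np

# Hypothetical feature names and scaling statistics (not from any real dataset).
feature_names = ["annual_spend", "visits_per_month", "account_age_years"]
mean = np.array([1000.0, 4.0, 3.0])
std = np.array([200.0, 2.0, 1.0])

# Hypothetical PCA loadings: 2 components over 3 standardized features.
components = np.array([[0.6, 0.8, 0.0],
                       [0.0, 0.0, 1.0]])


def describe_centroid(centroid_pc, components, mean, std, names):
    """Map a centroid from PC space to original units, most distinctive features first."""
    z = centroid_pc @ components      # approximate standardized feature values
    original = z * std + mean         # undo the standardization
    order = np.argsort(-np.abs(z))    # largest deviation from the average first
    return [(names[i], float(original[i]), float(z[i])) for i in order]


# Hypothetical centroid of one cluster, expressed in (PC1, PC2) coordinates.
profile = describe_centroid(np.array([2.0, -1.0]), components, mean, std, feature_names)
for name, value, z in profile:
    print(f"{name}: about {value:.1f} ({z:+.1f} standard deviations from average)")
```

Read this way, a cluster becomes a sentence such as "customers who visit and spend more than average but have newer accounts", which is easier to justify to a marketing audience than coordinates on PC1 and PC2.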

Thank you for your patience and understanding.

