I have a dataset with hundreds of customers that have about 30 characteristics. One technique used to reduce dimensionality is PCA. I understand the underlying premise but I am unsure how to interpret the results for my clustering analysis (e.g. K-means algorithm).
To ask my question more clearly, I will break it into smaller questions about the points that confuse me regarding how PCA and clustering analysis can be used together for customer segmentation.
Assumption 1: With 30 characteristics, I can have up to 30 principal components? After transforming my dataset and applying the elbow method, I find that the first 4 components account for ~90% of my dataset’s variance.
Q1: What do the values under each column mean? What is PC1? Is it the equivalent of Column1 of my dataset (i.e. the first feature/variable)?
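To make the step in Assumption 1 concrete, here is a minimal sketch of what I mean, written with plain NumPy. The synthetic data, the variable names and the 4-factor structure are assumptions made up for illustration; they only stand in for my real customer table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for my data: 300 customers x 30 characteristics,
# generated from 4 hidden factors plus noise so that a few PCs dominate.
latent = rng.normal(size=(300, 4))
mixing = rng.normal(size=(4, 30))
X = latent @ mixing + 0.3 * rng.normal(size=(300, 30))

# Standardize each column so no characteristic dominates because of its units.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: each row of Vt is one principal direction, i.e. a set of
# 30 weights (loadings), one per original characteristic.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative share reaches 90%.
n_keep = int(np.searchsorted(cumulative, 0.90) + 1)

# Scores: each customer's coordinates on the kept components.
# These are the "values under each column" that Q1 asks about.
scores = X_std @ Vt[:n_keep].T

print("components available:", Vt.shape[0])
print("components kept for >= 90% variance:", n_keep)
print("scores shape:", scores.shape)
```

In this sketch each PC column is computed from all 30 standardized characteristics at once, weighted by the corresponding row of `Vt`.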
Assumption 2: When I apply a clustering algorithm (e.g. K-Means) to the 4 PCs, I can see about 3 different clusters. Great.
Q2: What do these clusters represent? What are the characteristics that have been used to segment them? What do the x and y axes represent? How can I use the final cluster result concretely by applying it to new data, for example: new customer X is part of cluster 2 (e.g. valuable customer) based on this and that?
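Here is a rough sketch of the workflow I have in mind for Assumption 2 and Q2: cluster the customers on the first 4 PCs, then assign a new customer to a cluster. Everything in it is an assumption for illustration: the synthetic data, the hand-written `kmeans` and `assign` helpers (a library implementation would normally be used instead), and the choice of 3 clusters.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 3 customer groups, 100 customers each, 30 characteristics.
group_means = rng.normal(scale=3.0, size=(3, 30))
X = np.vstack([m + rng.normal(size=(100, 30)) for m in group_means])

# Fit the preprocessing on the existing customers and keep its parameters.
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_std = (X - mu) / sigma
_, _, Vt = np.linalg.svd(X_std, full_matrices=False)
W = Vt[:4].T          # loadings of the first 4 PCs, shape (30, 4)
scores = X_std @ W    # customers in PC space, shape (300, 4)


def nearest(points, centroids):
    """Index of the closest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    r = np.random.default_rng(seed)
    centroids = points[r.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(points, centroids)
        # Keep the old centroid if a cluster happens to be empty.
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, nearest(points, centroids)


centroids, labels = kmeans(scores, k=3)


def assign(customer):
    """Scale and project a new customer like the old ones, then pick the nearest centroid."""
    z = ((customer - mu) / sigma) @ W
    return int(np.linalg.norm(centroids - z, axis=1).argmin())


new_customer = group_means[1] + rng.normal(size=30)
print("new customer assigned to cluster", assign(new_customer))

# Centroids mapped back to the original 30 characteristics (approximate,
# because only 4 PCs are kept). This is what I would try to put into words.
centroids_original_units = (centroids @ W.T) * sigma + mu
print(centroids_original_units.shape)
```

The cluster numbers themselves are arbitrary labels; a name such as "valuable customer" would have to come from looking at the centroids in the original units.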
Essentially, what I am trying to do is explain to a layperson how I went from this dataset to justifying that there are 3 clusters, and how those clusters can be used in a real-world application, for example in marketing.
Thank you for your patience and understanding.