*Bounty: 50*

I’m analyzing a questionnaire in which people were asked about their shopping habits. Respondents were first presented with a set of three qualities a store might have and asked ($Q1_i$, $i=1,2,3$) to rate how important each one is when deciding where to shop. For example, one factor would be “attractive customer loyalty programs”, which the respondent would rate on a Likert scale (1: very unimportant – 5: very important; for now I treat this as continuous for simplicity). Next, the respondents were given a list of five store chains, and for each chain they were asked ($Q2_{ij}$, $i=1,2,3$, $j=A,B,C,D,E$) to rate the chain on the same factors as before, i.e. whether chain A has “attractive loyalty programs” etc. (same scale). Finally, ($Q3$) the respondent indicated the chain they spend the most money at on average.

I want to come up with some kind of model: which store has which qualities and what makes people shop there. Unfortunately I don’t know much about analyzing surveys at all, so I’m looking for some references or advice. What methods would be standard in such analyses? I’m having trouble finding appropriate learning materials, because I don’t know the names of any relevant techniques. I’m not even sure how to title this post.

One thing I came up with would be to consider each factor separately and look for statistically significant differences in the average rating ($Q1_i$) between the mutually exclusive groups for which $Q3=A,B,\dots,E$. So basically running an ANOVA-style regression for each $i=1,2,3$. Then, for instance, if I found that the average value of $Q1_1$ is significantly larger for the group with $Q3=C$, it would indicate that people who shop at chain C value factor #1 more than the other shoppers. This would lead to a ranking of the store chains for each factor induced by the regression coefficients: the chain with the largest positive deviation could be said to perform best in terms of factor #1.
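To make the idea concrete, here is a minimal sketch of the per-factor comparison using only the Python standard library. The function name `one_way_anova_f`, the dictionary `q1_by_favourite_chain`, and the ratings in it are all made up for illustration; this is not data from the survey. The sketch only computes the F statistic and its degrees of freedom; turning that into a p-value needs the F distribution (for example via `scipy.stats.f_oneway`, which I have not used here).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` maps a chain label (the Q3 answer) to the list of Q1_i
    ratings given by respondents who named that chain in Q3.
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    group_means = {label: mean(values) for label, values in groups.items()}

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (group_means[label] - grand_mean) ** 2
        for label, values in groups.items()
    )
    # Variation of individual ratings around their own group mean.
    ss_within = sum(
        (x - group_means[label]) ** 2
        for label, values in groups.items()
        for x in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical Q1_1 ratings, grouped by the chain named in Q3 (made-up numbers).
q1_by_favourite_chain = {
    "A": [3, 4, 3, 2],
    "B": [2, 3, 3, 2],
    "C": [5, 4, 5, 4],
}

f_stat, df_b, df_w = one_way_anova_f(q1_by_favourite_chain)
group_means = {chain: mean(v) for chain, v in q1_by_favourite_chain.items()}
print(f_stat, df_b, df_w, group_means)
```

This would be repeated once per factor $i=1,2,3$, and the group means then give the ranking described above.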

Does this reasoning make sense at all? It bothers me that I can’t think of a way to sensibly include $Q2_{ij}$ in my analysis, since it constitutes most of the information gathered in the survey…

===============================

EDIT: Here is a method I came up with, using Q2. I don’t know if it’s mathematically justified though:

- Fix factor $i$.
- For every $j$, split the ratings of factor $i$ into two disjoint groups: ratings of company $j$ and ratings of all the other companies.
- Use the t-test to compare the means of the two groups. This gives me a matrix of p-values $M_{ij}$ indicating whether for factor $i$, company $j$ has a significantly different average rating than the remaining companies.
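The steps above can be sketched as follows, again with the standard library only. Everything named here (`welch_t`, `one_vs_rest_p_values`, the `ratings` dictionary and its numbers) is hypothetical. Two assumptions are baked in: the sketch uses Welch's unequal-variance t statistic rather than the pooled one, and it approximates the two-sided p-value with a normal distribution, which is only reasonable for large groups. It also treats the two groups as independent, even though in my survey the same respondent rates every chain.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


def one_vs_rest_p_values(ratings):
    """Build M[i][j] from ratings[i][j], the list of Q2_ij ratings.

    For each factor i and chain j, the ratings of chain j are compared
    with the pooled ratings of all other chains on the same factor.
    """
    standard_normal = NormalDist()
    p_values = {}
    for factor, by_chain in ratings.items():
        p_values[factor] = {}
        for chain, own in by_chain.items():
            rest = [
                x
                for other, values in by_chain.items()
                if other != chain
                for x in values
            ]
            t = welch_t(own, rest)
            # Two-sided p-value under a normal approximation (assumption).
            p_values[factor][chain] = 2 * (1 - standard_normal.cdf(abs(t)))
    return p_values


# Made-up Q2 ratings for one factor and three chains, far too few for real use.
ratings = {
    1: {
        "A": [4, 5, 4, 5],
        "B": [2, 3, 2, 3],
        "C": [3, 3, 4, 2],
    }
}

M = one_vs_rest_p_values(ratings)
print(M)
```

With five chains and three factors this produces fifteen tests, so some correction for multiple comparisons would presumably be needed before reading anything into individual entries of $M_{ij}$.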

Is this approach correct?