#StackBounty: #self-study #statistical-significance #chi-squared #experiment-design How to infer on a human intelligence task about sel…

Bounty: 50

I have a human intelligence task that tries to identify the best option from a set of options tested against a control group.

The test consists of:

  • Suppose we have a control group of options, let’s say 10.
  • Each option is related to a common characteristic or concept, but the options differ from each other and are ordered from most to least relevant.
  • We focus on one option and call it the control variant.
  • We build n variants of this option and call them test variants.
  • Each variant is put in place of the control variant to build a test group.
  • The test is run by asking a human to select, from one group, the option that best describes our characteristic or concept. The group (including the control group) that is shown to the test taker is selected uniformly at random. We run the test 500 times, ensuring that each test taker takes the test only once.
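
The random assignment in the last step can be sketched as follows. This is only an illustration: the function name `assign_groups`, the choice of 4 test variants, and the seed are assumptions, and the control group is assumed to count as one more group alongside the test groups.

```python
import random
from collections import Counter


def assign_groups(num_takers, num_test_variants, seed=None):
    """Assign each test taker to exactly one group, uniformly at random.

    Group 0 is the control group; groups 1..num_test_variants each hold
    one test variant in place of the control variant.
    """
    rng = random.Random(seed)
    num_groups = num_test_variants + 1  # assumption: control is also a group
    return [rng.randrange(num_groups) for _ in range(num_takers)]


# Hypothetical setup: 500 test takers, 4 test variants plus the control.
assignments = assign_groups(num_takers=500, num_test_variants=4, seed=0)
shown = Counter(assignments)  # how many times each group was shown

for group in sorted(shown):
    print(f"group {group}: shown {shown[group]} times")
```

Even with a uniform draw, the per-group counts will usually differ somewhat from one another, so any comparison should use rates rather than raw selection counts.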

I want to measure which of the control variant and the test variants the test takers prefer. For instance, we could pick the variant that is selected most often relative to the other options in its group: if group i (with i = 1,2,...,n) is shown to the test takers k_i times (ideally every k_i equals 500/n), select the variant with the highest selection rate within its own group over those k_i samples.
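
A minimal sketch of this selection-rate rule, assuming we record for every group how many times it was shown (k_i) and how many times its variant was picked. The group labels and all counts below are made up for illustration; they are not results from the test.

```python
def selection_rates(shown, selected):
    """Return each group's variant selection rate: selections / times shown."""
    rates = {}
    for group, k in shown.items():
        if k == 0:
            continue  # a group that was never shown has no defined rate
        rates[group] = selected.get(group, 0) / k
    return rates


# Hypothetical counts: k_i per group (summing to 500) and how often the
# control/test variant was the option chosen within that group.
shown = {"control": 102, "variant_1": 97, "variant_2": 105,
         "variant_3": 99, "variant_4": 97}
selected = {"control": 18, "variant_1": 25, "variant_2": 14,
            "variant_3": 31, "variant_4": 20}

rates = selection_rates(shown, selected)
best = max(rates, key=rates.get)

for group, rate in rates.items():
    print(f"{group}: {rate:.3f}")
print("highest selection rate:", best)
```

Dividing by k_i is what makes groups comparable when they were not shown equally often.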

I’m not an expert, but I can see that the way I run the test introduces some problems. For example, what establishes that this approach leads to the best variant according to the test takers’ criteria? How confident can I be in the result? The groups shown clearly won’t be split perfectly uniformly. Is the sample size large enough to give a significant result?
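
One way to make the "how confident" question concrete is to put a confidence interval around each selection rate. The sketch below uses the Wilson score interval, which is a technique chosen here for illustration and not something taken from the test design; the function name and the counts are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2)
    )
    return center - half_width, center + half_width


# Made-up counts: control variant picked 18 of 102 times,
# one test variant picked 31 of 99 times.
for label, (s, k) in {"control": (18, 102), "variant": (31, 99)}.items():
    low, high = wilson_interval(s, k)
    print(f"{label}: rate {s / k:.3f}, ~95% interval [{low:.3f}, {high:.3f}]")
```

With only about 100 samples per group the intervals are wide, which gives a rough feel for whether 500 samples in total can separate the variants. Comparing intervals by eye is only a rough check, not a formal test.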

A friend told me that I might be interested in reading about chi-square tests and sent me this article: https://www.lunametrics.com/blog/2014/07/01/statistical-significance-test, but I don’t fully understand whether it applies to this problem.
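
For reference, a chi-square test of homogeneity on this kind of data could look like the sketch below, assuming each group is reduced to two counts: how often its variant was selected and how often another option was selected. The table values are made up, and the helper names `chi2_sf` and `chi_square_homogeneity` are hypothetical; the p-value is computed with a plain series expansion to keep the example dependency-free.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses the power series of the regularized lower incomplete gamma
    function; adequate for moderate x, not tuned for extreme tails.
    """
    if x <= 0:
        return 1.0
    a, half_x = df / 2, x / 2
    term, total = 1.0, 1.0
    for k in range(1, 10000):
        term *= half_x / (a + k)
        total += term
        if term < 1e-15 * total:
            break
    lower = math.exp(-half_x + a * math.log(half_x) - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)


def chi_square_homogeneity(table):
    """Pearson chi-square statistic, degrees of freedom and p-value.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Made-up table: one row per group, [variant selected, other option selected].
table = [[18, 84], [25, 72], [14, 91], [31, 68], [20, 77]]
stat, df, p_value = chi_square_homogeneity(table)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

A small p-value here would only suggest that the selection rates are not all equal across groups; it would not by itself say which variant is best.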

How can I infer, with high confidence, which option is best according to the described test? Can you point me to concepts, articles, or books for learning about this kind of problem?
