#StackBounty: #machine-learning #statistical-significance #t-test #p-value How to decide if means of two sets are statistically signifi…

Bounty: 50

I have a data set consisting of some number of pairs of real numbers. For example:

(1.2, 3.4), (3.2, 2.7), ..., (4.2, 1.0)

or

(x1, y1), (x2, y2), ..., (xn, yn)

I want to know if the second variable depends on the first one (it is known in advance that if there is a dependency, it is very weak, so it is hard to detect).

I split the data set into two parts by thresholding the first number (the Xs). Then I use the means of the Ys in the first and second subsets as “predictions”. I find the split for which the squared deviation between the predictions and the real values of the Ys is minimal. Basically, I do what decision trees do.
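To make the procedure concrete, here is a minimal sketch of the split search, assuming NumPy is available. The function name `best_split` and the `min_leaf` parameter are hypothetical, chosen only for illustration; the threshold is placed halfway between neighbouring x values, as a regression stump typically does.

```python
import numpy as np

def best_split(x, y, min_leaf=1):
    """Return (threshold, sse, mean_left, mean_right) for the split on x
    that minimises the summed squared error of y around the two subset means.

    Hypothetical helper for illustration; min_leaf is the smallest
    allowed subset size.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    n = len(ys)

    csum = np.cumsum(ys)          # running sum of y in x order
    total_sq = np.sum(ys ** 2)    # sum of y^2, the same for every split

    best = (None, np.inf, np.nan, np.nan)
    for k in range(min_leaf, n - min_leaf + 1):
        # k points go left, n - k go right; skip splits between equal x values
        if xs[k - 1] == xs[k]:
            continue
        left_sum = csum[k - 1]
        right_sum = csum[-1] - left_sum
        # SSE around subset means = sum(y^2) - S_L^2 / n_L - S_R^2 / n_R
        sse = total_sq - left_sum ** 2 / k - right_sum ** 2 / (n - k)
        if sse < best[1]:
            threshold = (xs[k - 1] + xs[k]) / 2
            best = (threshold, sse, left_sum / k, right_sum / (n - k))
    return best
```

For example, `best_split([1, 2, 3, 4], [0, 0, 10, 10])` picks the threshold between 2 and 3, where both subsets are constant.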

Now I want to know whether the split found, and the corresponding difference between the two means, is significant. I could use a standard test to check whether the means of the two sets differ significantly, but I think that would be incorrect because the split was chosen to maximise this difference. How can I account for that?
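One common way to account for the selection, offered here only as a sketch and not as the accepted answer, is a permutation test in which the split search is repeated on every shuffled data set. Shuffling the Ys against the Xs breaks any dependency, so the distribution of the best achievable error reduction under shuffling already includes the optimism of choosing the best split. The names `max_sse_reduction` and `split_permutation_pvalue` are hypothetical, and the sketch assumes the x values are distinct and NumPy is available.

```python
import numpy as np

def max_sse_reduction(y_sorted, min_leaf=1):
    """Largest drop in squared error over all two-way splits of y,
    where y is already ordered by x (x values assumed distinct)."""
    n = len(y_sorted)
    k = np.arange(min_leaf, n - min_leaf + 1)   # left subset sizes
    left_sum = np.cumsum(y_sorted)[k - 1]
    total = y_sorted.sum()
    mean_l = left_sum / k
    mean_r = (total - left_sum) / (n - k)
    # SSE(no split) - SSE(split) = n_L * n_R / n * (mean_L - mean_R)^2
    return np.max(k * (n - k) / n * (mean_l - mean_r) ** 2)

def split_permutation_pvalue(x, y, n_perm=2000, min_leaf=1, seed=0):
    """Permutation p-value for the best split: the search is redone on
    each shuffle, so the selection step is part of the null distribution."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_sorted = y[np.argsort(x, kind="stable")]

    observed = max_sse_reduction(y_sorted, min_leaf)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(y_sorted)    # breaks any x-y dependency
        if max_sse_reduction(shuffled, min_leaf) >= observed:
            hits += 1
    # add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)
```

A small p-value suggests that the best split reduces the error more than the best split on unrelated data typically would. Because the statistic is maximised over all thresholds, this tests for some dependency of Y on X of the step-function kind, not for the significance of one threshold fixed in advance.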

