## #StackBounty: #hypothesis-testing #correlation #statistical-significance #experiment-design #binary-data How to determine a 'strong…

### Bounty: 50

I have a set of drivers that are binary and a concept to measure that contains natural numbers between 1-10.

I’m currently using Kruskal’s key driver analysis to determine the relative contribution of each of the drivers. It’s discussed as being more robust that Pearson’s Correlation by taking into consideration the complete set of drivers and their relative contribution.

However, is the Kruskal’s approach still valid when the drivers are binary and the concept to measure are natural numbers between 1 and 10? I thought about switching to using the point biserial correlation, however this is identical to Pearson’s R.

My question is: Where do I set the threshold between a ‘good’ driver and a ‘not so good’ driver? It’s dependent upon the size of the data and also the properties of the data. Calculating the significance using t-tests (ignoring the fact the data may not meet the necessary assumptions of the t-test (that’s bundled in with the pearsonr scipy algorithm), denotes all of them to be significant, as they usually will be because even weak drivers will have some correlation, and aren’t ‘random’. Therefore do I set the ‘strong’ drivers to have a very low p-value – something that seems kind of arbitrary. Or is there a better algorithm that can distinguish between strong and weak drivers?

Or is it that no algorithm can really determine what a strong driver is? Is it dependent upon other factors relating to the context of the data that is being analysed?

Get this bounty!!!

## #StackBounty: #hypothesis-testing #correlation #statistical-significance #experiment-design #binary-data How to determine a 'strong…

### Bounty: 50

I have a set of drivers that are binary and a concept to measure that contains natural numbers between 1-10.

I’m currently using Kruskal’s key driver analysis to determine the relative contribution of each of the drivers. It’s discussed as being more robust that Pearson’s Correlation by taking into consideration the complete set of drivers and their relative contribution.

However, is the Kruskal’s approach still valid when the drivers are binary and the concept to measure are natural numbers between 1 and 10? I thought about switching to using the point biserial correlation, however this is identical to Pearson’s R.

My question is: Where do I set the threshold between a ‘good’ driver and a ‘not so good’ driver? It’s dependent upon the size of the data and also the properties of the data. Calculating the significance using t-tests (ignoring the fact the data may not meet the necessary assumptions of the t-test (that’s bundled in with the pearsonr scipy algorithm), denotes all of them to be significant, as they usually will be because even weak drivers will have some correlation, and aren’t ‘random’. Therefore do I set the ‘strong’ drivers to have a very low p-value – something that seems kind of arbitrary. Or is there a better algorithm that can distinguish between strong and weak drivers?

Or is it that no algorithm can really determine what a strong driver is? Is it dependent upon other factors relating to the context of the data that is being analysed?

Get this bounty!!!

## #StackBounty: #hypothesis-testing #correlation #statistical-significance #experiment-design #binary-data How to determine a 'strong…

### Bounty: 50

I have a set of drivers that are binary and a concept to measure that contains natural numbers between 1-10.

I’m currently using Kruskal’s key driver analysis to determine the relative contribution of each of the drivers. It’s discussed as being more robust that Pearson’s Correlation by taking into consideration the complete set of drivers and their relative contribution.

However, is the Kruskal’s approach still valid when the drivers are binary and the concept to measure are natural numbers between 1 and 10? I thought about switching to using the point biserial correlation, however this is identical to Pearson’s R.

My question is: Where do I set the threshold between a ‘good’ driver and a ‘not so good’ driver? It’s dependent upon the size of the data and also the properties of the data. Calculating the significance using t-tests (ignoring the fact the data may not meet the necessary assumptions of the t-test (that’s bundled in with the pearsonr scipy algorithm), denotes all of them to be significant, as they usually will be because even weak drivers will have some correlation, and aren’t ‘random’. Therefore do I set the ‘strong’ drivers to have a very low p-value – something that seems kind of arbitrary. Or is there a better algorithm that can distinguish between strong and weak drivers?

Or is it that no algorithm can really determine what a strong driver is? Is it dependent upon other factors relating to the context of the data that is being analysed?

Get this bounty!!!

## #StackBounty: #regression #hypothesis-testing #anova #experiment-design #methodology Arguments/Advantages of Additive Model Constructio…

### Bounty: 50

Assume I perform a psychological experiment with a number of manipulations, each hypothesized to influence the dependent variable. For instance, we perform a mixed-design short-term memory experiment where:

• `DV` = Number of letters recalled
• `IV1` = Number of letters
• `IV2` = Number of distractions
• `IV3` = Delay time between presentation of stimuli and recall
• `IV4` = Whether memory is being stored internally (biologically) or externally (i.e., pen and paper)

Assuming the presence of interactions between terms, is there any argument for constructing a number of analytical model (e.x., `~IV1*IV2`, `~IV1*IV2*IV4`, `~IV1*IV2*IV3*IV4`) in order to best understand the phenomena? This is in the context of performing successive mixed-design ANOVA’s to come to some multiple conclusions regarding the experiment. I have a basic understanding of regression/multivariate regression and recognize the folly of including unnecessary variables – but if they’re all hypothesized to exert some effect, isn’t the final model the most sound?

Get this bounty!!!

## #StackBounty: #hypothesis-testing #correlation #statistical-significance #experiment-design #binary-data How to determine a 'strong…

### Bounty: 50

I have a set of drivers that are binary and a concept to measure that contains natural numbers between 1-10.

I’m currently using Kruskal’s key driver analysis to determine the relative contribution of each of the drivers. It’s discussed as being more robust that Pearson’s Correlation by taking into consideration the complete set of drivers and their relative contribution.

However, is the Kruskal’s approach still valid when the drivers are binary and the concept to measure are natural numbers between 1 and 10? I thought about switching to using the point biserial correlation, however this is identical to Pearson’s R.

My question is: Where do I set the threshold between a ‘good’ driver and a ‘not so good’ driver? It’s dependent upon the size of the data and also the properties of the data. Calculating the significance using t-tests (ignoring the fact the data may not meet the necessary assumptions of the t-test (that’s bundled in with the pearsonr scipy algorithm), denotes all of them to be significant, as they usually will be because even weak drivers will have some correlation, and aren’t ‘random’. Therefore do I set the ‘strong’ drivers to have a very low p-value – something that seems kind of arbitrary. Or is there a better algorithm that can distinguish between strong and weak drivers?

Or is it that no algorithm can really determine what a strong driver is? Is it dependent upon other factors relating to the context of the data that is being analysed?

Get this bounty!!!

## #StackBounty: #hypothesis-testing #correlation #statistical-significance #experiment-design #binary-data How to determine a 'strong…

### Bounty: 50

I have a set of drivers that are binary and a concept to measure that contains natural numbers between 1-10.

I’m currently using Kruskal’s key driver analysis to determine the relative contribution of each of the drivers. It’s discussed as being more robust that Pearson’s Correlation by taking into consideration the complete set of drivers and their relative contribution.

However, is the Kruskal’s approach still valid when the drivers are binary and the concept to measure are natural numbers between 1 and 10? I thought about switching to using the point biserial correlation, however this is identical to Pearson’s R.

My question is: Where do I set the threshold between a ‘good’ driver and a ‘not so good’ driver? It’s dependent upon the size of the data and also the properties of the data. Calculating the significance using t-tests (ignoring the fact the data may not meet the necessary assumptions of the t-test (that’s bundled in with the pearsonr scipy algorithm), denotes all of them to be significant, as they usually will be because even weak drivers will have some correlation, and aren’t ‘random’. Therefore do I set the ‘strong’ drivers to have a very low p-value – something that seems kind of arbitrary. Or is there a better algorithm that can distinguish between strong and weak drivers?

Or is it that no algorithm can really determine what a strong driver is? Is it dependent upon other factors relating to the context of the data that is being analysed?

Get this bounty!!!

## #StackBounty: #hypothesis-testing #correlation #statistical-significance #experiment-design #binary-data How to determine a 'strong…

### Bounty: 50

I have a set of drivers that are binary and a concept to measure that contains natural numbers between 1-10.

I’m currently using Kruskal’s key driver analysis to determine the relative contribution of each of the drivers. It’s discussed as being more robust that Pearson’s Correlation by taking into consideration the complete set of drivers and their relative contribution.

However, is the Kruskal’s approach still valid when the drivers are binary and the concept to measure are natural numbers between 1 and 10? I thought about switching to using the point biserial correlation, however this is identical to Pearson’s R.

My question is: Where do I set the threshold between a ‘good’ driver and a ‘not so good’ driver? It’s dependent upon the size of the data and also the properties of the data. Calculating the significance using t-tests (ignoring the fact the data may not meet the necessary assumptions of the t-test (that’s bundled in with the pearsonr scipy algorithm), denotes all of them to be significant, as they usually will be because even weak drivers will have some correlation, and aren’t ‘random’. Therefore do I set the ‘strong’ drivers to have a very low p-value – something that seems kind of arbitrary. Or is there a better algorithm that can distinguish between strong and weak drivers?

Or is it that no algorithm can really determine what a strong driver is? Is it dependent upon other factors relating to the context of the data that is being analysed?

Get this bounty!!!

## #StackBounty: #hypothesis-testing #correlation #statistical-significance #experiment-design #binary-data How to determine a 'strong…

### Bounty: 50

I have a set of drivers that are binary and a concept to measure that contains natural numbers between 1-10.

I’m currently using Kruskal’s key driver analysis to determine the relative contribution of each of the drivers. It’s discussed as being more robust that Pearson’s Correlation by taking into consideration the complete set of drivers and their relative contribution.

However, is the Kruskal’s approach still valid when the drivers are binary and the concept to measure are natural numbers between 1 and 10? I thought about switching to using the point biserial correlation, however this is identical to Pearson’s R.

My question is: Where do I set the threshold between a ‘good’ driver and a ‘not so good’ driver? It’s dependent upon the size of the data and also the properties of the data. Calculating the significance using t-tests (ignoring the fact the data may not meet the necessary assumptions of the t-test (that’s bundled in with the pearsonr scipy algorithm), denotes all of them to be significant, as they usually will be because even weak drivers will have some correlation, and aren’t ‘random’. Therefore do I set the ‘strong’ drivers to have a very low p-value – something that seems kind of arbitrary. Or is there a better algorithm that can distinguish between strong and weak drivers?

Or is it that no algorithm can really determine what a strong driver is? Is it dependent upon other factors relating to the context of the data that is being analysed?

Get this bounty!!!

## #StackBounty: #hypothesis-testing #correlation #statistical-significance #experiment-design #binary-data How to determine a 'strong…

### Bounty: 50

I have a set of drivers that are binary and a concept to measure that contains natural numbers between 1-10.

I’m currently using Kruskal’s key driver analysis to determine the relative contribution of each of the drivers. It’s discussed as being more robust that Pearson’s Correlation by taking into consideration the complete set of drivers and their relative contribution.

However, is the Kruskal’s approach still valid when the drivers are binary and the concept to measure are natural numbers between 1 and 10? I thought about switching to using the point biserial correlation, however this is identical to Pearson’s R.

My question is: Where do I set the threshold between a ‘good’ driver and a ‘not so good’ driver? It’s dependent upon the size of the data and also the properties of the data. Calculating the significance using t-tests (ignoring the fact the data may not meet the necessary assumptions of the t-test (that’s bundled in with the pearsonr scipy algorithm), denotes all of them to be significant, as they usually will be because even weak drivers will have some correlation, and aren’t ‘random’. Therefore do I set the ‘strong’ drivers to have a very low p-value – something that seems kind of arbitrary. Or is there a better algorithm that can distinguish between strong and weak drivers?

Or is it that no algorithm can really determine what a strong driver is? Is it dependent upon other factors relating to the context of the data that is being analysed?

Get this bounty!!!

## #StackBounty: #hypothesis-testing #correlation #statistical-significance #experiment-design #binary-data How to determine a 'strong…

### Bounty: 50

I have a set of drivers that are binary and a concept to measure that contains natural numbers between 1-10.

I’m currently using Kruskal’s key driver analysis to determine the relative contribution of each of the drivers. It’s discussed as being more robust that Pearson’s Correlation by taking into consideration the complete set of drivers and their relative contribution.

However, is the Kruskal’s approach still valid when the drivers are binary and the concept to measure are natural numbers between 1 and 10? I thought about switching to using the point biserial correlation, however this is identical to Pearson’s R.

My question is: Where do I set the threshold between a ‘good’ driver and a ‘not so good’ driver? It’s dependent upon the size of the data and also the properties of the data. Calculating the significance using t-tests (ignoring the fact the data may not meet the necessary assumptions of the t-test (that’s bundled in with the pearsonr scipy algorithm), denotes all of them to be significant, as they usually will be because even weak drivers will have some correlation, and aren’t ‘random’. Therefore do I set the ‘strong’ drivers to have a very low p-value – something that seems kind of arbitrary. Or is there a better algorithm that can distinguish between strong and weak drivers?

Or is it that no algorithm can really determine what a strong driver is? Is it dependent upon other factors relating to the context of the data that is being analysed?

Get this bounty!!!