#StackBounty: #hypothesis-testing #distributions #mathematical-statistics #matlab #distance Are there statistical methods for determini…

Bounty: 50

So I know that t-tests (and ANOVAs, when there are more than two groups) are used to determine whether the means of normally distributed random variables are significantly different. The p-value indicates how unlikely the observed difference would be if the means were actually equal.

However, for the problem I’m working on, I am wondering whether there is an extension of this idea that can determine whether the mean of a pair of random variables lies “significantly” inside some region of the Cartesian plane.
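In one dimension, "the mean is significantly inside an interval [lo, hi]" is usually handled with two one-sided tests (TOST, as in equivalence testing). The null hypothesis "the mean is outside the interval" is rejected only if both "mean ≤ lo" and "mean ≥ hi" are rejected, so the overall p-value is the larger of the two. The minimal Python sketch below uses only the standard library. The function name `tost_inside_interval` is hypothetical, and the normal (z) approximation for the p-values is an assumption. For small samples, a t-distribution with n − 1 degrees of freedom is more appropriate.

```python
import math
import statistics
from statistics import NormalDist


def tost_inside_interval(sample, lo, hi):
    """Two one-sided tests of H0: mean <= lo or mean >= hi vs H1: lo < mean < hi.

    The overall p-value is the larger of the two one-sided p-values
    (intersection-union principle). Uses a normal approximation
    (assumption; a t-distribution with n - 1 df is more exact for small n).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    p_lower = 1.0 - NormalDist().cdf((mean - lo) / se)  # tests H0: mean <= lo
    p_upper = 1.0 - NormalDist().cdf((hi - mean) / se)  # tests H0: mean >= hi
    return max(p_lower, p_upper)


# Example: is the mean of a small sample significantly inside [15, 25]?
sample = [19, 20, 21, 20, 20, 19, 21]
print(tost_inside_interval(sample, 15, 25))
```

The two-dimensional question is a natural extension: a region bounded by straight lines is an intersection of half-planes, and each half-plane condition can be tested the same one-sided way.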

EDIT: Thanks to a helpful comment, I now know that this is called a “composite hypothesis”. To get help in setting this up, I want to provide details about my specific problem.

Consider the Cartesian x-y plane, where x ranges over [0, 45] and y ranges over [-30, 30]. Let’s also say we have random variables $rx$ and $ry$, both normally distributed: $rx$ has mean 20 and standard deviation 5.5, and $ry$ has mean -2.5 and standard deviation 2. $rx$ and $ry$ are paired samples, so when plotted in the x-y plane they form a cloud of points.
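Both goals concern the center of the cloud (the population mean). What matters is therefore the precision of the sample mean, not the spread of the individual points. With the parameters above and the 26 samples used in the code further down, the standard errors of the two sample means are roughly:

$$
\mathrm{SE}(\bar{r}_x) = \frac{\sigma_x}{\sqrt{n}} = \frac{5.5}{\sqrt{26}} \approx 1.08,
\qquad
\mathrm{SE}(\bar{r}_y) = \frac{\sigma_y}{\sqrt{n}} = \frac{2}{\sqrt{26}} \approx 0.39 .
$$

So the true center is pinned down to within about ±2.2 units in x and ±0.8 units in y (two standard errors). In practice, $\sigma$ is unknown and is replaced by the sample standard deviation.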

Furthermore, consider a region in the x-y plane that is formed by the lines defined by equations 1 and 2, respectively: $y = 0$ and $y = -1.85\,(x - 15)$, where equation 2 is defined only for $x \in [15, 45]$. Separately, consider the line defined by equation 3: $y = 0.5\,x$, where $x$ covers the entire domain under consideration, [0, 45].
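One way to make these definitions testable is to write them as functions of the true center $\mu = (\mu_x, \mu_y)$. This assumes that the region meant here is the wedge between the two lines, below equation 1 and above equation 2. That wedge is the intersection of two half-planes. The signed distance from $\mu$ to equation 3 (positive below the line) is also linear in $\mu$:

$$
\begin{aligned}
g_1(\mu) &= -\mu_y, \qquad g_2(\mu) = \mu_y + 1.85\,(\mu_x - 15), \qquad R = \{\mu : g_1(\mu) > 0 \text{ and } g_2(\mu) > 0\},\\
d(\mu) &= \frac{0.5\,\mu_x - \mu_y}{\sqrt{0.5^2 + 1^2}} = \frac{0.5\,\mu_x - \mu_y}{\sqrt{1.25}} .
\end{aligned}
$$

Together, the two conditions already force $\mu_x > 15$. At the nominal center $(20, -2.5)$, they give $g_1 = 2.5$, $g_2 = 6.75$ and $d = 12.5/\sqrt{1.25} \approx 11.18$. The two goals can then be stated as composite hypotheses. Here $\delta \ge 0$ is a hypothetical margin for "far away" that would have to be fixed in advance, and goal 2 is taken one-sided because the cloud lies below the line:

$$
\text{Goal 1: } H_0: \mu \notin R \ \text{ vs. } \ H_1: \mu \in R,
\qquad
\text{Goal 2: } H_0: d(\mu) \le \delta \ \text{ vs. } \ H_1: d(\mu) > \delta .
$$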

I have two goals that I hope composite hypothesis testing will address:

1) I want to determine, with a reasonable level of statistical confidence, that the center of the cloud of points formed by the pair $(rx, ry)$ is “significantly” inside the region formed by equations 1 and 2.

2) I similarly want to determine that the center of the cloud of points formed by the pair $(rx, ry)$ is “significantly” far away from the line defined by equation 3.
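Below is a minimal sketch of both tests. It assumes the region is the wedge between the two lines (below equation 1 and above equation 2, which forces x > 15). Both tests use the same idea. For a linear combination $a\,\mu_x + b\,\mu_y + c$, compute $a\,rx_i + b\,ry_i + c$ for each pair and run an ordinary one-sample, one-sided test on those values. This automatically accounts for any correlation between $rx$ and $ry$.

Goal 1 uses the intersection-union principle: the center is declared inside the region only if every half-plane test rejects, so the overall p-value is the largest individual p-value. Goal 2 tests whether the signed distance below equation 3 exceeds a margin `delta`. The function names, the margin and the choice of side are all hypothetical. The normal approximation is also an assumption; with n = 26, a t-distribution with 25 degrees of freedom would be slightly more conservative.

```python
import math
import random
import statistics
from statistics import NormalDist


def linear_mean_pvalue(xs, ys, a, b, c):
    """One-sided test of H0: a*mu_x + b*mu_y + c <= 0 vs H1: a*mu_x + b*mu_y + c > 0.

    Each pair (x, y) is reduced to w = a*x + b*y + c, and a one-sample test is
    run on w, so any x-y correlation is accounted for. The p-value uses a
    normal approximation (assumption; a t-distribution with n - 1 df is
    slightly more conservative for small n).
    """
    w = [a * x + b * y + c for x, y in zip(xs, ys)]
    se = statistics.stdev(w) / math.sqrt(len(w))  # standard error of mean(w)
    z = statistics.fmean(w) / se
    return 1.0 - NormalDist().cdf(z)


def inside_region_pvalue(xs, ys):
    """Goal 1, intersection-union test of H0: 'center is outside the wedge'.

    Reject only if both half-plane tests reject, so the p-value is the maximum.
    """
    p1 = linear_mean_pvalue(xs, ys, 0.0, -1.0, 0.0)           # mu_y < 0 (below eq. 1)
    p2 = linear_mean_pvalue(xs, ys, 1.85, 1.0, -1.85 * 15.0)  # mu_y > -1.85*(mu_x - 15) (above eq. 2)
    return max(p1, p2)


def far_from_line_pvalue(xs, ys, delta):
    """Goal 2, test of H0: d(mu) <= delta, where d is the signed distance
    below the line y = 0.5*x (eq. 3) and delta is a hypothetical margin."""
    norm = math.sqrt(0.5 ** 2 + 1.0)
    return linear_mean_pvalue(xs, ys, 0.5 / norm, -1.0 / norm, -delta)


# Simulated data mirroring the MATLAB code (fixed seed only for reproducibility).
random.seed(1)
rx = [random.gauss(20.0, 5.5) for _ in range(26)]
ry = [random.gauss(-2.5, 2.0) for _ in range(26)]

print("goal 1 p-value:", inside_region_pvalue(rx, ry))
print("goal 2 p-value (delta = 5):", far_from_line_pvalue(rx, ry, 5.0))
```

With real data, `rx` and `ry` would be the measured pairs. The value of `delta` has to come from what "far away" means in the application, because with enough samples any nonzero distance becomes significant.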

Edit 2: I am adding a figure that might help with visualization, as well as the code I wrote to generate it:

```matlab
clear all;
close all;
home;

%define the domain we will display the data over
x_domain = [0:1:45];
y_domain = [-30:1:30];

%define number of samples for rx and ry
num_samples = 26;

%define variables rx and ry
rx = 20 + 5.5 .* randn(num_samples,1);
ry = -2.5 + 2 .* randn(num_samples,1);

%define equations 1-3
eq1 = x_domain * 0;
eq2 = -1.85 .* [(15:x_domain(end)) - 15];
eq3 = x_domain ./ 2;

%start plotting
figure; hold on;
Lw1 = 1.5;
Lw2 = 2.5;
marker_size = 8;

%plot the pair rx, ry
plot(rx, ry, '.', 'markersize' ,marker_size,'color','k', 'displayname', 'points of rx & ry');

%plot the grand mean of the data with a star
plot(mean(rx), mean(ry),'p','markersize',12,'linewidth',Lw1,'markerfacecolor', 'r', 'markeredgecolor', 'k', ...
    'displayname', 'grand mean');

legend('show');

%display the lines that form the region (eq1 and 2)
plot(x_domain, eq1, 'color', 'k', 'linewidth', Lw2, 'displayname', 'equation 1');
plot([15:x_domain(end)], eq2, 'color', 0.5 * [1,1,1], 'linewidth', Lw2, 'displayname', 'equation 2');

%plot the line corresponding to eq3
plot(x_domain,eq3,'--','color','k', 'linewidth', Lw2, 'displayname', 'equation 3');

h1 = gca;
h1.YTick = [-30,-15,0,15,30];
h1.YTickLabel = {'-30','-15','0', '15', '30'};
h1.XTick = [0, 15, 30, 45];
h1.XTickLabel = {'0', '15', '30','45'};

xlabel('X');
ylabel('Y');
set(gca,'Fontsize',12);

xlim(x_domain(end)*[0, 1]);
ylim(y_domain(end)*[-1,1]);

axis square;
```

Result:

[Figure: data and equations visualized]

