Bounty: 100
Problem: I have carried out a series of biological experiments where the output of each experiment is an N x N matrix of counts. I then created a custom distance metric that takes two rows of counts and calculates the 'difference' between them (I will call this metric D). I calculated D for all pairwise comparisons of rows and now have an array of difference metrics called D_array.
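For concreteness, the setup looks roughly like this (toy Poisson counts and a placeholder absolute-difference metric, not my real data or my real D):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
counts = rng.poisson(10, size=(20, 20))  # toy N x N matrix of counts

def D(row_a, row_b):
    # Placeholder for the custom difference metric between two rows;
    # the real metric is different, but the shape of the problem is the same.
    return float(np.abs(row_a - row_b).sum())

# All pairwise comparisons of rows -> flat array of N*(N-1)/2 distance metrics.
D_array = np.array([D(counts[i], counts[j])
                    for i, j in combinations(range(len(counts)), 2)])
```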
My assumption based on the biology is that the majority of values in D_array represent no significant difference between the two rows of counts, and that only the D values at or above the 95th percentile represent real differences. Let us assume this is true, even if it doesn't make sense.
So this means that if D_array = [0, 1, 2, 3, 4, ..., 99] (100 metrics), then only D scores of 95-99 actually represent a real difference between two rows of counts.
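In code, picking out that top 5% of the toy array with NumPy would look like:

```python
import numpy as np

# Toy D_array from the example above: 100 metrics, 0..99.
D_array = np.arange(100)

# Values at or above the 95th percentile are treated as "real" differences.
threshold = np.percentile(D_array, 95)   # 94.05 for this toy array
significant = D_array[D_array >= threshold]   # [95 96 97 98 99]
```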
Note: the D_array above is not representative of my data. My actual data has a distribution of values like this (the black line represents the mean): https://imgur.com/usvvIgB
Given D_array, I want to determine whether a newly calculated distance value D' is "significant" based on my previous data, i.e. the distribution of D_array. Ideally, I would like to report some metric of 'significance', such as a p-value. By significance I mean the probability of having gotten a result as extreme as D'.
After a bit of reading, I found that I can use bootstrapping to calculate a 95% confidence interval for D_array, and then essentially ask whether D' falls outside that 95% CI. However, I am unsure whether there is a way to determine how significant obtaining a value of D' is based on D_array.
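What I have tried so far is roughly the following sketch. I am assuming a percentile bootstrap CI of the *mean* of D_array here; part of my confusion is whether this is even the right interval to be bootstrapping:

```python
import numpy as np

rng = np.random.default_rng(0)
D_array = np.arange(100)  # stand-in for my real distance metrics

# Bootstrap: resample D_array with replacement many times, record each mean.
n_boot = 10_000
boot_means = np.array([
    rng.choice(D_array, size=len(D_array), replace=True).mean()
    for _ in range(n_boot)
])

# 95% percentile CI for the mean of D_array.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

D_prime = 96.0  # a newly observed distance
outside_ci = (D_prime < ci_low) or (D_prime > ci_high)
```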
My questions are:

1. Does asking whether D' lies outside the bootstrapped 95% CI of D_array make sense as a way to determine whether D' represents a 'real' difference between two rows of counts?

2. Given D' and D_array, how can I determine the significance of having obtained a value as extreme as D'? I have seen bootstrapping used to calculate p-values, but this usually requires comparing the means of two different distributions, which I do not have in this case.

3. Is there a better way to determine whether a new observation is 'significantly' different from my prior distribution of 'null' (D_array) data? If so, how?
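For the p-value question, the only approach I can think of is a one-sided empirical p-value computed directly against D_array, treating D_array as the null distribution (the +1 in numerator and denominator is just one common convention to avoid a p-value of exactly 0):

```python
import numpy as np

D_array = np.arange(100)  # stand-in for the observed 'null' distances
D_prime = 96.0            # the newly calculated distance

# One-sided empirical p-value: fraction of null values at least as
# extreme as D_prime.
p_value = (1 + np.sum(D_array >= D_prime)) / (1 + len(D_array))  # 5/101 here
```

Is something like this valid, or does it require assumptions that my D_array does not satisfy?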