I have two datasets. Each dataset is a pool of samples, and each sample contains multiple observations. Data-wise, one sample can be represented as a vector of floats, and a dataset as a matrix in which each row is a sample.
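To make the layout concrete, here is a minimal sketch with placeholder data. The sizes, the distributions used to fill the arrays, and the names `big` and `small` are all made up for illustration; I am also assuming here that both datasets have the same number of observations per sample.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; the values are placeholders only

n_obs = 5  # observations per sample (hypothetical)

# One row per sample, one column per observation.
big = rng.normal(size=(1000, n_obs))             # the large dataset
small = rng.gamma(shape=2.0, size=(40, n_obs))   # the much smaller dataset

print(big.shape, small.shape)
```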
I would like to find a transform that projects the first dataset into the “space” of the second dataset. I think what I am looking for is a way to estimate one empirical distribution and then map it onto another empirical distribution. There might be a dependency between samples within one dataset, but there is none between the two datasets.
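As a rough illustration of the kind of mapping I have in mind, here is a sketch of per-column quantile matching. This is only my guess at a baseline, not a method I know to be right for this problem: it treats every column independently, so it ignores any dependence between observations or between samples, and it assumes both matrices have the same number of columns. The function name `quantile_map` and the toy data are hypothetical.

```python
import numpy as np

def quantile_map(source, target):
    """Map each column of `source` onto the empirical distribution
    of the same column of `target` (marginals only)."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    mapped = np.empty_like(source)
    for j in range(source.shape[1]):
        src_sorted = np.sort(source[:, j])
        # Empirical CDF value of each source observation, in (0, 1].
        probs = np.searchsorted(src_sorted, source[:, j], side="right") / len(src_sorted)
        # Read off the target quantile at the same probability level.
        mapped[:, j] = np.quantile(target[:, j], probs)
    return mapped

# Toy data standing in for the real datasets (assumption).
rng = np.random.default_rng(1)
first = rng.normal(size=(1000, 3))
second = rng.gamma(shape=2.0, size=(40, 3))

projected = quantile_map(first, second)
print(projected.shape)
```

With only a few samples in the target, its empirical quantiles are coarse, which is exactly the sample-size issue I am worried about.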
Bonus points if the answer also suggests a way to account for the number of samples: one dataset is large and the other is much smaller, so the distribution estimate for the smaller one will likely be much more approximate.
Clarification: I have no prior knowledge of the underlying distribution of either dataset, and I cannot obtain any. The data are generated by a natural phenomenon, and no theoretical model correctly describes these distributions yet.