New to AI/ML. My understanding of feature scaling is that it's a set of techniques used to counteract the effects of different features having different scales/ranges (which would otherwise cause models to weight some features too heavily and others too lightly).
The two most common techniques that I keep reading about are normalization (rescaling your feature values to lie between 0 and 1) and standardization (rescaling your feature values to have a mean of 0 and a standard deviation of 1).
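To make those two definitions concrete, here is a minimal sketch of how I understand them, in plain Python. The function names and the sample `ages` values are made up for illustration, and I'm assuming the population standard deviation is the one intended; libraries may differ in such details.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no range to rescale; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical feature column, purely for illustration.
ages = [18, 25, 40, 62]

normalized = min_max_normalize(ages)
standardized = standardize(ages)

print(normalized)    # values between 0 and 1
print(standardized)  # values centered on 0
```

Both are linear rescalings of the same column; they differ only in which two summary numbers (min and max versus mean and standard deviation) they pin down.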
From what I can gather, normalization seems to work better when your data is not Gaussian (i.e., not a "bell curve"), whereas standardization is better when it is Gaussian. But nowhere can I find a decent explanation as to why this is the case!
Why does your data distribution affect the efficacy of your feature scaling technique? Why is normalization good for non-Gaussian data whereas standardization is good for Gaussian data? Are there any edge cases where you’d use standardization on non-Gaussian data? Are there any other major techniques besides these two?
For instance, I found this excellent paper on characterizing datasets by various distributions. So I’m wondering if there are methods for feature scaling when the data is, say, geometrically distributed, or when it's exponentially distributed, etc. And if so, what are they?