I’m somewhat familiar with parametric estimation via MLE in the context of fitting the parameters of a distribution to a sample. Is there a way to generalize this approach to unnormalized models (for instance, models where a neural network outputs an unnormalized log-density)? Naïvely maximizing the network's output on the training sample would simply push the model toward predicting high values everywhere.
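
To make the failure mode concrete (my own notation, assuming an energy-based parameterization): for a properly normalized density, MLE maximizes

$$\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(x_i).$$

If the model is only specified up to a constant, say $p_\theta(x) = \exp(f_\theta(x)) / Z(\theta)$ with $Z(\theta) = \int \exp(f_\theta(x))\,dx$, the log-likelihood becomes

$$\frac{1}{N}\sum_{i=1}^{N} f_\theta(x_i) \;-\; \log Z(\theta),$$

and $\log Z(\theta)$ is generally intractable. Dropping it and maximizing $\frac{1}{N}\sum_i f_\theta(x_i)$ alone is what I mean by "naïve": without the normalizer the objective is unbounded, and the network can increase it simply by outputting large values everywhere.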