#StackBounty: #missing-data #expectation-maximization #mcar MCAR, MAR and EM

Bounty: 100

I have a binary (1/0) classification task. I want to estimate $p(y = 1 \mid X)$, where $X$ is the vector of input variables and $y$ is the binary output label.

Suppose that for some records the output labels ($y$) are missing.

Scenario 1: Labels are Missing Completely at Random (MCAR)

When estimating the model, is there a difference between discarding the records without labels and using Expectation Maximization (EM) to estimate the missing labels?

Scenario 2: Labels are Missing at Random (MAR) conditional on $X$

When estimating the model, is there a difference between discarding the records without labels and using Expectation Maximization (EM) to estimate the missing labels?
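
To make the two scenarios precise, here is how I would write them down. The indicator $R$ is my own notation (it does not appear above): $R = 1$ if the label of a record is observed and $R = 0$ if it is missing.

$$
\text{MCAR:}\quad P(R = 1 \mid X, y) = P(R = 1)
\qquad\qquad
\text{MAR given } X\text{:}\quad P(R = 1 \mid X, y) = P(R = 1 \mid X)
$$

In words: under MCAR the chance that a label is missing depends on neither $X$ nor $y$; under MAR conditional on $X$ it may depend on $X$ but, once $X$ is known, not on $y$.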

I am asking about the difference in terms of model bias and estimation efficiency.
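
A minimal sketch of the comparison I have in mind, under assumptions that are mine and not part of the question: the model is a plain logistic regression fitted by maximum likelihood, there is a single simulated input, and the probability that a label is missing is `sigmoid(x)` (Scenario 2). The names `fit_logistic` and `fit_em`, the sample size, and the coefficients are all hypothetical choices for illustration. The EM variant fills each missing label with its current predicted probability (E-step) and refits on the soft labels (M-step).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Logistic regression MLE via Newton's method; y may be fractional in [0, 1]."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)                        # score vector
        hess = (X * (p * (1 - p))[:, None]).T @ X   # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def fit_em(X, y_obs, observed, n_iter=200):
    """EM for missing labels: soft-impute missing y, then refit."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # E-step: expected label for records whose y is missing
        y_filled = np.where(observed, y_obs, sigmoid(X @ beta))
        # M-step: maximize the expected complete-data log-likelihood
        beta = fit_logistic(X, y_filled)
    return beta

# Simulated data (all values below are illustrative assumptions)
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.0])
y = rng.binomial(1, sigmoid(X @ true_beta)).astype(float)

# MAR given X: the label is missing with probability sigmoid(x)
observed = rng.random(n) > sigmoid(x)
y_obs = np.where(observed, y, np.nan)

beta_cc = fit_logistic(X[observed], y_obs[observed])  # discard unlabeled records
beta_em = fit_em(X, y_obs, observed)                  # keep them, soft-impute via EM

print("complete-case estimate:", beta_cc)
print("EM estimate:           ", beta_em)
```

In this particular setup a soft label equal to the model's own prediction contributes zero to the score, so the complete-case estimate is a fixed point of the EM iteration and I would expect the two fits to agree up to numerical tolerance. Whether that conclusion holds more generally (for example, if the distribution of $X$ were modelled as well) is exactly what I am asking.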

