I have a binary (1/0) classification task. I am trying to estimate $$p(y = 1 | X)$$, where $$X$$ is the vector of input variables and $$y$$ is the binary output label.

Suppose that for some records the output labels ($$y$$) are missing.

Scenario 1: Labels are Missing Completely at Random (MCAR)

When estimating the model, is there a difference between discarding the records without labels and using Expectation Maximization (EM) to estimate the missing labels?

Scenario 2: Labels are Missing at Random (MAR) conditional on $$X$$

When estimating the model, is there a difference between discarding the records without labels and using Expectation Maximization (EM) to estimate the missing labels?
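
To be explicit about what I mean by the two scenarios, let $$R$$ be an indicator that equals 1 when the label is observed and 0 when it is missing. The symbol $$R$$ is my own notation, introduced only for this illustration. I believe the standard definitions then read:

$$\text{Scenario 1 (MCAR): } \quad p(R = 1 \mid X, y) = p(R = 1)$$

$$\text{Scenario 2 (MAR given } X\text{): } \quad p(R = 1 \mid X, y) = p(R = 1 \mid X)$$

In both cases the missingness does not depend on $$y$$ once $$X$$ is known; in Scenario 1 it does not depend on $$X$$ either.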

I am asking about the difference in terms of model bias and estimation efficiency.
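
For concreteness, here is a minimal sketch of how I would simulate the two scenarios. Everything in it is an assumption made for illustration: the function name `simulate_missing_labels`, the logistic data-generating model, the coefficients, and the missingness rates are all hypothetical and not part of my actual problem.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used for both the label model and the MAR mechanism."""
    return 1.0 / (1.0 + np.exp(-z))


def simulate_missing_labels(n=10_000, mechanism="MCAR", seed=0):
    """Simulate binary labels and hide some of them.

    mechanism="MCAR": every label is hidden with the same probability.
    mechanism="MAR":  the probability of hiding a label depends on X only.
    Returns X, the full labels y, the partially observed labels y_obs
    (NaN where missing), and the boolean mask of missing labels.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))

    # Hypothetical true model: p(y = 1 | X) is logistic in X.
    beta = np.array([1.0, -0.5])
    y = rng.binomial(1, sigmoid(X @ beta)).astype(float)

    if mechanism == "MCAR":
        p_missing = np.full(n, 0.3)   # constant, independent of X and y
    elif mechanism == "MAR":
        p_missing = sigmoid(X[:, 0])  # depends on X, never on y
    else:
        raise ValueError("mechanism must be 'MCAR' or 'MAR'")

    missing = rng.random(n) < p_missing
    y_obs = y.copy()
    y_obs[missing] = np.nan
    return X, y, y_obs, missing


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR"):
        X, y, y_obs, missing = simulate_missing_labels(mechanism=mechanism)
        print(mechanism, "fraction of labels missing:", missing.mean())
```

Discarding the unlabeled records then corresponds to fitting the model on `X[~missing]` and `y_obs[~missing]` only, while the EM approach would use all rows of `X`.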

