*Bounty: 50*

There are 1000 students and 100 teachers. Each teacher is given the answer scripts of 100 randomly selected students, so 10,000 answer scripts are graded in total.

This is essentially panel data, but with a lot of missingness. If I want to find which teachers are lenient graders and which are strict, what technique can I use?

Imputation is unlikely to work, since about 90% of the (student, teacher) cells are missing.

The basic approach that comes to mind is:

- Define a data structure for each assessment: (assessment id, teacher id, student id, marks).
- Group by student, and within each group rank the teachers by the marks they gave.
- Compute a normalized average rank for each teacher; normalization is needed to accommodate the different numbers of assessments done by different teachers.
- Rank the teachers by these averages, or cluster on the average ranks to split them into strict and lenient groups.
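The steps above can be sketched as follows. This is a minimal illustration on simulated data, not a definitive implementation: the leniency model (an additive per-teacher bias), the ability stand-in, and all names are assumptions made here for the sketch.

```python
import random
from collections import defaultdict

random.seed(0)

# Simulated data (illustrative only): each record is (teacher_id, student_id, marks).
# Leniency is modelled as an additive per-teacher bias on a student's true ability.
bias = {t: random.gauss(0, 5) for t in range(100)}
records = []
for t in range(100):
    for s in random.sample(range(1000), 100):   # 100 random students per teacher
        ability = (s % 50) + 25                 # stand-in for true ability
        records.append((t, s, ability + bias[t] + random.gauss(0, 2)))

# Group by student, then rank the teachers within each student's set of scripts.
by_student = defaultdict(list)
for t, s, m in records:
    by_student[s].append((t, m))

rank_sums = defaultdict(float)
rank_counts = defaultdict(int)
for graded in by_student.values():
    k = len(graded)
    if k < 2:
        continue                         # a single grader gives no within-student ranking
    graded.sort(key=lambda x: x[1])      # lowest mark -> rank 0
    for r, (t, _) in enumerate(graded):
        rank_sums[t] += r / (k - 1)      # normalise rank to [0, 1]
        rank_counts[t] += 1

# Average normalised rank per teacher: high -> lenient, low -> strict.
avg_rank = {t: rank_sums[t] / rank_counts[t] for t in rank_sums}
lenient = max(avg_rank, key=avg_rank.get)
strict = min(avg_rank, key=avg_rank.get)
```

Ranking within each student controls for that student's ability exactly, since every teacher in the comparison graded the same script; the average normalised rank then reflects only how a teacher grades relative to the other teachers who saw the same students.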

How would I estimate the standard error of these average ranks? What is the underlying probability distribution?
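One way I could imagine estimating the standard error is a nonparametric bootstrap over students: resample student IDs with replacement, recompute each teacher's average normalised rank on the resampled panel, and take the standard deviation across replicates. A sketch under that assumption (the data generation and all names are illustrative, not from the original problem):

```python
import random
from collections import defaultdict

random.seed(1)

def avg_ranks(records):
    """Average normalised within-student rank per teacher."""
    by_student = defaultdict(list)
    for t, s, m in records:
        by_student[s].append((t, m))
    sums, counts = defaultdict(float), defaultdict(int)
    for graded in by_student.values():
        k = len(graded)
        if k < 2:
            continue
        graded.sort(key=lambda x: x[1])
        for r, (t, _) in enumerate(graded):
            sums[t] += r / (k - 1)
            counts[t] += 1
    return {t: sums[t] / counts[t] for t in sums}

# Illustrative data: teacher t's leniency grows slowly with t.
records = []
for t in range(100):
    for s in random.sample(range(1000), 100):
        records.append((t, s, random.gauss(50 + t * 0.05, 5)))

students = list({s for _, s, _ in records})
by_student_recs = defaultdict(list)
for rec in records:
    by_student_recs[rec[1]].append(rec)

# Bootstrap over students: resample students with replacement and recompute.
B = 100
reps = defaultdict(list)
for b in range(B):
    sample = [random.choice(students) for _ in students]
    boot = []
    for i, s in enumerate(sample):
        # Relabel the resampled student so duplicates stay distinct groups.
        boot.extend((t, i, m) for t, _s, m in by_student_recs[s])
    for t, v in avg_ranks(boot).items():
        reps[t].append(v)

# Bootstrap standard error per teacher.
se = {}
for t, vs in reps.items():
    mean = sum(vs) / len(vs)
    se[t] = (sum((v - mean) ** 2 for v in vs) / (len(vs) - 1)) ** 0.5
```

Resampling whole students (rather than individual assessments) respects the grouping: all of a student's assessments enter or leave a replicate together.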

Is there a standard methodology for such a problem? And what if, instead of marks, we have a categorical variable (grades) such as A, B, C, D?