*Bounty: 100*

I’ve read the paper Variational Graph Auto-Encoders, which uses auto-encoders for link prediction tasks, such as predicting links in a citation network of publications. The loss is the binary cross-entropy between the actual adjacency matrix $A$ of a citation network and the adjacency matrix reconstructed by the auto-encoder. The positive class (ones in $A$) corresponds to observed citations and is easy to obtain. However, the negative class (zeros in $A$) is huge, because there are many more pairs of publications that don’t cite each other. The authors therefore suggest two approaches:

> For very sparse $A$, it can be beneficial to re-weight terms with $A_{ij} = 1$ in $\mathcal{L}$ or alternatively sub-sample terms with $A_{ij} = 0$.
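For reference, the re-weighting variant looks roughly like this (a minimal numpy sketch, not the authors' code; `weighted_bce` and the toy matrices are my own):

```python
import numpy as np

def weighted_bce(A, A_hat, eps=1e-10):
    """Binary cross-entropy over all entries of the adjacency matrix,
    with terms where A_ij = 1 up-weighted by the negative/positive ratio.
    A_hat holds reconstructed edge probabilities in (0, 1)."""
    n_pos = A.sum()
    n_neg = A.size - n_pos
    pos_weight = n_neg / n_pos  # re-weighting factor for A_ij = 1 terms
    losses = -(pos_weight * A * np.log(A_hat + eps)
               + (1 - A) * np.log(1 - A_hat + eps))
    return losses.mean()

# Toy 3x3 adjacency matrix (2 positives, 7 negatives -> pos_weight = 3.5)
A = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 0.]])
A_hat = np.array([[0.1, 0.9, 0.2],
                  [0.8, 0.1, 0.3],
                  [0.2, 0.1, 0.1]])
print(weighted_bce(A, A_hat))
```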

In their implementation, they perform re-weighting, but I’m more interested in sub-sampling the negative class. I tried randomly sampling non-observed edges, but the model performs worse than with re-weighting.
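What I tried is roughly the following: compute the loss over all positive entries plus a uniformly sampled, equally sized set of negative entries (a sketch with a hypothetical `subsampled_bce` helper, not the paper's code):

```python
import numpy as np

def subsampled_bce(A, A_hat, rng, eps=1e-10):
    """BCE over all positive entries plus a uniform random sample of
    negative entries of the same size, instead of the full dense loss."""
    pos = np.argwhere(A == 1)
    neg = np.argwhere(A == 0)
    # Uniform sub-sampling of the negative class (what I tried)
    idx = rng.choice(len(neg), size=len(pos), replace=False)
    neg = neg[idx]
    p_pos = A_hat[pos[:, 0], pos[:, 1]]
    p_neg = A_hat[neg[:, 0], neg[:, 1]]
    return -(np.log(p_pos + eps).sum()
             + np.log(1 - p_neg + eps).sum()) / (2 * len(pos))

rng = np.random.default_rng(0)
A = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 0.]])
A_hat = np.full((3, 3), 0.3)
print(subsampled_bce(A, A_hat, rng))
```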

My question is whether there are more sophisticated sampling schemes for link prediction tasks, perhaps related to what word2vec does with negative sampling?
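To make concrete what I mean by a word2vec-style scheme: word2vec draws negatives from the unigram distribution raised to the power 0.75, and the graph analogue would be sampling negative pairs proportional to node degree to that power. A sketch of that idea (my own translation of the heuristic, so `degree_based_negatives` and the exponent choice are assumptions, not something from the paper):

```python
import numpy as np

def degree_based_negatives(A, n_samples, rng, power=0.75):
    """Sample negative node pairs (i, j) with probability proportional to
    deg(i)^power * deg(j)^power, rejecting observed edges and self-pairs.
    power=0.75 mirrors word2vec's smoothed unigram distribution; note that
    isolated nodes (degree 0) are never sampled under this scheme."""
    deg = A.sum(axis=1)
    p = deg ** power
    p = p / p.sum()
    n = A.shape[0]
    out = []
    while len(out) < n_samples:
        i, j = rng.choice(n, size=2, p=p)  # may repeat; rejection below
        if i != j and A[i, j] == 0:
            out.append((i, j))
    return np.array(out)

rng = np.random.default_rng(0)
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
print(degree_based_negatives(A, 5, rng))
```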