I am working on the following kind of classification problem: I have to classify every instance as class A or class B using several images of that instance. That is, each training example consists not of a single image (the usual setup in image classification) but of several, and the number of images per instance is not fixed. For example, instance 1 may have 3 images, from which we have to classify it as A or B, while instance 2 may have 5.
As in any supervised machine learning problem, I am given many labelled images and have to build a classifier.
Although ideas are also welcome, I am mainly looking for a documented way to attack this kind of problem (Kaggle competitions, papers or books).
My main idea is the following: train a model $f$ that, given one image, outputs the probability that the image belongs to class A. Then, for every training instance, evaluate $f$ on every image of the instance and compute summary statistics (aggregates) of the resulting distribution of probabilities, such as the mean, median, maximum and minimum. Finally, train a model $g$ that takes these aggregates as inputs, and use the composition of $f$, the aggregation step and $g$ as the final model. This idea is rather simple, so I am looking for something better.
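To make the idea concrete, here is a minimal sketch of that pipeline in plain Python. Everything in it is a placeholder chosen for illustration: `toy_f` stands in for the per-image model $f$ (each toy "image" is just a number), `fit_g` is a tiny hand-written logistic regression standing in for $g$, and the data is made up. In practice $f$ would be something like a CNN and $g$ any tabular classifier.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple


def aggregate(probs: Sequence[float]) -> List[float]:
    """Turn a variable-length list of per-image probabilities into 4 fixed features."""
    if not probs:
        raise ValueError("an instance must have at least one image")
    return [statistics.mean(probs), statistics.median(probs), max(probs), min(probs)]


def instance_features(images: Sequence[float], f: Callable[[float], float]) -> List[float]:
    """Evaluate f on every image of one instance, then aggregate the outputs."""
    return aggregate([f(image) for image in images])


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_g(X: Sequence[Sequence[float]], y: Sequence[int],
          lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Placeholder for g: logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - target
            grad_w = [gj + err * xj for gj, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_instance(images: Sequence[float], f: Callable[[float], float],
                     w: Sequence[float], b: float) -> float:
    """Final model: g applied to the aggregates of f over the instance's images."""
    x = instance_features(images, f)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def toy_f(image: float) -> float:
    """Hypothetical stand-in for f: each toy 'image' is already a score in [0, 1]."""
    return min(1.0, max(0.0, image))


# Made-up training set: instances have different numbers of images; 1 = A, 0 = B.
train_instances = [[0.9, 0.8, 0.7], [0.6, 0.95, 0.85, 0.7, 0.9],
                   [0.1, 0.2], [0.3, 0.05, 0.2, 0.4]]
train_labels = [1, 1, 0, 0]

X_train = [instance_features(images, toy_f) for images in train_instances]
w, b = fit_g(X_train, train_labels)

# Classify a new instance that has four images.
p_new = predict_instance([0.75, 0.9, 0.65, 0.8], toy_f, w, b)
print(f"P(class A) = {p_new:.3f}")
```

The point of the aggregation step is that $g$ always sees a vector of length 4, no matter how many images the instance has. One caveat of this sketch: if $f$ and $g$ are trained on the same instances, the aggregates fed to $g$ are computed from in-sample predictions of $f$, so out-of-fold predictions would probably be a safer input for $g$.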