Suppose a website records all information related to each visit, including the visitor's gender, device, time, etc. When a new impression happens we store it, and we want to predict when this person will revisit the website. For example, suppose 10 people have visited our website and we want to predict when the first of these 10 people will revisit it. This problem could be modeled as a regression, but the challenge is that as time goes on the number of people increases, and I am wondering how to convert all this information into features for a regression model. What approach do you recommend? Is it possible to model it with conventional models like OLS?
An example of the data is as follows:
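| Record | ID | gender | device | time_of_first_visit | time_of_second_visit |
|---|---|---|---|---|---|
| 1 | 1 | male | phone | 1:25 PM | 1:55 PM |
| 2 | 2 | male | tablet | 1:47 PM | 2:15 PM |
| 3 | 3 | female | PC-Win | 2:15 PM | |
| 4 | 4 | male | Mac | 2:37 PM | |
| 5 | 5 | female | phone | 2:43 PM | |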
Now suppose it is 3:00 PM. At that point we know how the first two customers behaved, and we can use them as our training data. Our goal is to predict which of the remaining three people (3, 4, and 5) will revisit the website first, and at what time. If we want to do the same at 3:15 PM, a new person may have visited the website in the meantime, in which case there would be 4 records to predict for.
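One possible way to avoid a feature vector that grows with the number of people (offered as an assumption, not as the established answer) is to store one row per person instead of one feature per person. At any cutoff time, each person who has already arrived contributes either an observed time until their second visit or a "still waiting" duration, which survival analysis calls a right-censored observation. The minimal sketch below builds that table from the five records above using only the standard library; the helper names `minutes` and `at_risk_table` are hypothetical.

```python
from datetime import datetime

# The five records from the table above; None means "no second visit yet".
records = [
    (1, "male", "phone", "1:25 PM", "1:55 PM"),
    (2, "male", "tablet", "1:47 PM", "2:15 PM"),
    (3, "female", "PC-Win", "2:15 PM", None),
    (4, "male", "Mac", "2:37 PM", None),
    (5, "female", "phone", "2:43 PM", None),
]

def minutes(clock):
    """Convert a clock string such as '1:25 PM' to minutes after midnight."""
    t = datetime.strptime(clock, "%I:%M %p")
    return t.hour * 60 + t.minute

def at_risk_table(records, cutoff):
    """Return one row per person who arrived by `cutoff`:
    (id, gender, device, duration_in_minutes, revisited).
    If the second visit happened by the cutoff, duration is the observed gap;
    otherwise it is the time waited so far (a right-censored observation)."""
    cut = minutes(cutoff)
    rows = []
    for rid, gender, device, first, second in records:
        start = minutes(first)
        if start > cut:
            continue  # this person had not arrived yet at the cutoff
        if second is not None and minutes(second) <= cut:
            rows.append((rid, gender, device, minutes(second) - start, True))
        else:
            rows.append((rid, gender, device, cut - start, False))
    return rows

for row in at_risk_table(records, "3:00 PM"):
    print(row)
```

At 3:00 PM this yields two observed gaps (30 and 28 minutes, for persons 1 and 2) and three censored rows (45, 23 and 17 minutes, for persons 3, 4 and 5). Re-running it at 3:15 PM after appending a new visitor only adds a row; the columns stay the same, so the model's inputs do not grow with the number of people.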
The challenge is that because arrivals are random, a different number of people could potentially revisit the website at any given time. My model should answer who will revisit the website next, and when.
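To make the "who and when" question concrete under one simple, assumed model (not necessarily the best one for this data), suppose each person's time until revisit is exponential with a constant rate, independent of everyone else. The exponential is memoryless, so from the current moment each waiting person's remaining time is again exponential with that rate. The first revisit among $n$ waiting people then has rate $\sum_i \lambda_i$, and person $i$ is first with probability $\lambda_i / \sum_j \lambda_j$. This handles a varying number of people automatically, because the sum simply gains or loses terms. The sketch below uses the 3:00 PM durations from the example table; `pooled_rate` and `first_revisit` are hypothetical names.

```python
# Hypothetical sketch: constant-hazard (exponential) survival model.
# Assumption: each person's time until revisit is exponential with rate lam,
# independent of other people. This is an illustrative choice only.

# (duration_in_minutes, revisited) per person at the 3:00 PM cutoff,
# from the example table: persons 1-2 revisited, 3-5 are still waiting.
observations = {1: (30, True), 2: (28, True), 3: (45, False),
                4: (23, False), 5: (17, False)}

def pooled_rate(observations):
    """Maximum-likelihood rate for right-censored exponential data:
    number of observed revisits divided by total time at risk."""
    events = sum(1 for _, revisited in observations.values() if revisited)
    exposure = sum(duration for duration, _ in observations.values())
    return events / exposure

def first_revisit(rates):
    """Given {id: rate} for people still waiting, return
    (expected minutes from now until the first revisit,
     {id: probability that this person revisits first})."""
    total = sum(rates.values())
    return 1.0 / total, {pid: lam / total for pid, lam in rates.items()}

lam = pooled_rate(observations)  # 2 revisits / 143 minutes at risk
waiting = [pid for pid, (_, revisited) in observations.items() if not revisited]
expected_wait, prob_first = first_revisit({pid: lam for pid in waiting})
print(round(expected_wait, 2), prob_first)
```

With a single pooled rate the expected wait is 143/6, about 23.8 minutes after 3:00 PM, and persons 3, 4 and 5 are equally likely to be first. Person-specific rates, for example from a parametric survival regression or a Cox model using gender and device as covariates (a swapped-in technique, not something the question specifies), would make those probabilities differ. A visitor who arrives at 3:15 PM is just one more entry in the `rates` dictionary.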