For a personal project, I’ve built a dataset of hockey players’ statistics over time. I’m looking for insights into how I should build my predictive model (lol). The model would be used to predict how many points a player can be expected to produce, based on his previous performances.
To keep it simple, let’s keep only the three most “important” columns of my dataset (there are a few more features than that, but I don’t think they are necessary for the problem):
PlayerId | Points | Year
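To make the structure concrete, here is a tiny mock-up of what I mean (pandas assumed; the column names are my real ones, but the values are invented):

```python
import pandas as pd

# Toy version of the dataset: one row per player per season.
# PlayerId / Points / Year are the real columns; the numbers are made up.
df = pd.DataFrame({
    "PlayerId": [1, 1, 1, 2, 2],
    "Year":     [2005, 2006, 2007, 2009, 2010],
    "Points":   [42, 55, 51, 30, 38],
})
print(df)
```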
Now, I have tried to use the machine learning algorithms I know, but:
- The data behaves somewhat like a time series. Say I have 10k players; those players have stats over the years (sometimes from season 2005 to 2017, others only from 2009 to 2010, you get the point). Because of this relation between rows (for example, PlayerId 1 in Year 2005 and PlayerId 1 in Year 2006), I can’t use most of the algorithms I know: they assume the rows are independent, which throws this temporal structure out the window, and I think it’s an important one.
- Since the data is related over time only within each player, I don’t think I can model it as a single time series either. The dataset contains many small time series, one per player, but none of them is long enough to treat it that way (with one row per player per year, a player has at most maybe 15 rows, which isn’t enough to build a good prediction).
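To illustrate the “many short series” point, counting seasons per player on a toy frame (made-up numbers, pandas assumed):

```python
import pandas as pd

# Made-up example: each player contributes only a handful of rows.
df = pd.DataFrame({
    "PlayerId": [1, 1, 1, 2, 2],
    "Year":     [2005, 2006, 2007, 2009, 2010],
    "Points":   [42, 55, 51, 30, 38],
})

# One count per player: these per-player series are very short.
seasons_per_player = df.groupby("PlayerId")["Year"].count()
print(seasons_per_player)
```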
Considering these two points, I’m pretty much stuck without a solution.
I’ve thought about merging all the rows for a player into one, so I’d have:
PlayerId | Points2005 | Points2006 | etc.
but it doesn’t make much sense, since we lose the notion of time.
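In pandas terms, that merge would just be a pivot to wide format — a minimal sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({
    "PlayerId": [1, 1, 1, 2, 2],
    "Year":     [2005, 2006, 2007, 2009, 2010],
    "Points":   [42, 55, 51, 30, 38],
})

# One row per player, one column per season;
# seasons a player didn't play become NaN.
wide = df.pivot(index="PlayerId", columns="Year", values="Points")
print(wide)
```

The NaN holes (players active over different year ranges) are part of why this representation feels awkward to me.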
I also considered making a predictive model for each player individually, then using the weights I’d find to build another predictive model on top, but I’m very unsure how that would turn out.
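For what it’s worth, the first stage of that idea could be sketched like this — a naive linear trend per player via `np.polyfit`, with made-up numbers (the second-stage model over these coefficients is exactly the part I’m unsure about):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "PlayerId": [1, 1, 1, 2, 2],
    "Year":     [2005, 2006, 2007, 2009, 2010],
    "Points":   [42, 55, 51, 30, 38],
})

# Fit a straight line Points ~ Year separately for each player.
def fit_player(g):
    slope, intercept = np.polyfit(g["Year"], g["Points"], deg=1)
    return pd.Series({"slope": slope, "intercept": intercept})

per_player = df.groupby("PlayerId")[["Year", "Points"]].apply(fit_player)
print(per_player)
```

The follow-up step would then train another model on these per-player coefficients.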
I’m just looking for a small tip to push me in the right direction, whether it’s pure statistics or a machine learning algorithm.