I have created a random forest classification model in scikit-learn, but I am unsure how to finalize my forecast.
I have built the model and it is showing good results on the testing data: I get a mean accuracy of 85% when predicting whether the stock price will go up or down. I used data from Yahoo Finance consisting of open, high, low, close, and volume. From there I worked out some technical indicators such as the RSI, ROC, stochastic oscillators (fast and slow), MACD, on-balance volume, and the 200-day moving average, and used these as features (independent variables) in the random forest classifier. I created another column, showing 1 when the price went up and 0 when the price went down. This column was used as the dependent variable (the thing I want to predict).
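For illustration, a target column like the one described above could be built roughly as follows. This is only a sketch under assumptions: the post does not say which prices are compared or over what horizon, so the `Close` column name, the sample prices, and the next-day comparison are all hypothetical.

```python
import pandas as pd

# Hypothetical closing prices; in the post these come from Yahoo Finance.
prices = pd.DataFrame({"Close": [10.0, 10.5, 10.2, 10.2, 11.0]})

# Assumption: the label compares the next day's close with today's close.
next_close = prices["Close"].shift(-1)
prices["prediction"] = (next_close > prices["Close"]).astype(int)

# The last row has no "next day" yet, so its true label is unknown.
# Keep it apart from the rows that can be used for training and testing.
labelled = prices.iloc[:-1]
latest = prices.iloc[[-1]]
```

With a label defined this way, the final row always has features but no known outcome, which is the row a forecast would be made for.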
What I am trying to find out now is how I can run the forecast into the unknown future. For now, I have split my data into training and testing sets, trained the model on the training set, and then used the predict function on the testing set. The model performs well and, after a little more tweaking, it can be used.
But how? I can’t find anything in the sklearn random forest documentation about how to actually run the forecast for the future (not on the testing data), for example for the next 10 days after the last day of data. I hope you understand what I mean.
Here is my code:
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix, roc_auc_score
    from sklearn.model_selection import train_test_split

    # Split features and target (the 'prediction' column) into train and test sets
    X_train2, X_test2, y_train2, y_test2 = train_test_split(
        data2.drop('prediction', axis=1), data2.prediction, test_size=0.02)

    # Fit the classifier and score it on the held-out data
    model1 = RandomForestClassifier(random_state=13)
    model1.fit(X_train2, y_train2)
    predicted = model1.predict(X_test2)
    model1.score(X_test2, y_test2)

    # ROC AUC from the predicted probability of class 1
    probabilities = model1.predict_proba(X_test2)
    probabilities
    roc_auc_score(y_test2, probabilities[:, 1])

    confusion_matrix(y_test2, predicted)
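To make the question concrete, here is a minimal sketch of what a single step into the future might look like. It assumes the target is the next day's direction, so the most recent row has indicator values but no label yet. The data is synthetic and the column names (`rsi`, `roc`, `macd`) are hypothetical stand-ins for the real indicator table.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(13)

# Synthetic stand-in for the indicator table (hypothetical column names).
features = pd.DataFrame(rng.normal(size=(200, 3)), columns=["rsi", "roc", "macd"])
close = pd.Series(100 + rng.normal(size=200).cumsum())

# Assumed target: 1 if the next day's close is higher than today's.
target = (close.shift(-1) > close).astype(int)

# Every row except the last has a known outcome; the last row is "today".
X_known, y_known = features.iloc[:-1], target.iloc[:-1]
X_latest = features.iloc[[-1]]

model = RandomForestClassifier(random_state=13)
model.fit(X_known, y_known)

# One-step-ahead forecast: class and probability of "up" for the next day.
next_day_class = model.predict(X_latest)[0]
next_day_proba = model.predict_proba(X_latest)[0, 1]
```

Under this assumption the forecast only reaches one day past the last row, because the indicator values for later days do not exist yet. Reaching 10 days ahead would presumably need a target defined over that horizon (for example `close.shift(-10) > close`), which is a different model from the one trained above.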