#StackBounty: #regression #time-series #forecasting #residuals Predicting the Residuals of a Forecast Model

Bounty: 50

I have just read a paper [1] in which the authors forecast the risk of a variable (earnings, in this case) by forecasting quantiles of that variable and deriving dispersion measures from them, i.e. forecasted risk equals forecasted dispersion. They state that, alternatively:

"[…] one could capture conditional variance (dispersion) in future earnings by regressing the squared (or absolute) value of the residuals from an earnings forecasting model on predictor variables"

To me, this is a completely new approach that seems to make only limited sense. If a set of predictor variables turns out to significantly impact the above-mentioned residuals, why not just include them in the original model? How does one "capture conditional variance", i.e. model the shape of the future distribution of a variable, via this approach?

Maybe this is common practice that I have not yet come across. I would be grateful for any comments on this.
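The two-stage idea in the quote can be illustrated with simulated data (a hypothetical data-generating process, not the paper's model or data). Here a predictor shifts the mean of the target and also scales its spread. The first-stage regression captures the mean effect; regressing the absolute residuals on the same predictor captures the dispersion effect, which the mean model by itself does not describe.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
x = rng.uniform(0.0, 2.0, size=n)
# Hypothetical DGP: the mean rises with x, and so does the noise scale
y = 1.0 + 0.5 * x + (0.2 + x) * rng.normal(size=n)

# Stage 1: mean (point-forecast) model, plain OLS
slope_mean, intercept_mean = np.polyfit(x, y, deg=1)
resid = y - (intercept_mean + slope_mean * x)

# Stage 2: regress the absolute residuals on the same predictor
slope_disp, intercept_disp = np.polyfit(x, np.abs(resid), deg=1)

# Forecast dispersion for new predictor values
x_new = np.array([0.1, 1.9])
disp_new = intercept_disp + slope_disp * x_new
```

Here `x` is already in the mean model, so "just including it" adds nothing to stage 1; its effect on the spread only shows up in stage 2. Under normal errors $E|\varepsilon| = \sigma\sqrt{2/\pi}$, so the fitted absolute residual can be rescaled to a conditional standard deviation.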

[1] Konstantinidi and Pope (2016) Forecasting Risk in Earnings, Contemporary Accounting Research Vol. 33 (2), pp. 487-525.


Get this bounty!!!

#StackBounty: #regression #forecasting #data-transformation #prediction #logarithm Modelling the logarithm of a response

Bounty: 50

My response variable is non-negative and I decided to model the logarithm of the response.

Some of the values are zero. For this reason I modelled $Z = \log(Y + 0.1)$. When I transform back, some of my predictions are negative.

That is, $Y = \exp(Z) - 0.1$ is negative for some predictions. Am I missing something here, or is it expected that some predictions may be negative when transforming back to the raw scale?

Perhaps I should be considering $Y' = Y + 0.1$. I can then model $Z = \log(Y')$. When transforming back to the raw scale, $Y'$ will be positive. Perhaps the only guarantee is that $Y'$ will be positive?

One other thing that I tried (which I know is not ideal) is to replace all zero values with a small number $\varepsilon$. This way the transformation was $Z = \log(Y)$, and hence the predictions on the raw scale satisfy $\exp(Z) \geq 0$.

Edit: Consider that the output being modelled is rain in ml/kg per hour. It is possible to observe 0 ml of rain in a given hour.

Consider the following:

y=c(3,1.9,1.2,0.5,0.3,0.2,0.1,0.05,0.03,0.01,0)
y = y+0.01
plot(y, ylab = "y", xlab = "Time", type = "o")
plot(log(y), ylab = expression(log(y)), xlab = "Time", type = "o")

$\log(y)$ is approximately linear and could be modelled using linear regression. The slope is steep, so it would not be unusual for the model to predict a value lower than $\log(0.01)$. What could be done in this case?
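With a shift $c$ (here $c = 0.01$, as in the R snippet), a back-transformed prediction $\exp(\hat{Z}) - c$ can never fall below $-c$, but it is negative whenever $\hat{Z} < \log(c)$, while $\hat{Y}' = \exp(\hat{Z})$ is always positive. A minimal Python sketch on the example data, assuming a plain OLS fit of $\log(y + c)$ on the time index:

```python
import numpy as np

# Example data from the question, shifted by c so that log() is defined at y = 0
c = 0.01
y = np.array([3, 1.9, 1.2, 0.5, 0.3, 0.2, 0.1, 0.05, 0.03, 0.01, 0.0])
t = np.arange(1, len(y) + 1)
z = np.log(y + c)

# OLS fit of log(y + c) on time
slope, intercept = np.polyfit(t, z, deg=1)

# Extrapolate three steps ahead and back-transform
t_new = np.arange(len(y) + 1, len(y) + 4)
z_hat = intercept + slope * t_new
y_shifted_hat = np.exp(z_hat)   # prediction of Y' = Y + c, always > 0
y_hat = y_shifted_hat - c       # prediction of Y, bounded below by -c
```

With this data the first extrapolated step gives $\hat{Z} \approx -5.06 < \log(0.01) \approx -4.61$, so $\hat{Y}$ is slightly negative while $\hat{Y}'$ is not.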



#StackBounty: #python #tensorflow #forecasting How do I validate this Kalman model for estimation of undocumented covid cases?

Bounty: 100

TensorFlow recently published a tutorial titled Estimation of undocumented SARS-CoV2 cases. It replicates the 6 March 2020 paper by Li et al. titled Substantial Undocumented Infection Facilitates the Rapid Dissemination of Novel Coronavirus (SARS-CoV2). It is a compartment-based SEIR model in which the population is represented by a state with the compartments Susceptible, Exposed, Undocumented Infectious and Documented Infectious.

I am trying to test the validity of this model by training it on data for 30 days and forecasting the state for the next 15 days. I did find the optimal parameters of the model and the current state, but I am not sure how to use them to predict future states. I have been attempting this for almost a week now. The programming style in the notebook is quite unfamiliar to me, so I am struggling to figure out exactly how to do it.
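One generic way to forecast with fitted parameters is to start from the estimated state on the last training day and step the model forward with no further data updates. The sketch below is a simplified, deterministic, single-location version of the Li et al. SEIR equations as I understand them, with documented ($I^r$) and undocumented ($I^u$) infectious compartments; it ignores the mobility terms, stochastic noise and reporting delay. The parameter names (`beta`, `mu`, `alpha`, `Z`, `D`, `N`) are hypothetical stand-ins for the fitted values, not the notebook's variables or API.

```python
import numpy as np

def seir_step(state, beta, mu, alpha, Z, D, N):
    """One daily Euler step of a simplified SEIR model with documented (Ir)
    and undocumented (Iu) infectious compartments."""
    S, E, Ir, Iu = state
    # Undocumented cases transmit at a relative rate mu
    new_exposed = beta * S * Ir / N + mu * beta * S * Iu / N
    new_infectious = E / Z                      # mean latency Z days
    new_documented = alpha * new_infectious     # fraction alpha is documented
    S_next = S - new_exposed
    E_next = E + new_exposed - new_infectious
    Ir_next = Ir + new_documented - Ir / D      # mean infectious period D days
    Iu_next = Iu + (new_infectious - new_documented) - Iu / D
    return np.array([S_next, E_next, Ir_next, Iu_next]), new_documented

def forecast(state0, params, days):
    """Roll the model forward from the last estimated state."""
    state = np.asarray(state0, dtype=float)
    states, documented = [], []
    for _ in range(days):
        state, new_doc = seir_step(state, **params)
        states.append(state)
        documented.append(new_doc)
    return np.stack(states), np.array(documented)
```

To validate, one could call `forecast(final_state, fitted_params, 15)` and compare the 15 values of `documented` with the held-out daily documented counts, for example via the mean absolute error.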

I would appreciate it if someone with more experience could go through the notebook and suggest how to predict the future states.

Thank you!



#StackBounty: #regression #mixed-model #forecasting #references #random-effects-model Predictions and forecasting with mixed-effects mo…

Bounty: 50

I am not sure I fully understand how mixed-effects models (such as mixed-effects PK/PD models) can be used for forecasting.

Some notation

Let $p \in \mathbb{N}$ with $p \geq 2$. We assume that for each individual $i \in \lbrace 1,\ldots,p \rbrace$, we have $k_i \in \mathbb{N}^{\ast}$ scalar observations $(y_{i,j})_{1 \leq j \leq k_i}$ obtained at times $(t_{i,j})_{1 \leq j \leq k_i}$. Therefore, for each individual, the observations are $\left( y_{i,j}, t_{i,j} \right)_{1 \leq j \leq k_i}$. We also assume the following model:

$$ y_{i,j} = f\left( t_{i,j}, b_i, \theta \right) + \varepsilon_{i,j} $$

where $\theta$ is a vector of parameters that contains the fixed effects and variance-covariance parameters; $b_i$ is a vector of individual random effects; $f$ is sometimes called the structural model; $\varepsilon_{i,j}$ is the observation noise. We assume that:

$$ b_i \sim \mathcal{N}\left( 0, \mathbf{D} \right), \quad \text{and} \quad \varepsilon_i = \begin{bmatrix} \varepsilon_{i,1} \\ \vdots \\ \varepsilon_{i, k_i} \end{bmatrix} \sim \mathcal{N}\left( 0, \mathbf{\Sigma} \right). $$

The individual random effects $b_i$ are assumed i.i.d. and independent of $\varepsilon_i$.

The question

Given $\left( y_{i,j}, t_{i,j} \right)_{\substack{1 \leq i \leq p \\ 1 \leq j \leq k_i}}$, one can obtain an estimate $\hat{\theta}$ of the model parameters $\theta$ (which contain the unique coefficients in $\mathbf{D}$ and $\mathbf{\Sigma}$) by maximizing the model likelihood. This can be done, for instance, using stochastic versions of the EM algorithm (see link above).

Assume that $hat{theta}$ is available.

If we are given some observations $y_{s}^{\mathrm{new}}$ for a new individual $s \notin \lbrace 1, \ldots, p \rbrace$, its individual random effects are estimated by:

$$ \widehat{b_s} = \mathop{\mathrm{argmax}} \limits_{b_s} p\left( b_s \mid y_{s}^{\mathrm{new}}, \hat{\theta} \right) $$

where $p\left( \cdot \mid y_{s}^{\mathrm{new}}, \hat{\theta} \right)$ is the posterior distribution of the random effects given the new observations $y_{s}^{\mathrm{new}}$ and the point estimate of the model parameters $\hat{\theta}$. By Bayes' theorem, this is equivalent to maximizing the product "likelihood $\times$ prior":

$$ \widehat{b_s} = \mathop{\mathrm{argmax}} \limits_{b_s} p\left( y_{s}^{\mathrm{new}} \mid b_{s}, \hat{\theta} \right) p\left( b_{s} \mid \hat{\theta} \right). $$

Now, if $t \, \longmapsto \, f(t, \cdot, \cdot)$ is a continuous function of time, we may call it a growth curve. It describes the evolution of the measurements over time. Let $i_{0} \in \lbrace 1, \ldots, p \rbrace$ and let $t$ be such that $t_{i_{0},1} < \ldots < t_{i_{0},k_{i_{0}}} < t$.

How can we use this mixed-effects model to predict the most likely value $y_{i_{0}}^{\ast}$ for individual $i_{0}$ at time $t$? This relates to forecasting, since we want to predict the measurement value at a future time.

Naively, I would proceed as follows. Given $\left( y_{i,j}, t_{i,j} \right)_{\substack{1 \leq i \leq p \\ 1 \leq j \leq k_i}}$, I would estimate $\hat{\theta}$ (using all the data, including the past observations for individual $i_{0}$). Then I would estimate $\widehat{b_{i_{0}}}$ as described above. Finally, I would set:

$$ y_{i_{0}}^{\ast} = f\left( t, \widehat{b_{i_{0}}}, \hat{\theta} \right). $$

If this is right, I don't see how I would prove it mathematically. Still, I feel like I'm missing something, because this predicted value $y_{i_{0}}^{\ast}$ does not take the noise distribution into account. I also do not see how I would estimate confidence intervals for $y_{i_{0}}^{\ast}$ this way.
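For intuition about the noise and interval questions, consider the special case where everything is linear and Gaussian. This sketch assumes (hypothetically) a linear growth curve $f(t, b_i, \theta) = (\beta_0 + b_{i,0}) + (\beta_1 + b_{i,1})\,t$ and $\mathbf{\Sigma} = \sigma^2 I$. The posterior of the random effects is then Gaussian, its mode $\widehat{b}$ has a closed form, and a plug-in predictive variance at a future $t$ adds the posterior variance contribution of $b$ to $\sigma^2$. It conditions on $\hat{\theta}$, so uncertainty in $\hat{\theta}$ itself is ignored.

```python
import numpy as np

def map_random_effects(t_obs, y_obs, beta, D, sigma2):
    """Posterior mode and covariance of b for a linear growth curve
    y = Z (beta + b) + eps, with b ~ N(0, D) and eps ~ N(0, sigma2 * I)."""
    t_obs = np.asarray(t_obs, dtype=float)
    Z = np.column_stack([np.ones_like(t_obs), t_obs])
    resid = np.asarray(y_obs, dtype=float) - Z @ beta
    # Gaussian posterior: precision = likelihood precision + prior precision
    precision = Z.T @ Z / sigma2 + np.linalg.inv(D)
    cov_b = np.linalg.inv(precision)
    b_hat = cov_b @ (Z.T @ resid) / sigma2   # mode = mean in this linear case
    return b_hat, cov_b

def predictive(t_new, beta, b_hat, cov_b, sigma2):
    """Plug-in predictive mean and variance of y at a future time t_new."""
    z = np.array([1.0, t_new])
    mean = z @ (beta + b_hat)
    var = z @ cov_b @ z + sigma2             # random-effect uncertainty + noise
    return mean, var
```

An approximate 95% interval is then `mean ± 1.96 * np.sqrt(var)`. For a nonlinear $f$ there is no such closed form, and the posterior of $b$ would typically be approximated (for example, by sampling).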

In a Bayesian setting (with a prior distribution on $\theta$), would I need to use the posterior predictive distribution (see this post and these notes)? From what I understand, if $y_{i_{0}}$ denotes the vector of past observations for individual $i_{0}$, this posterior predictive distribution is given by:

$$ p\left( y_{i_{0}}^{\ast} \mid y_{i_{0}} \right) = \int_{\Theta} p\left( y_{i_{0}}^{\ast} \mid \theta, y_{i_{0}} \right) p\left( \theta \mid y_{i_{0}} \right) \, d\theta. $$

However, I’m not sure it applies here and I’m not sure where the random effects come in.
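One way to see where the random effects come in, as a sketch: treat $\hat{\theta}$ as fixed (a plug-in rather than fully Bayesian view) and assume the future noise $\varepsilon^{\ast}$ at time $t$ is independent of the past noise vector $\varepsilon_{i_{0}}$ (true if $\mathbf{\Sigma}$ is diagonal, which is an extra assumption). Then $y_{i_{0}}^{\ast} = f\left( t, b_{i_{0}}, \hat{\theta} \right) + \varepsilon^{\ast}$, and the random effects are integrated out against their posterior:

$$ p\left( y_{i_{0}}^{\ast} \mid y_{i_{0}}, \hat{\theta} \right) = \int p\left( y_{i_{0}}^{\ast} \mid b_{i_{0}}, \hat{\theta} \right) p\left( b_{i_{0}} \mid y_{i_{0}}, \hat{\theta} \right) \, db_{i_{0}}. $$

In this view, $f\left( t, \widehat{b_{i_{0}}}, \hat{\theta} \right)$ plugs in the posterior mode of $b_{i_{0}}$; for nonlinear $f$ it is generally not the mean of this predictive distribution, and intervals would come from the spread of $p\left( y_{i_{0}}^{\ast} \mid y_{i_{0}}, \hat{\theta} \right)$, which includes the noise. A fully Bayesian version would additionally integrate over $\theta$, as in the posterior predictive equation above.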

Any reference, explanation or hint is welcome! 🙂



#StackBounty: #python #forecasting #model-selection #ensemble #gradient How to calculate gradient for custom objective function in xgbo…

Bounty: 100

I'm trying to build an implementation of the Feature-based Forecast Model Averaging approach in Python (https://robjhyndman.com/papers/fforma.pdf). However, I'm stuck on computing the gradient and Hessian for my custom objective function.

The idea in the paper is as follows: there is an array contribution_to_error that contains, for each time series and each model you use, the average prediction error of that model (the Mean Absolute Percentage Error). That is, element $x_{i,j}$ contains the average error of model $j$ for time series $i$. An example of the input is shown below; contribution_to_error contains the values from the dataframe.

[Image: example of the contribution_to_error input]

A softmax transform then maps the booster's raw outputs to model weights, and the loss is the sum of the weights times the original errors (i.e., the weighted average of the errors).

import numpy as np
from typing import Tuple
from scipy.special import softmax

def fforma_objective(self, predt: np.ndarray, dtrain) -> Tuple[np.ndarray, np.ndarray]:
    '''
    Compute...
    '''
    #labels of the elements in the training set
    y = dtrain.get_label().astype(int)        
    n_train = len(y)
    self.y_obj = y

    preds = np.reshape(predt,
                      self.contribution_to_error[y, :].shape,
                      order='F')

    preds_transformed = softmax(preds, axis=1)

    weighted_avg_loss_func = (preds_transformed*self.contribution_to_error[y, :]).sum(axis=1).reshape((n_train, 1))

    grad = preds_transformed*(self.contribution_to_error[y, :] - weighted_avg_loss_func)
    hess = (self.contribution_to_error[y,:])*preds_transformed*(1.0-preds_transformed) - grad*preds_transformed

    return grad.flatten('F'), hess.flatten('F')
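To check a hand-derived gradient and Hessian like the ones above, it can help to compare them with finite differences. The sketch below does this for the averaged loss $L = \sum_j w_j e_j$ with $w = \mathrm{softmax}(z)$. Under that loss the exact diagonal second derivative is $w_k(1-w_k)(e_k - L) - w_k g_k$; the expression in the posted code, $e_k w_k(1-w_k) - g_k w_k$, differs from it by $w_k(1-w_k)L$, which is nonnegative when the errors are. Whether that difference is an intentional approximation (for example, to keep the Hessian positive) is not stated in the source. The function names are illustrative.

```python
import numpy as np
from scipy.special import softmax

def averaged_loss(z, errors):
    """Per-series loss: softmax weights of the raw scores z times the errors."""
    return (softmax(z, axis=1) * errors).sum(axis=1)

def averaged_grad_hess(z, errors):
    """Gradient, exact diagonal Hessian, and the Hessian expression from the
    posted objective, all with shape (n_series, n_models)."""
    w = softmax(z, axis=1)
    loss = averaged_loss(z, errors)[:, None]
    grad = w * (errors - loss)                             # same as the posted grad
    hess_exact = w * (1 - w) * (errors - loss) - w * grad  # d^2 L / dz_k^2
    hess_posted = errors * w * (1 - w) - grad * w          # posted expression
    return grad, hess_exact, hess_posted
```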

My question is the following. This paper looks at the average prediction errors of the models (over, for instance, a 12-period-ahead horizon). I would like to change it to look at the forecast errors of the individual periods and optimize over those. This way, you could benefit from the information that some models underestimate a specific forecast while others overestimate it, so their errors could partly cancel out. The input would then be (ds runs from 1 to 12, the individual periods):

[Image: example of the per-period error input]

Now, how do I need to change the gradient and Hessian if I use these individual errors, given that some errors are negative rather than absolute values?

My idea is the following:

import numpy as np
from typing import Tuple
from scipy.special import softmax

# Objective function for lgb
def fforma_objective(self, predt: np.ndarray, dtrain) -> Tuple[np.ndarray, np.ndarray]:
    '''
    Compute...
    '''
    #labels of the elements in the training set
    y = dtrain.get_label().astype(int)        
    n_train = len(y)
    self.y_obj = y

    preds = np.reshape(predt,
                      self.contribution_to_error[y, :].shape,
                      order='F')

    preds_transformed = softmax(preds, axis=1)

    #Changed to use all individual errors.
    preds_transformed_new = np.repeat(preds_transformed, 12, axis = 0)

    #The np.abs here makes sure that after weighting for the individual periods, you do look at absolute errors. Otherwise grouping might show incorrect mean errors.
    weighted_avg_loss_func = np.abs((preds_transformed_new*self.errors_full.loc[y].reindex(y, level = 0)).sum(axis = 1)).groupby('unique_id').mean().values.reshape((n_train, 1)) 

    weighted_avg_loss_func_ungrouped = np.abs((preds_transformed_new*self.errors_full.loc[y].reindex(y, level = 0)).sum(axis = 1))

    grad = preds_transformed*((np.abs(self.errors_full.loc[y].reindex(y, level = 0)) - np.array([weighted_avg_loss_func_ungrouped.values]).T).groupby('unique_id')[self.errors_full.columns].mean().values)

    hess = (np.abs(self.errors_full.loc[y].reindex(y, level=0))*preds_transformed_new*(1-preds_transformed_new)).groupby('unique_id')[self.errors_full.columns].mean().values - grad*preds_transformed

    return grad.flatten('F'), hess.flatten('F')
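One way to formalize the per-period idea, sketched under assumptions that are not in the question: the errors are held in a dense array of shape `(n_series, n_models, horizon)` (instead of the pandas structure above), and the loss for series $i$ is $\frac{1}{H}\sum_h |s_{ih}|$ with $s_{ih} = \sum_j w_{ij} e_{ijh}$, so signed errors can cancel before the absolute value is taken. Since $\partial s_{ih}/\partial z_{ik} = w_{ik}(e_{ikh} - s_{ih})$, the gradient is the horizon average of $\mathrm{sign}(s_{ih})\, w_{ik}(e_{ikh} - s_{ih})$ and, away from $s_{ih} = 0$, the diagonal second derivative is the average of $\mathrm{sign}(s_{ih})\, w_{ik}(1 - 2w_{ik})(e_{ikh} - s_{ih})$.

```python
import numpy as np
from scipy.special import softmax

def per_period_loss(z, errors):
    """z: (n_series, n_models) raw booster scores.
    errors: (n_series, n_models, horizon) signed forecast errors.
    Returns, per series, the mean over the horizon of |sum_j w_j * e_jh|."""
    w = softmax(z, axis=1)
    combined = np.einsum('nm,nmh->nh', w, errors)  # weighted error per period
    return np.abs(combined).mean(axis=1)

def per_period_grad_hess(z, errors):
    """Gradient and diagonal Hessian of per_period_loss w.r.t. z."""
    w = softmax(z, axis=1)[:, :, None]                  # (n, m, 1)
    combined = (w * errors).sum(axis=1, keepdims=True)  # s_h, shape (n, 1, H)
    sign = np.sign(combined)                            # derivative of |s_h|
    diff = errors - combined                            # e_kh - s_h
    grad = (sign * w * diff).mean(axis=2)
    hess = (sign * w * (1 - 2 * w) * diff).mean(axis=2)
    return grad, hess
```

The returned arrays can be flattened with `.flatten('F')` as in the posted objective. The exact second derivative here can be zero or negative, so it may need to be floored at a small positive value before being passed to the booster; that is a design choice, not something the question specifies.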

Any feedback would be much appreciated.



#StackBounty: #time-series #forecasting #references #intermittent-time-series Books or articles to study different forecasting techniqu…

Bounty: 100

I am doing a project to forecast demand for an automotive firm that makes spare parts. Using the average demand interval (ADI) and the squared coefficient of variation (CV2), I have categorized product SKUs as smooth, erratic, lumpy or intermittent. ARIMA and exponential smoothing techniques are available for smooth demand. However, I am unable to find enough literature on lumpy and intermittent demand. Three methods for intermittent demand are mentioned in several blogs:

1) Croston’s method
2) Adjusted Croston’s method
3) Bootstrap method
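For reference, Croston's method smooths the nonzero demand sizes and the intervals between them separately and forecasts their ratio, a flat per-period demand rate. A minimal sketch follows; the initialization (first size, and first interval counted from the start of the series) is an assumption, as implementations differ. The `sba` flag applies the Syntetos-Boylan correction factor $(1 - \alpha/2)$, which is one common "adjusted Croston" variant; whether it is the one the blogs mean is not clear from the question.

```python
import numpy as np

def croston(demand, alpha=0.1, sba=False):
    """Croston forecast of the per-period demand rate."""
    d = np.asarray(demand, dtype=float)
    idx = np.flatnonzero(d > 0)
    if idx.size == 0:
        return 0.0
    sizes = d[idx]
    # Intervals between demands; the first is counted from the series start
    intervals = np.diff(np.concatenate(([-1], idx))).astype(float)
    z, p = sizes[0], intervals[0]
    for size, interval in zip(sizes[1:], intervals[1:]):
        z += alpha * (size - z)        # smoothed demand size
        p += alpha * (interval - p)    # smoothed inter-demand interval
    forecast = z / p
    if sba:
        forecast *= 1 - alpha / 2      # Syntetos-Boylan bias correction
    return forecast
```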

I am reading about these 3 methods. However, I haven’t found any literature on lumpy demand forecasting.
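The ADI/CV2 categorization from the start of the question, which is what defines "lumpy" here, can be sketched as follows. This assumes the commonly cited cut-offs ADI = 1.32 and CV2 = 0.49, ADI computed as the number of periods per nonzero-demand period, and CV2 computed from the nonzero demand sizes with the population standard deviation; implementations vary on these details.

```python
import numpy as np

def classify_demand(demand, adi_cut=1.32, cv2_cut=0.49):
    """Return (ADI, CV2, category) for one SKU's demand history."""
    d = np.asarray(demand, dtype=float)
    nonzero = d[d > 0]
    if nonzero.size == 0:
        raise ValueError("series has no nonzero demand")
    adi = len(d) / nonzero.size                      # average demand interval
    cv2 = (nonzero.std() / nonzero.mean()) ** 2      # ddof=0, an assumption
    if adi < adi_cut:
        label = "smooth" if cv2 < cv2_cut else "erratic"
    else:
        label = "intermittent" if cv2 < cv2_cut else "lumpy"
    return adi, cv2, label
```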

Can someone please suggest good books or articles for understanding the different forecasting techniques for lumpy and intermittent demand (possibly including state space models and neural networks)? If a book or article also explains the three methods above, that would be an added bonus. Thanks.



#StackBounty: #forecasting #econometrics #latent-variable #state-space-models Forecast market development through state space models

Bounty: 50

Suppose I would like to estimate the following model:

\begin{align}
y_t &= \Lambda f_t + B x_t + u_t \\
f_t &= A_1 f_{t-1} + \cdots + A_p f_{t-p} + \eta_t & \eta_t &\sim N(0, I) \\
u_t &= C_1 u_{t-1} + \cdots + C_q u_{t-q} + \epsilon_t & \epsilon_t &\sim N(0, \Sigma)
\end{align}

and suppose I want to forecast housing prices. I have a number of exogenous variables but also want to include a latent variable that serves as a proxy for the ‘market development’. So in the model above, $f_t$ would be the market development.

My question is: can this model be estimated for a univariate time series $y_t$, and is this a correct way to approach the problem? I ask because, for instance, https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_dfm_coincident.html states that

“Factor models generally try to find a small number of unobserved ‘factors’ that influence a substantial portion of the variation in a larger number of observed variables”
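To make the single-series case concrete, here is a hedged simulation sketch of the model above with $p = q = 1$, a scalar $y_t$, one factor and one exogenous regressor; the function name and all parameter values are hypothetical. With only one observed series, $\Lambda f_t$ and $u_t$ are both latent autoregressive terms added together in $y_t$, which is the situation the quoted statsmodels text contrasts with the many-series case.

```python
import numpy as np

def simulate_univariate_factor_model(T, lam, b, a1, c1, sigma, x, rng):
    """Simulate y_t = lam * f_t + b * x_t + u_t with an AR(1) factor f_t
    (unit shock variance) and an AR(1) error u_t (shock std sigma)."""
    f = np.zeros(T)
    u = np.zeros(T)
    for t in range(1, T):
        f[t] = a1 * f[t - 1] + rng.normal()          # eta_t ~ N(0, 1)
        u[t] = c1 * u[t - 1] + sigma * rng.normal()  # epsilon_t ~ N(0, sigma^2)
    y = lam * f + b * np.asarray(x, dtype=float) + u
    return y, f, u
```

Only `y` (and `x`) would be observed in practice; `f` and `u` are returned here just to show how they combine.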



#StackBounty: #r #forecasting #xgboost XGBoost one-step ahead forecast

Bounty: 50

I have trained and cross-validated an xgboost classification algorithm in R using the following code:

xgb_params <- list("objective" = "binary:logistic", 
                   "eval_metric" = "error",
                   min_child_weight=1, 
                   subsample=1, 
                   colsample_bytree= 0.6, 
                   eta = 0.05, 
                   gamma = 1, 
                   max_depth = 5
)
watchlist <- list(train = train_matrix, test = test_matrix) 
xgb_mod <- xgb.train(params = xgb_params, 
                     data = train_matrix,
                     nrounds = 800,
                     watchlist = watchlist, 
                     seed = 333)

xgb_mod

Now I want to do one-step-ahead forecasting.

However, using the following:

xgbpred_prob <- predict(xgb_mod, newdata = test_matrix)

it requires new data stored in a matrix. Instead, I would like to forecast the way the following code does for an ARIMA model:

fit <- arima(df, order = c(0,1,1)) 
predict(fit, n.ahead = 6)

In other words, the first part of the job (what I have done so far) was to validate the booster; now I want to put the model into production and use it every day for daily forecasting.

Do you have any idea how I could achieve that?
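Unlike ARIMA, a tree booster has no internal notion of time, so `predict()` always needs a feature row for the day being forecast. One common way to obtain that row is to build the features from lagged values of a series. A minimal Python sketch, assuming (hypothetically) that the predictors are simply the last `n_lags` observations of one series; the helper names are illustrative and not part of xgboost:

```python
import numpy as np

def make_lag_matrix(series, n_lags):
    """Training design: row t holds [x_{t-1}, ..., x_{t-n_lags}], target x_t."""
    x = np.asarray(series, dtype=float)
    X = np.column_stack([x[n_lags - k : len(x) - k] for k in range(1, n_lags + 1)])
    y = x[n_lags:]
    return X, y

def next_step_features(series, n_lags):
    """One-row feature matrix for forecasting the next, not yet observed, step."""
    x = np.asarray(series, dtype=float)
    return x[::-1][:n_lags].reshape(1, -1)   # [x_T, x_{T-1}, ..., x_{T-n_lags+1}]
```

The model would be trained on the rows from `make_lag_matrix`, and each day the single row from `next_step_features` is passed to `predict` (in R, as a one-row matrix or `xgb.DMatrix`). For the binary target in the question, the training labels would be the class labels rather than the series values used here.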



#StackBounty: #forecasting #data-imputation #data-cleaning #mase Do you clean the data before calculating MASE (Mean Absolute Scaled Er…

Bounty: 50

The denominator in the MASE calculation for seasonal data is the MAE of the seasonal naive forecast calculated in-sample.

Is it common to do imputation before calculating the seasonal naive MAE or do you calculate the seasonal naive MAE with no data cleaning? Would I apply the same imputation I would normally apply if I were trying to use the seasonal naive as an actual forecast? This would include things like imputing missing values. Or would I leave missing values as NA?
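Leaving missing values as NA has a concrete meaning for the scaling term: seasonal differences $|y_t - y_{t-m}|$ that involve a missing value simply drop out of the average. A minimal Python sketch of that option, with the season length `m` (at least 1) as a parameter; it shows one convention, not a recommendation over imputation:

```python
import numpy as np

def seasonal_naive_scale(y_train, m):
    """In-sample MAE of the seasonal naive forecast, skipping pairs with NaN."""
    y = np.asarray(y_train, dtype=float)
    diffs = np.abs(y[m:] - y[:-m])   # |y_t - y_{t-m}|; NaN if either is missing
    return np.nanmean(diffs)

def mase(y_true, y_pred, y_train, m):
    """Mean absolute scaled error with the seasonal naive in-sample scale."""
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.nanmean(errors) / seasonal_naive_scale(y_train, m)
```

Imputing first would instead change which differences enter the denominator, and therefore the scale.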

