#StackBounty: #time-series #hypothesis-testing #autocorrelation How to efficiently do rank-sum tests on autocorrelated time-series?

Bounty: 50

There is an observable $x$. It was first measured for time $T$ under condition $A$, then for time $T$ under condition $B$. Measurements were performed at small time intervals $\Delta t$. It is known that $x$ is autocorrelated, namely, future values depend on past values. It may be assumed that $x$ is effectively independent of its past beyond a certain known time interval $\tau$. It is known that $\tau \ll T$. The goal is to test whether the expected value of $x$ is the same under both conditions or not. The question is how best to perform such a test. For this particular question I am interested in non-parametric methods, to be able to deal with cases where the explicit model of how $x$ depends on its past is unknown.

I frequently see this problem solved by use of rank-sum test, however, the pre-processing varies:

  • Idea 1: Use all datapoints for testing. Obviously bad, because the test assumes i.i.d. samples.
  • Idea 2: Average over time within each condition. Coherent, but extremely wasteful.
  • Idea 3: Select timepoints at intervals of $\tau$ from each other. Coherent, but again very wasteful, as most of the datapoints are not used in the analysis.
  • Idea 4: Split the data into time bins of length $\tau$ and average over each bin. OK-ish, although consecutive bins are still correlated.
  • Idea 5: Same as 4, but also omit every second bin. This is probably the best I can come up with off the top of my head.
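A minimal sketch of Ideas 4–5, assuming the measurements are NumPy arrays and using SciPy's rank-sum (Mann–Whitney U) test; the bin length in samples, `tau_bins`, plays the role of $\tau/\Delta t$, and the AR(1) data generator is purely a hypothetical example:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bin_means(x, bin_len, skip_alternate=False):
    """Average x over consecutive bins of length bin_len (Idea 4);
    optionally keep only every second bin (Idea 5)."""
    n_bins = len(x) // bin_len
    means = x[:n_bins * bin_len].reshape(n_bins, bin_len).mean(axis=1)
    return means[::2] if skip_alternate else means

rng = np.random.default_rng(0)

def ar1(n, mu, rho=0.9):
    # toy autocorrelated observable with mean mu (hypothetical example)
    x = np.empty(n)
    x[0] = mu
    for t in range(1, n):
        x[t] = mu + rho * (x[t - 1] - mu) + rng.normal()
    return x

x_a, x_b = ar1(5000, mu=0.0), ar1(5000, mu=0.5)
tau_bins = 50  # ~ tau / Delta t, decorrelation length in samples
stat, p = mannwhitneyu(bin_means(x_a, tau_bins, skip_alternate=True),
                       bin_means(x_b, tau_bins, skip_alternate=True))
```

Omitting alternate bins (Idea 5) halves the sample size again but makes the remaining bin means roughly independent, which is what the rank-sum test requires.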


Get this bounty!!!

#StackBounty: #time-series #hypothesis-testing #autocorrelation #moving-average Breusch-Godfrey test on residuals from an MA(q) model

Bounty: 50

Consider testing for the presence of autocorrelation of lag order up to $h$ in the residuals from a regression model
$$
y_t=\mathbf{x}_t^\top \beta+u_t
$$

where $\mathbf{x}_t$ may or may not include lags of $y_t$. The Breusch-Godfrey test would employ an auxiliary regression
$$
\hat u_t=\mathbf{x}_t^\top \gamma+\varphi_1\hat u_{t-1}+\dots+\varphi_h\hat u_{t-h}+\varepsilon_t
$$

and derive its test statistic from there. Instead of the regression model, consider an MA(q) model.

How do I carry out the Breusch-Godfrey test on residuals from an MA(q) model? Concretely:

  1. How do I construct the auxiliary regression?
    Will it be $\hat u_t=\varphi_1\hat u_{t-1}+\dots+\varphi_s\hat u_{t-s}+\varepsilon_t$ where $s$ somehow depends on $h$ and $q$?
  2. How do I construct the test statistic?
  3. What is its asymptotic distribution under the null hypothesis of zero autocorrelation?
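For reference, the standard auxiliary-regression form of the test described above can be sketched as follows (a toy example on simulated regression data; for the plain regression case the $nR^2$ statistic is asymptotically $\chi^2_h$ under the null — whether and how the same construction carries over after MA(q) estimation is exactly what is being asked):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, h = 500, 4
x = np.column_stack([np.ones(n), rng.normal(size=n)])  # regressors incl. constant
y = x @ np.array([1.0, 2.0]) + rng.normal(size=n)      # H0: iid errors

# Step 1: original regression, keep residuals
beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)
u = y - x @ beta_hat

# Step 2: auxiliary regression of u_t on x_t and u_{t-1}, ..., u_{t-h}
# (pre-sample residuals padded with zeros, as is conventional)
lags = np.column_stack([np.r_[np.zeros(k), u[:-k]] for k in range(1, h + 1)])
z = np.column_stack([x, lags])
g, *_ = np.linalg.lstsq(z, u, rcond=None)
resid = u - z @ g
r2 = 1 - resid.var() / u.var()

# Step 3: LM statistic n * R^2, asymptotically chi2(h) under H0
lm = n * r2
p_value = chi2.sf(lm, df=h)
```

For the regression case this matches what canned routines (e.g. statsmodels' `acorr_breusch_godfrey`) compute; the MA(q) variant would need the auxiliary regressors adjusted for the MA estimation step.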


Get this bounty!!!

#StackBounty: #time-series #autocorrelation #autoregressive #seasonality Day-of-week effects on regression coefficients in autoregressive model?

Bounty: 50

I have a time series (sampled daily, weekdays only) whose volatility clearly depends on the day of the week. In particular, the standard deviation of the differenced series $\Delta y_t$ is smallest on Mondays and peaks on Thursdays.

I have considered GARCH-style models for the volatility, with the respective dummy variables for the day of the week. However, I am not interested in the volatility of the errors per se, but rather in how the mean equation is affected by the day of the week. For example, if I fit an AR(1) model to $\Delta y$, I observe that its residuals $\varepsilon_t$ on Wednesdays are correlated with $\Delta y_{t-1}$.

In addition, if I assume $\Delta y_t = \phi \Delta y_{t-1} + \varepsilon_t$ but estimate $\phi$ via OLS regression for each weekday separately, I get the following for each day of the week:

Monday: $\phi = 0.68$ (SE = 0.02)
Tuesday: $\phi = 0.76$ (SE = 0.04)
Wednesday: $\phi = 1.03$ (SE = 0.02)
Thursday: $\phi = 0.90$ (SE = 0.02)
Friday: $\phi = 0.80$ (SE = 0.018)

Correct me if I’m wrong, but to me these effects cannot be captured by a GARCH model for the errors. In light of the errors being correlated with $\Delta y_{t-1}$, I have considered a model that looks something like this:
$$
\Delta y_t = \phi \Delta y_{t-1} + \varepsilon_t \\
\varepsilon_t = \gamma 1_{\lbrace t \text{ is Wed} \rbrace} \Delta y_{t-1} + \epsilon_t
$$

which can be written as
$$
\Delta y_t = (\phi + \gamma 1_{\lbrace t \text{ is Wed} \rbrace}) \Delta y_{t-1} + \epsilon_t
$$

but it is unclear to me how to estimate a standard ARIMA-type model in this case.

Are there any known models that deal with this kind of effect? Or maybe I’m missing something and this is routinely modeled in a typical ARIMA + GARCH setup?
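The combined equation above is linear in $(\phi, \gamma)$, so one way to estimate it — a sketch on simulated data, not a full ARIMA treatment — is plain OLS with a Wednesday-interaction regressor:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
weekday = np.arange(n) % 5            # 0=Mon, 1=Tue, 2=Wed, 3=Thu, 4=Fri
is_wed = (weekday == 2).astype(float)

# simulate Delta y with phi = 0.8 plus an extra gamma = 0.2 on Wednesdays
phi_true, gamma_true = 0.8, 0.2
dy = np.zeros(n)
for t in range(1, n):
    dy[t] = (phi_true + gamma_true * is_wed[t]) * dy[t - 1] + rng.normal()

# OLS on [dy_{t-1}, 1{Wed} * dy_{t-1}] recovers both coefficients
X = np.column_stack([dy[:-1], is_wed[1:] * dy[:-1]])
y = dy[1:]
(phi_hat, gamma_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
```

The same interaction column could be supplied as an exogenous regressor to an ARIMA routine that accepts them, but the OLS form already shows the model is estimable.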


Get this bounty!!!


#StackBounty: #time-series #hypothesis-testing #statistical-significance #autocorrelation #f-test Significance Testing of Difference in…

Bounty: 100

I have two consecutive time series of different lengths that both vary around some common mean but exhibit different variances; see the example figures below. Both show quite substantial autocorrelation. There is a large gap between the end of the first and the beginning of the second, so we can ignore correlations between the two series.

How can I test if the variation in the second time series is significantly smaller than the variation in the first time series? Since the autocorrelation is quite large and, hence, samples are not independent, I cannot apply an F-test. Which statistical test can I use? Thanks!

edit:

To avoid confusion: I am interested in the overall volatility, not in the (Gaussian) noise that remains if I fit some function (like Prophet, some sinusoids, or an MLP) to both time series to remove the autocorrelation.
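One non-parametric option (a sketch, not an established named test) is a moving-block bootstrap of the variance, which preserves short-range autocorrelation within blocks; the AR(1) series here are purely hypothetical stand-ins for the two observed series:

```python
import numpy as np

def block_bootstrap_var(x, block_len, n_boot, rng):
    """Moving-block bootstrap replicates of the sample variance of x."""
    n = len(x)
    starts = rng.integers(0, n - block_len + 1, size=(n_boot, n // block_len))
    idx = starts[:, :, None] + np.arange(block_len)  # (n_boot, n_blocks, block_len)
    return x[idx].reshape(n_boot, -1).var(axis=1)

rng = np.random.default_rng(3)

def ar1(n, sigma, rho=0.8):
    # toy autocorrelated series with innovation scale sigma
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal(scale=sigma)
    return x

x1, x2 = ar1(2000, sigma=1.0), ar1(1500, sigma=0.5)
v1 = block_bootstrap_var(x1, block_len=50, n_boot=2000, rng=rng)
v2 = block_bootstrap_var(x2, block_len=50, n_boot=2000, rng=rng)
# one-sided p-value for "variance of series 2 is NOT smaller"
p = np.mean(v2 - v1 >= 0)
```

The block length should exceed the decorrelation time of the series, analogous to choosing the lag beyond which the autocorrelation dies off.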

First time series: [figure omitted]

Second time series: [figure omitted]


Get this bounty!!!

#StackBounty: #time-series #autocorrelation #covariance #stochastic-processes #brownian Time-series Auto-Covariance vs. Stochastic Process Auto-Covariance

Bounty: 50

My background is more on the stochastic-processes side, and I am new to time-series analysis. I would like to ask about estimating a time-series auto-covariance:

$$ \hat\lambda(u) := \frac{1}{T}\sum_{t}(Y_{t+u}-\bar{Y})(Y_{t}-\bar{Y}) $$

When I think of the covariance of standard Brownian motion $W(t)$ with itself, i.e. $\operatorname{Cov}(W_s,W_t)=\min(s,t)$, the way I interpret the covariance is as follows: since $\mathbb{E}[W_s|W_0]=\mathbb{E}[W_t|W_0]=0$, the covariance is a measure of how "often" one would "expect" a specific Brownian motion path at time $s$ to be on the same side of the x-axis as the same Brownian motion path at time $t$.

It’s perhaps easier to think of correlation rather than covariance, since $\operatorname{Corr}(W_s,W_t)=\frac{\min(s,t)}{\sqrt{s}\,\sqrt{t}}$: with the correlation, one can see that the closer $s$ and $t$ are together, the closer the correlation should get to 1, as indeed one would expect intuitively.

The main point here is that at each pair of times $s$ and $t$, the Brownian motion has a distribution over paths: so if I were to "estimate" the covariance from sampling, I’d want to simulate (or observe) many paths, then fix $t$ and $s=t-h$ ($h$ can be negative), and compute

$$ \hat\lambda(s,t) := \frac{1}{N}\sum_{i=1}^{N}(W_{i,t}-\bar{W}_t)(W_{i,t-h}-\bar{W}_{t-h}), $$

averaging over the $N$ Brownian paths $i$.

With the time-series approach, it seems to be the case that we "generate" just one path (or observe just one path) and then estimate the auto-covariance from that one path by shifting through time.

Hopefully I am making my point clear: my question is on the intuitive interpretation of the estimation methods.
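The ensemble estimator above can be checked numerically — a sketch that simulates many independent Brownian paths and compares the empirical covariance at fixed $(s,t)$ to the theoretical $\min(s,t)$:

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps, dt = 20000, 100, 0.01       # many paths on [0, 1]
increments = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(increments, axis=1)             # W[:, k] ~ W((k+1)*dt)

s_idx, t_idx = 49, 99                          # s = 0.5, t = 1.0
Ws, Wt = W[:, s_idx], W[:, t_idx]
# ensemble estimate: average across paths at fixed times
cov_hat = np.mean((Ws - Ws.mean()) * (Wt - Wt.mean()))
# theory: Cov(W_s, W_t) = min(s, t) = 0.5
```

The single-path time-series estimator, by contrast, would replace the average over paths with an average over time shifts of one realization — which is justified only for (ergodic) stationary processes, not for Brownian motion itself.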


Get this bounty!!!


#StackBounty: #self-study #autocorrelation Mean square of x-component of uniformly distributed circle points

Bounty: 50

Recently I was looking at a paper where the velocity auto-correlation function
$$C(t) = \langle v_x(t)\, v_x(0) \rangle = \langle \cos\theta(t)\, \cos \theta(0) \rangle$$
was being considered for a number of point particles with velocities distributed uniformly on $S^1$ at time zero (here $\langle \cdot \rangle$ denotes an average over initial conditions). In the above, $\theta(t)$ is the angle w.r.t. the horizontal at time $t$. In their plots of $C(t)$ vs. $t$ I noticed that $C(0) \ne 1/2$ (there were multiple plots). However, I don’t see this when trying to generate a uniform velocity distribution, so I assume I am doing something wrong here.

I generate velocities uniformly on $S^1$ as follows:

    N <- 10^4                                          # number of samples
    r <- runif(N, 0, 1)                                # uniform radii in [0,1]
    theta <- runif(N, -pi, pi)                         # uniform angles
    P <- cbind(sqrt(r)*cos(theta), sqrt(r)*sin(theta)) # uniform points in the unit disk
    L <- sqrt(P[,1]*P[,1] + P[,2]*P[,2])               # distance of each point from the origin
    V <- P/L                                           # project radially onto the unit circle

Another method I’ve seen is:

    X1 <- rnorm(N, 0, 1)      # i.i.d. standard normals
    X2 <- rnorm(N, 0, 1)
    R  <- sqrt(X1*X1 + X2*X2) # radius of each Gaussian point
    V  <- cbind(X1/R, X2/R)   # normalize onto the unit circle

Or using the pracma package:

    pracma::rands(N, r=1.0, N=1.0)

Firstly, can someone confirm that these are appropriate methods for generating points uniformly on the unit circle?

In all cases mean(V[,1] * V[,1]) returns $\approx 1/2$.

Moreover, if $\Theta \sim \mathcal{U}[-\pi, \pi]$ and $V = \cos^2 \Theta$, then $V$ has the pdf $$f_V(x) = \dfrac{2}{\pi} \dfrac{d}{dx} \left(\sin^{-1} \sqrt{x}\right) = \dfrac{1}{\pi\sqrt{x(1-x)}}$$ (the arcsine distribution), which has mean value
$$\int_0^1 x f_V(x)\, dx = 1/2.$$

Is this correct?
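A quick numerical check of both claims — a sketch using the Gaussian-normalization method, where the first coordinate of `V` plays the role of $v_x = \cos\theta$:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10**6

# uniform points on the unit circle via normalized Gaussians
X = rng.normal(size=(N, 2))
V = X / np.linalg.norm(X, axis=1, keepdims=True)
mean_vx2 = np.mean(V[:, 0] ** 2)         # should be close to 1/2

# same quantity via a uniform angle: E[cos^2 Theta], Theta ~ U[-pi, pi]
theta = rng.uniform(-np.pi, np.pi, size=N)
mean_cos2 = np.mean(np.cos(theta) ** 2)  # also close to 1/2
```

Both estimates agreeing with $1/2$ supports the analytic calculation above.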

Edit:

The issue with the final calculation is that uniform points on the circle are not formed by simply taking the cosine and sine of uniformly distributed angles, so it must be incorrect.


Get this bounty!!!

#StackBounty: #r #binomial #autocorrelation #glmm #spatio-temporal Spatio-temporal autocorrelation

Bounty: 100

I have a huge data frame (300k+ rows) of GPS animal positions.
I want to model the probability of presence of chamois, taking into consideration as variables: distance (from a disturbance), intensity (of the disturbance), and altitude.

        ID  idAnimal  date              lat   lon   alt  dist  intens  park
     1   1  animal_1  11/07/2018 12:00  45.7  6.71  2351   170     143  name2
     2   2  animal_3  11/07/2018 18:00  45.7  6.71  2371   131      71  name5
     3   3  animal_4  12/07/2018 00:00  45.7  6.70  2323    90     102  name5
     4   4  animal_1  12/07/2018 06:00  45.7  6.69  2379   119       6  name3
     5   5  animal_2  12/07/2018 12:00  45.7  6.69  2372   141     152  name5
     6   6  animal_1  12/07/2018 18:00  45.7  6.70  2364   121      25  name2
     7   7  animal_4  13/07/2018 00:00  45.7  6.70  2217   135      39  name1
     8   8  animal_2  13/07/2018 06:00  45.7  6.72  2605   137      96  name2
     9   9  animal_2  13/07/2018 12:00  45.7  6.72  2602    16     100  name1
    10  10  animal_1  13/07/2018 18:00  45.7  6.71  2424    48      72  name2

I want to create a model that takes into account the spatio-temporal autocorrelation of the data. I tried to build a binomial GLMM by adding fictitious absence points, but I have no idea if this is correct. I also do not know how to account for the autocorrelation of the data.
I was thinking of splitting the data into a list of data frames with the following condition:
"one observation per day per animal ID".
Then I would run the model on each of the created subsets.
However, I’m not sure how to get a single output from many models and (most of all) whether this process removes the problem of autocorrelation.
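The "one observation per day per animal ID" subsampling step described above can be sketched in pandas (a hypothetical mini data frame with the same column names as in the table; `date` is parsed so the calendar day can be extracted):

```python
import pandas as pd

# hypothetical mini data frame mirroring the question's columns
df = pd.DataFrame({
    "idAnimal": ["animal_1", "animal_1", "animal_2", "animal_2"],
    "date": ["11/07/2018 12:00", "11/07/2018 18:00",
             "12/07/2018 12:00", "12/07/2018 18:00"],
    "alt": [2351, 2364, 2372, 2605],
})
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y %H:%M")
df["day"] = df["date"].dt.date

# draw one random observation per animal per day
sub = df.groupby(["idAnimal", "day"]).sample(n=1, random_state=0)
```

This thins the temporal autocorrelation but does not address spatial correlation; repeating the draw many times and pooling the fitted models (as the question proposes) is one way to avoid discarding the rest of the data.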


Get this bounty!!!