I have temperature time series panel data and want to fit a piecewise regression or a cubic spline regression to it. I first read up on the concept of piecewise regression and its basic implementation in R on SO, which gave me an initial idea of how to proceed. In my first attempt I fitted a spline regression with the splines::ns function from the splines package, but I didn't get the right plot. A baseline regression, a piecewise regression, or a spline regression would all work for me.
From the SO post on piecewise regression, I understand that I need to find appropriate breakpoints (knots) before running the regression, so that the fitted trend line follows the data. I adapted that solution, but it is not efficient and I still can't get the plot I want.
Here is the general structure of my panel data. The dependent variables, listed first in the data below, are in natural logs. The independent variables are average temperature, total precipitation, and 11 temperature bins, each with a bin width (window) of 3 degrees Celsius (<-6, -6~-3, -3~0, …, >21).
Here is reproducible data simulated to match the structure of the actual temperature time series panel data:
    set.seed(1)  # added so the simulated values are the same on every run
    dat <- data.frame(
      index  = rep(c('dex111', 'dex112', 'dex113', 'dex114', 'dex115'), each = 30),
      year   = rep(1980:2009, times = 5),
      region = rep(c('Berlin', 'Stuttgart', 'Böblingen', 'Wartburgkreis', 'Eisenach'), each = 30),
      # dependent variables (natural logs)
      ln_gdp_percapita     = rep(sample.int(40, 30), 5),
      ln_gva_agr_perworker = rep(sample.int(45, 30), 5),
      # weather covariates
      temperature   = rep(sample.int(50, 30), 5),
      precipitation = rep(sample.int(60, 30), 5),
      # number of days per temperature bin
      bin_1  = rep(sample.int(32, 30), 5),
      bin_2  = rep(sample.int(34, 30), 5),
      bin_3  = rep(sample.int(36, 30), 5),
      bin_4  = rep(sample.int(38, 30), 5),
      bin_5  = rep(sample.int(40, 30), 5),
      bin_6  = rep(sample.int(42, 30), 5),
      bin_7  = rep(sample.int(44, 30), 5),
      bin_8  = rep(sample.int(46, 30), 5),
      bin_9  = rep(sample.int(48, 30), 5),
      bin_10 = rep(sample.int(50, 30), 5),
      bin_11 = rep(sample.int(52, 30), 5)
    )
Note that all bins cover equally wide temperature intervals except the two extreme bins, which are open-ended. Each bin gives the number of days that fall in the respective temperature interval.
Basically, I want to fit a spline regression to my data, with one dependent variable (say ln_gdp_percapita) and multiple independent variables (bin_1, bin_2, …, bin_11).
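To make the bin definition concrete, here is a minimal sketch of how daily temperatures turn into the 11 day counts. The daily temperatures are hypothetical (drawn from a normal distribution), and treating each interval as closed on the left is my assumption; the real data may use a different convention at the bin edges.

```r
# Hypothetical daily mean temperatures (deg C) for one region-year; not real data.
set.seed(1)
daily_temp <- rnorm(365, mean = 9, sd = 8)

# 11 bins: below -6, nine 3-degree bins from -6 to 21, and 21 or above.
breaks <- c(-Inf, seq(-6, 21, by = 3), Inf)
bin_id <- cut(daily_temp, breaks = breaks,
              labels = paste0("bin_", 1:11), right = FALSE)

# Number of days per bin; empty bins are kept with a count of zero.
bin_days <- table(bin_id)
print(bin_days)
```

Because every day lands in exactly one bin, the 11 counts always sum to the number of days in the year, which is why one bin has to be left out as the reference category in a regression with an intercept.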
My attempt to run spline regression:
Here is what I have done so far to fit a spline regression to the temperature time series data:
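    # smoothing spline of the outcome on the number of days in bin_6
    fit_sp <- smooth.spline(dat$bin_6, dat$ln_gdp_percapita, nknots = 4)
    # predict() on a smooth.spline fit returns a list with components x and y
    pred <- predict(fit_sp, dat$bin_6)$y
    summary(pred)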
But this is not what I want, so I also tried the following:
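    library(dplyr)    # mutate()
    library(ggplot2)

    # natural spline in bin_6 with one interior knot at 3
    # (df is ignored by splines::ns when knots is supplied, so it is dropped here)
    sp2 <- lm(ln_gva_agr_perworker ~ splines::ns(bin_6, knots = 3), data = dat)
    df_ <- mutate(dat, smooth = fitted(sp2))
    predict(sp2)  # same values as fitted(sp2)
    ggplot(df_, aes(bin_6, ln_gva_agr_perworker)) +
      geom_line(aes(y = smooth))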
But it didn't render the plot I want. My understanding of piecewise regression is that you locate the breakpoint and then fit separate regressions to the observations to the left and to the right of it, which gives a piecewise-linear trend. I didn't get such a result, so perhaps I should focus on finding a more efficient workflow.
In general, I want to see how the agriculture and industry sectors respond to daily temperature. Running a simple regression on the temperature bin panel data is not sufficient to reach a conclusion; I believe a restricted spline regression, a smoothing spline, or a baseline regression might produce better estimates.
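To illustrate what I mean by a piecewise-linear trend, here is a minimal sketch on simulated stand-in data (not my real panel). The breakpoint of 20 is an assumption picked for the example, and the hinge-term formulation is one common way to write a continuous broken line as a single lm() fit, not necessarily the method used in the SO post.

```r
set.seed(42)
# Simulated stand-in data: days in one bin (x) and a log outcome (y)
# whose slope changes at x = 20.
x <- sample.int(42, 150, replace = TRUE)
y <- 3 + 0.05 * x - 0.08 * pmax(x - 20, 0) + rnorm(150, sd = 0.3)

bp <- 20  # assumed breakpoint; in practice it has to be chosen or estimated

# Hinge term: zero to the left of the breakpoint, (x - bp) to the right of it.
fit_pw <- lm(y ~ x + pmax(x - bp, 0))

# Slope left of bp is the coefficient on x;
# slope right of bp is that coefficient plus the hinge coefficient.
slopes <- c(left  = unname(coef(fit_pw)[2]),
            right = unname(sum(coef(fit_pw)[2:3])))
print(slopes)

# Fitted broken line on a grid, e.g. for lines(grid$x, grid$fit) on a scatter plot.
grid <- data.frame(x = 1:42)
grid$fit <- predict(fit_pw, newdata = grid)
```

The fitted line is continuous at the breakpoint by construction, which is what separates this from simply running two unrelated regressions on either side.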
Desired regression output (inspired by the appendix of a related paper):
Here is the general form of the regression output table I want:
    bin     range              b_hi      b         b_low     st error    t_stat   p-value
    bin1    (-23.87:-18.44)    -0.0129   -0.0306   -0.0483   0.0090257   -3.39    0.001
    bin2    (-18.44:-13.02)    -0.0050   -0.0096   -0.0141   0.0023336   -4.1     0
    bin3    (-13.02:-7.59)     -0.0040   -0.0057   -0.0075   0.0008904   -6.44    0
    bin4    (-7.59:-2.17)       0.0030    0.0021    0.0011   0.000492     4.23    0
    bin5    (-2.17:3.26)       -0.0007   -0.0012   -0.0018   0.0002781   -4.48    0
    bin6    (3.26:8.69)         0.0000    0.0000    0.0000
    bin7    (8.69:14.11)        0.0008    0.0001   -0.0005   0.0003502    0.41    0.681
    bin8    (14.11:19.54)       0.0010    0.0000   -0.0010   0.0005107    0.06    0.956
    bin9    (19.54:24.96)       0.0028    0.0016    0.0005   0.0005737    2.85    0.004
    bin10   (24.96:30.39)      -0.0031   -0.0057   -0.0083   0.0013075   -4.36    0
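A table of this shape could be assembled roughly as follows. This is only a sketch under several assumptions: the data are simulated stand-ins, the model is a plain lm() on the bin counts with bin_6 omitted as the reference bin, and the standard errors are ordinary OLS ones. Clustered standard errors would need an extra step (for example with sandwich::vcovCL), which is not shown here.

```r
set.seed(123)
# Simulated stand-in panel: 5 regions x 30 years, 11 bin counts, one log outcome.
n <- 150
bins <- as.data.frame(matrix(sample.int(40, n * 11, replace = TRUE), ncol = 11))
names(bins) <- paste0("bin_", 1:11)
dat_sim <- data.frame(region = rep(paste0("r", 1:5), each = 30),
                      ln_gdp_percapita = rnorm(n, mean = 10),
                      bins)

# Omit bin_6 so that it acts as the reference bin (all zeros in the table above).
rhs <- setdiff(paste0("bin_", 1:11), "bin_6")
fit <- lm(reformulate(rhs, response = "ln_gdp_percapita"), data = dat_sim)

est <- coef(summary(fit))[rhs, , drop = FALSE]  # estimate, SE, t, p per bin
ci  <- confint(fit, parm = rhs, level = 0.95)   # OLS-based 95% interval

tab <- data.frame(
  bin      = rhs,
  b_hi     = ci[, 2],
  b        = est[, "Estimate"],
  b_low    = ci[, 1],
  st_error = est[, "Std. Error"],
  t_stat   = est[, "t value"],
  p_value  = est[, "Pr(>|t|)"],
  row.names = NULL
)
print(tab, digits = 3)
```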
Desired scatter plot:
Here is the scatter plot I want to achieve (a simulated plot inspired by a figure in a related paper):
In this plot, the black line with points is the estimated regression coefficient (from either the baseline or the restricted spline regression), and the dotted blue lines are the 95% confidence interval based on clustered standard errors.
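For the layout of such a plot, a rough base-graphics sketch is given here. It again uses simulated stand-in data and an lm() with bin_6 as the omitted reference bin; the confidence band comes from confint() on ordinary OLS standard errors, not clustered ones, so it only illustrates the shape of the figure.

```r
set.seed(7)
# Simulated stand-in data: 11 bin counts and one log outcome.
n <- 150
bins <- as.data.frame(matrix(sample.int(40, n * 11, replace = TRUE), ncol = 11))
names(bins) <- paste0("bin_", 1:11)
dat_sim <- data.frame(ln_gdp_percapita = rnorm(n, mean = 10), bins)

rhs <- setdiff(names(bins), "bin_6")  # bin_6 is the reference bin
fit <- lm(reformulate(rhs, response = "ln_gdp_percapita"), data = dat_sim)
ci  <- confint(fit, parm = rhs)

# One row per bin; the omitted reference bin stays pinned at zero.
plot_df <- data.frame(bin = 1:11, b = 0, lo = 0, hi = 0)
idx <- match(rhs, names(bins))
plot_df$b[idx]  <- coef(fit)[rhs]
plot_df$lo[idx] <- ci[, 1]
plot_df$hi[idx] <- ci[, 2]

# Black line with points for the estimates, dotted blue lines for the interval.
plot(plot_df$bin, plot_df$b, type = "b", pch = 16,
     ylim = range(plot_df$lo, plot_df$hi),
     xlab = "Temperature bin", ylab = "Estimated coefficient")
lines(plot_df$bin, plot_df$lo, lty = 3, col = "blue")
lines(plot_df$bin, plot_df$hi, lty = 3, col = "blue")
abline(h = 0, col = "grey")
```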
How can I produce this output efficiently and with minimal code? Could someone point me in the right direction? Any other thoughts? Thanks in advance!