#StackBounty: #r #regression #non-linear-regression #piecewise How to perform piece wise/spline regression for longitudinal temperature…

Bounty: 50

I have temperature time-series panel data and want to fit a piecewise regression or a cubic spline regression to it. I first read up on piecewise regression and its basic implementation in R on SO, which gave me an initial idea of how to proceed with my workflow. In my first attempt, I tried to run a spline regression with the ns function from the splines package, but I didn’t get the right bar plot. For my purposes, a baseline regression, a piecewise regression, or a spline regression would all work.

Based on the SO post about piecewise regression, I understand that I need to find appropriate breakpoints (knots) before running a piecewise regression, so that the fitted trend line follows the data. I adopted that piecewise solution, but it is still not efficient and I can’t get the plot I want.

Here is the general picture of my panel data specification: the dependent variables, shown first below, are in natural log terms, and the independent variables are average temperature, total precipitation, and 11 temperature bins, where each bin width (AKA the bin’s window) is 3 degrees Celsius (<-6, -6~-3, -3~0, …, >21).

reproducible example:

Here is reproducible data, simulated to resemble the actual temperature time-series panel data:

set.seed(1)  # fix the RNG so the simulated data is the same on every run
dat <- data.frame(
  index  = rep(c('dex111', 'dex112', 'dex113', 'dex114', 'dex115'), each = 30),
  year   = rep(1980:2009, times = 5),  # 30 years per region
  region = rep(c('Berlin', 'Stuttgart', 'Böblingen', 'Wartburgkreis', 'Eisenach'), each = 30),
  ln_gdp_percapita = rep(sample.int(40, 30), 5), ln_gva_agr_perworker = rep(sample.int(45, 30), 5),
  temperature = rep(sample.int(50, 30), 5), precipitation = rep(sample.int(60, 30), 5),
  bin_1 = rep(sample.int(32, 30), 5), bin_2 = rep(sample.int(34, 30), 5), bin_3 = rep(sample.int(36, 30), 5),
  bin_4 = rep(sample.int(38, 30), 5), bin_5 = rep(sample.int(40, 30), 5), bin_6 = rep(sample.int(42, 30), 5),
  bin_7 = rep(sample.int(44, 30), 5), bin_8 = rep(sample.int(46, 30), 5), bin_9 = rep(sample.int(48, 30), 5),
  bin_10 = rep(sample.int(50, 30), 5), bin_11 = rep(sample.int(52, 30), 5)
)


Note that every bin covers an equal temperature interval except the two extreme bins, and each bin gives the number of days that fall in its temperature interval.
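To make the bin definition concrete, here is a minimal sketch of how such day counts could be built from daily temperatures. The daily series is simulated and purely hypothetical, and treating each interval as closed on the left (`right = FALSE`) is an assumption; the question does not say which side of a boundary a day belongs to.

```r
# Hypothetical daily mean temperatures (deg C) for one region-year; simulated, not real data
set.seed(42)
daily_temp <- rnorm(365, mean = 9, sd = 8)

# 10 cut points from -6 to 21 in 3-degree steps, open-ended at both extremes -> 11 bins
breaks <- c(-Inf, seq(-6, 21, by = 3), Inf)

# Assign every day to a bin; right = FALSE makes intervals [a, b) (an assumption)
bin_id <- cut(daily_temp, breaks = breaks,
              labels = paste0("bin_", 1:11), right = FALSE)

# Number of days falling in each bin; empty bins are kept with a count of 0
bin_days <- table(bin_id)
print(bin_days)
```

Because every day lands in exactly one bin, the 11 counts always sum to the number of days in the year.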

Basically, I want to fit a spline regression to my data with one dependent variable (say, ln_gdp_percapita) and multiple independent variables (bin_1, bin_2, …, bin_11).

my attempt to run spline regression:

Here is what I did so far to run spline regression on temperature time series data:

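# Smoothing spline of ln_gdp_percapita on the day count in bin_6
fit_sp <- smooth.spline(dat$bin_6, dat$ln_gdp_percapita, nknots = 4)
# predict() dispatches to the smooth.spline method; fitted values are in $y
pred <- predict(fit_sp, dat$bin_6)$y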

But this is not what I want to do, so I also tried the following:

library(dplyr)
library(ggplot2)
# Natural spline with one interior knot at 3 (df is ignored when knots is given)
sp2 <- lm(ln_gva_agr_perworker ~ splines::ns(bin_6, knots = 3), data = dat)
df_ <- mutate(dat, smooth = fitted(sp2))
ggplot(df_, aes(bin_6, ln_gva_agr_perworker)) + geom_line(aes(y = smooth))

But it didn’t render the plot I want. My understanding of piecewise regression is that you locate the breakpoint and then run separate regressions on the observations to its left and to its right, which gives a piecewise linear trend. I don’t get such a result. Perhaps I should focus on finding a more efficient workflow.
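For reference, a minimal sketch of that idea in base R, assuming the breakpoint is already known. The data, the breakpoint `k`, and the slopes are all hypothetical. A single `lm` with a hinge term `pmax(x - k, 0)` fits a different slope on each side of the breakpoint while keeping the line continuous, which is one common way to write a piecewise linear regression (estimating the breakpoint itself is a separate problem that this sketch does not address).

```r
# Simulated data with a change in slope at x = 15 (all values are hypothetical)
set.seed(1)
x <- runif(200, 0, 30)
k <- 15  # assumed, known breakpoint
y <- 2 + 0.5 * x - 0.8 * pmax(x - k, 0) + rnorm(200, sd = 0.5)

# Hinge term: zero left of the breakpoint, (x - k) to the right of it
fit_pw <- lm(y ~ x + pmax(x - k, 0))

# Slope left of k is the coefficient on x;
# slope right of k is that coefficient plus the hinge coefficient
slope_left  <- unname(coef(fit_pw)[2])
slope_right <- unname(coef(fit_pw)[2] + coef(fit_pw)[3])
print(c(left = slope_left, right = slope_right))
```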

In general, I want to see how the agriculture or industry sectors respond to daily temperature. Running a simple regression on the temperature-bin panel data is not sufficient to reach a conclusion; I believe a restricted spline regression, a smoothing spline, or a baseline regression might produce better estimates.
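As an illustration of the restricted (natural) cubic spline option, here is a hedged sketch using `splines::ns` inside `lm`. The temperature and response values are simulated and hypothetical, and `df = 3` is an arbitrary choice rather than a recommendation. Predicting on a sorted grid with `interval = "confidence"` gives a smooth curve plus a pointwise 95% band; these are ordinary (not clustered) intervals.

```r
library(splines)  # ships with base R

# Hypothetical data: a hump-shaped response to temperature
set.seed(2)
temp <- runif(150, -10, 30)
y    <- 3 - 0.002 * (temp - 12)^2 + rnorm(150, sd = 0.1)
sim  <- data.frame(temp = temp, y = y)

# Natural cubic spline basis with 3 degrees of freedom (knots placed at quantiles)
fit_ns <- lm(y ~ ns(temp, df = 3), data = sim)

# Predict on an evenly spaced, sorted grid inside the observed range
grid <- data.frame(temp = seq(min(sim$temp), max(sim$temp), length.out = 100))
pred <- predict(fit_ns, newdata = grid, interval = "confidence", level = 0.95)

# One row per grid point: temp, fit, lwr, upr
curve_df <- cbind(grid, pred)
head(curve_df)
```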

desired regression output metrics (inspired by a related paper’s appendix):

Here is the general picture of my desired regression output metric:

bin     interval          b_hi     b        b_low    st error   t_stat  p-value
bin1    (-23.87:-18.44)   -0.0129  -0.0306  -0.0483  0.0090257  -3.39   0.001
bin2    (-18.44:-13.02)   -0.0050  -0.0096  -0.0141  0.0023336  -4.1    0
bin3    (-13.02:-7.59)    -0.0040  -0.0057  -0.0075  0.0008904  -6.44   0
bin4    (-7.59:-2.17)     0.0030   0.0021   0.0011   0.000492   4.23    0
bin5    (-2.17:3.26)      -0.0007  -0.0012  -0.0018  0.0002781  -4.48   0
bin6    (3.26:8.69)       0.0000   0.0000   0.0000
bin7    (8.69:14.11)      0.0008   0.0001   -0.0005  0.0003502  0.41    0.681
bin8    (14.11:19.54)     0.0010   0.0000   -0.0010  0.0005107  0.06    0.956
bin9    (19.54:24.96)     0.0028   0.0016   0.0005   0.0005737  2.85    0.004
bin10   (24.96:30.39)     -0.0031  -0.0057  -0.0083  0.0013075  -4.36   0
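A rough sketch of how a table of this shape could be produced in base R, under several assumptions that are not from the question: the panel is simulated with independent draws per region (20 hypothetical regions, day counts that sum to 365), bin_6 is dropped as the reference category (matching the all-zero bin6 row), standard errors are clustered by region with a hand-written sandwich formula and a common small-sample correction, and the intervals and p-values use the normal approximation (b ± 1.96 × se).

```r
set.seed(3)
n_region <- 20; n_year <- 30; n <- n_region * n_year

# Hypothetical day counts for 11 bins; each row sums to 365
bins <- t(rmultinom(n, size = 365, prob = c(1, 2, 4, 6, 8, 10, 8, 6, 4, 2, 1)))
colnames(bins) <- paste0("bin_", 1:11)
panel <- data.frame(region = rep(paste0("r", 1:n_region), each = n_year), bins)

# Hypothetical outcome: extreme bins hurt, plus a region-level shock
region_shock <- rep(rnorm(n_region, sd = 0.05), each = n_year)
panel$y <- 3 - 0.03 * panel$bin_1 - 0.02 * panel$bin_11 +
  region_shock + rnorm(n, sd = 0.1)

# Drop bin_6 as the reference bin (the counts are collinear with the intercept otherwise)
rhs <- paste0("bin_", setdiff(1:11, 6))
fit <- lm(reformulate(rhs, response = "y"), data = panel)

# Cluster-robust (by region) covariance: bread %*% meat %*% bread
X <- model.matrix(fit); u <- residuals(fit)
N <- nrow(X); K <- ncol(X); G <- n_region
bread  <- solve(crossprod(X))
scores <- rowsum(X * u, group = panel$region)  # per-cluster sums of x_i * u_i
meat   <- crossprod(scores)
V  <- bread %*% meat %*% bread * (G / (G - 1)) * ((N - 1) / (N - K))
se <- sqrt(diag(V))

b <- coef(fit)
out <- data.frame(b_hi = b + 1.96 * se, b = b, b_low = b - 1.96 * se,
                  st_error = se, t_stat = b / se,
                  p_value = 2 * pnorm(-abs(b / se)),
                  row.names = names(b))
out <- out[rhs, ]  # keep only the bin rows
print(round(out, 4))
```

Each coefficient is read relative to the omitted bin_6: the effect of one more day in that bin instead of a day in the reference bin.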

desired scatter plot:

Here is the scatter plot I want to achieve (a simulated plot inspired by a figure in a related paper):

[image: desired scatter plot]

In this plot, the black line with points shows the estimated regression coefficients (from either a baseline or a restricted spline regression), and the dotted blue lines mark the 95% confidence interval based on clustered standard errors.
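A plot of this kind can be sketched in base R directly from a coefficient table. The numbers here are copied from the desired-output table in this question (bin intervals, b_hi, b, b_low); placing each point at the midpoint of its bin interval and the styling choices are assumptions.

```r
# Coefficient table copied from the desired output above
coef_tab <- data.frame(
  bin   = paste0("bin", 1:10),
  lower = c(-23.87, -18.44, -13.02, -7.59, -2.17, 3.26, 8.69, 14.11, 19.54, 24.96),
  upper = c(-18.44, -13.02, -7.59, -2.17, 3.26, 8.69, 14.11, 19.54, 24.96, 30.39),
  b_hi  = c(-0.0129, -0.0050, -0.0040, 0.0030, -0.0007, 0, 0.0008, 0.0010, 0.0028, -0.0031),
  b     = c(-0.0306, -0.0096, -0.0057, 0.0021, -0.0012, 0, 0.0001, 0.0000, 0.0016, -0.0057),
  b_low = c(-0.0483, -0.0141, -0.0075, 0.0011, -0.0018, 0, -0.0005, -0.0010, 0.0005, -0.0083)
)

# Put each estimate at the midpoint of its temperature interval (an assumption)
coef_tab$mid <- (coef_tab$lower + coef_tab$upper) / 2

# Black line with points: point estimates
plot(coef_tab$mid, coef_tab$b, type = "b", pch = 16, col = "black",
     ylim = range(coef_tab$b_low, coef_tab$b_hi),
     xlab = "Temperature bin midpoint (deg C)", ylab = "Estimated coefficient")

# Dotted blue lines: upper and lower 95% confidence bounds
lines(coef_tab$mid, coef_tab$b_hi,  lty = 3, col = "blue")
lines(coef_tab$mid, coef_tab$b_low, lty = 3, col = "blue")

# Zero line, which passes through the reference bin (bin6)
abline(h = 0, lty = 2, col = "grey50")
```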

How do I accomplish my desired output with minimal code and efficiently? Could someone point me in the right direction? Any more thoughts? Thanks in advance!
