# #StackBounty: #r #regression #non-linear-regression #piecewise How to perform piece wise/spline regression for longitudinal temperature…

### Bounty: 50

I have temperature time series panel data and want to run piecewise regression or cubic spline regression on it. I first looked into the concepts of piecewise regression and its basic implementation in R on `SO`, which gave me an initial idea of how to proceed. In my first attempt, I fitted a spline regression with the `splines::ns` function from the `splines` package, but I didn't get the plot I wanted. For my purposes, baseline regression, piecewise regression, or spline regression would all work.

Based on the post about piecewise regression on `SO`, I understand that I need to find appropriate breakpoints (knots) before running the piecewise regression, so that the fitted trend line follows the data. I adapted the piecewise solution from that post, but it is still not efficient and I can't get my desired plot.

Here is the general specification of my panel data: the dependent variables are expressed in natural logs, and the independent variables are average temperature, total precipitation, and 11 temperature bins, each with a bin width (window) of 3 degrees Celsius (<-6, -6~-3, -3~0, …, >21).

reproducible example:

Here is reproducible data, simulated to mimic the actual temperature time series panel data:

``````
set.seed(1)  # fix the random draws so the example is reproducible
dat <- data.frame(
  index  = rep(c('dex111', 'dex112', 'dex113', 'dex114', 'dex115'), each = 30),
  year   = 1980:2009,  # recycled across the five regions
  region = rep(c('Berlin', 'Stuttgart', 'Böblingen', 'Wartburgkreis', 'Eisenach'), each = 30),
  ln_gdp_percapita = rep(sample.int(40, 30), 5), ln_gva_agr_perworker = rep(sample.int(45, 30), 5),
  temperature = rep(sample.int(50, 30), 5), precipitation = rep(sample.int(60, 30), 5),
  bin_1 = rep(sample.int(32, 30), 5), bin_2 = rep(sample.int(34, 30), 5), bin_3 = rep(sample.int(36, 30), 5),
  bin_4 = rep(sample.int(38, 30), 5), bin_5 = rep(sample.int(40, 30), 5), bin_6 = rep(sample.int(42, 30), 5),
  bin_7 = rep(sample.int(44, 30), 5), bin_8 = rep(sample.int(46, 30), 5), bin_9 = rep(sample.int(48, 30), 5),
  bin_10 = rep(sample.int(50, 30), 5), bin_11 = rep(sample.int(52, 30), 5)
)
``````

update:

Note that each bin covers an equal temperature interval, except for the two extreme bins, and each bin gives the number of days that fall in its temperature interval.

Basically, I want to fit a spline regression to my data (say, with one dependent variable such as ln_gdp_percapita and multiple independent variables such as bin_1, bin_2, …, bin_11).

my attempt to run spline regression:

Here is what I did so far to run spline regression on temperature time series data:

``````
# Smoothing spline of ln_gdp_percapita on bin_6
fit_sp <- smooth.spline(dat$bin_6, dat$ln_gdp_percapita, nknots = 4)
# predict() on a smooth.spline fit returns a list with components x and y
pred <- predict(fit_sp, dat$bin_6)$y
summary(pred)
``````

But this is not what I want, so I also tried the following:

``````
library(dplyr)    # mutate()
library(ggplot2)  # ggplot()
# Natural cubic spline in bin_6 with one interior knot at 3 (df is not used when knots are supplied)
sp2 <- lm(ln_gva_agr_perworker ~ splines::ns(bin_6, knots = 3), data = dat)
df_ <- mutate(dat, smooth = fitted(sp2))
stats::predict(sp2)
ggplot(df_, aes(bin_6, ln_gva_agr_perworker)) + geom_line(aes(y = smooth))
``````

But it didn't render the plot I want. My understanding of piecewise regression is that one identifies a breakpoint and fits separate regressions to the observations on its left and right, which gives a piecewise-linear trend. I did not get such a result, so perhaps I should focus on finding a more efficient workflow.
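The piecewise idea described above can be sketched in base R. This is only an illustration on simulated data, not the method from the `SO` post: the variables `x`, `y`, the helper `fit_at`, and the true breakpoint of 20 are all hypothetical. A hinge term `pmax(x - k, 0)` lets the slope change at `k` while keeping the line continuous, and a simple grid search picks the `k` with the smallest residual sum of squares.

```r
# Simulated data (hypothetical): the slope changes at x = 20
set.seed(1)
x <- runif(200, 0, 40)
y <- 1 + 0.5 * x - 0.8 * pmax(x - 20, 0) + rnorm(200, sd = 1)

# Continuous piecewise-linear fit for a given breakpoint k:
# the hinge term pmax(x - k, 0) is zero left of k and linear right of k
fit_at <- function(k) lm(y ~ x + pmax(x - k, 0))

# Grid search: keep the candidate breakpoint with the smallest RSS
grid  <- seq(5, 35, by = 0.5)
rss   <- sapply(grid, function(k) deviance(fit_at(k)))
k_hat <- grid[which.min(rss)]
fit   <- fit_at(k_hat)

k_hat      # estimated breakpoint
coef(fit)  # intercept, slope left of k_hat, change in slope right of k_hat
```

The third coefficient is the change in slope at the breakpoint, so the slope to the right of `k_hat` is the sum of the second and third coefficients.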

In general, I want to see how the agriculture or industry sectors respond to daily temperature, so running a simple regression on the temperature-bin panel data is not sufficient to reach a conclusion. I believe restricted spline regression, a smoothing spline, or baseline regression might produce better estimates.

desired regression output metrics (inspired by a related paper's appendix):
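As a rough sketch of the restricted (natural) cubic spline option, the following fits `splines::ns` inside `lm` and gets a fitted curve with a pointwise 95% confidence band from `predict`. The data are simulated, the names `temp`, `y`, `d`, and `grid` are hypothetical, and `df = 4` is an arbitrary choice rather than a recommendation.

```r
library(splines)

# Simulated response (hypothetical) that peaks around 12 degrees
set.seed(3)
temp <- runif(300, -10, 30)
y    <- 2 - 0.01 * (temp - 12)^2 + rnorm(300, sd = 0.2)
d    <- data.frame(temp, y)

# Natural cubic spline with 4 degrees of freedom; knots are placed at quantiles of temp
fit_ns <- lm(y ~ ns(temp, df = 4), data = d)

# Predict on a grid inside the observed range, with a 95% confidence band
grid <- data.frame(temp = seq(min(temp), max(temp), length.out = 100))
pred <- predict(fit_ns, newdata = grid, interval = "confidence", level = 0.95)

head(cbind(grid, pred))  # columns: temp, fit, lwr, upr
```

The `fit`, `lwr`, and `upr` columns can be plotted against `temp` to show the estimated response curve and its band.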

Here is the general picture of my desired regression output metric:

``````
bin     range              b_hi      b         b_low     st error    t_stat   p-value
bin1    (-23.87:-18.44)    -0.0129   -0.0306   -0.0483   0.0090257   -3.39    0.001
bin2    (-18.44:-13.02)    -0.0050   -0.0096   -0.0141   0.0023336   -4.1     0
bin3    (-13.02:-7.59)     -0.0040   -0.0057   -0.0075   0.0008904   -6.44    0
bin4    (-7.59:-2.17)      0.0030    0.0021    0.0011    0.000492    4.23     0
bin5    (-2.17:3.26)       -0.0007   -0.0012   -0.0018   0.0002781   -4.48    0
bin6    (3.26:8.69)        0.0000    0.0000    0.0000
bin7    (8.69:14.11)       0.0008    0.0001    -0.0005   0.0003502   0.41     0.681
bin8    (14.11:19.54)      0.0010    0.0000    -0.0010   0.0005107   0.06     0.956
bin9    (19.54:24.96)      0.0028    0.0016    0.0005    0.0005737   2.85     0.004
bin10   (24.96:30.39)      -0.0031   -0.0057   -0.0083   0.0013075   -4.36    0
``````
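A table of this shape could be produced along the following lines. This is a hedged sketch on simulated data, not the paper's procedure: the panel (20 hypothetical regions, 30 years, day counts summing to 365) is invented, `bin_6` is assumed to be the omitted reference bin because its row is all zeros above, and the cluster-robust variance is computed by hand in base R with a common small-sample adjustment and `G - 1` degrees of freedom, both of which are assumptions.

```r
# Simulated panel (hypothetical): 20 regions x 30 years, 11 day-count bins
set.seed(42)
n_reg <- 20; n_yr <- 30; n <- n_reg * n_yr
bins <- t(rmultinom(n, size = 365, prob = rep(1 / 11, 11)))
colnames(bins) <- paste0("bin_", 1:11)
dat <- data.frame(index = rep(sprintf("reg%02d", 1:n_reg), each = n_yr),
                  year  = rep(1980:2009, times = n_reg), bins)
dat$ln_gdp_percapita <- 3 - 0.01 * dat$bin_1 + 0.005 * dat$bin_11 +
  rnorm(n, sd = 0.1)

# Bins sum to 365, so one bin must be dropped; bin_6 is the reference here
rhs <- setdiff(colnames(bins), "bin_6")
fit <- lm(reformulate(rhs, response = "ln_gdp_percapita"), data = dat)

# Cluster-robust (sandwich) variance, clustered by region
X       <- model.matrix(fit)
u       <- residuals(fit)
bread   <- solve(crossprod(X))
score_g <- rowsum(X * u, group = dat$index)   # one row per cluster: X_g' u_g
G <- nrow(score_g); p <- ncol(X)
adj  <- (G / (G - 1)) * ((n - 1) / (n - p))   # small-sample adjustment (assumption)
V_cl <- adj * bread %*% crossprod(score_g) %*% bread

# Assemble the table: point estimate, 95% bounds, SE, t statistic, p-value
b     <- coef(fit)
se    <- sqrt(diag(V_cl))
tcrit <- qt(0.975, df = G - 1)
tab <- data.frame(b_hi = b + tcrit * se, b = b, b_low = b - tcrit * se,
                  st_error = se, t_stat = b / se,
                  p_value = 2 * pt(-abs(b / se), df = G - 1),
                  row.names = names(b))
round(tab[rhs, ], 4)
```

Each coefficient is read relative to the omitted bin: the effect of one more day in that bin instead of a day in `bin_6`. If fixed effects are part of the model, they would need to be added to the formula.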

desired scatter plot:

Here is the scatter plot that I want to achieve (this is just a simulated plot inspired by a figure in a related paper):

In this plot, the black line with points shows the estimated regression coefficients (from either baseline or restricted spline regression), and the dotted blue lines show the 95% confidence interval based on clustered standard errors.
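A plot of this kind can be sketched with base graphics. The data below are simulated and all names (`est`, `rhs`, `y`) are hypothetical; `bin_6` is assumed to be the reference bin fixed at zero. For brevity the band uses the conventional `confint` intervals from `lm`, so bounds based on clustered standard errors would have to be substituted to match the description.

```r
# Simulated data (hypothetical): 11 day-count bins summing to 365
set.seed(7)
n <- 600
bins <- t(rmultinom(n, size = 365, prob = rep(1 / 11, 11)))
colnames(bins) <- paste0("bin_", 1:11)
dat <- data.frame(bins)
dat$y <- 3 - 0.01 * dat$bin_1 + 0.005 * dat$bin_11 + rnorm(n, sd = 0.1)

# Regression on the bins, with bin_6 omitted as the reference category
rhs <- setdiff(colnames(bins), "bin_6")
fit <- lm(reformulate(rhs, response = "y"), data = dat)

# One row per bin; the reference bin keeps coefficient and bounds at zero
est <- data.frame(bin = 1:11, b = 0, lo = 0, hi = 0)
idx <- match(rhs, colnames(bins))
ci  <- confint(fit, parm = rhs, level = 0.95)  # conventional OLS intervals
est$b[idx]  <- coef(fit)[rhs]
est$lo[idx] <- ci[, 1]
est$hi[idx] <- ci[, 2]

# Black line with points for the estimates, dotted blue lines for the band
plot(est$bin, est$b, type = "b", pch = 16, ylim = range(est$lo, est$hi),
     xlab = "Temperature bin", ylab = "Estimated coefficient")
lines(est$bin, est$lo, lty = 3, col = "blue")
lines(est$bin, est$hi, lty = 3, col = "blue")
abline(h = 0, col = "grey")
```

The same `est` data frame could be passed to `ggplot2` with `geom_line` and `geom_point` if that is the preferred plotting system.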

How do I accomplish my desired output with minimal code and efficiently? Could someone point me in the right direction? Any more thoughts? Thanks in advance!
