*Bounty: 50*

I have temperature time-series panel data and want to fit a piecewise regression or a cubic spline regression to it. I first looked into piecewise regression and its basic implementation in R on `SO`, which gave me an initial idea of how to proceed. In my first attempt I tried a spline regression using the `splines::ns` function from the `splines` package, but I didn't get the right bar plot. For my purposes, a baseline regression, a piecewise regression, or a spline regression would all work.

Based on the `SO` post about piecewise regression, I understand that I need to find appropriate breakpoints (knots) before running a piecewise regression, so that the fitted trend line follows the data. I adapted the piecewise solution from that post, but it is still not efficient and I can't get my desired plot.

Here is the general specification of my panel data: the columns shown below are my dependent variables, expressed in natural-log terms, and my independent variables: average temperature, total precipitation, and 11 temperature bins. Each bin is 3 degrees Celsius wide, except for the two open-ended extreme bins (< -6, -6 to -3, -3 to 0, …, > 21).

**reproducible example**:

Here is reproducible data, simulated to resemble the actual temperature time-series panel data:

```
dat= data.frame(index = rep(c('dex111', 'dex112', 'dex113','dex114','dex115'), each = 30), year =1980:2009,
region= rep(c('Berlin','Stuttgart','Böblingen','Wartburgkreis','Eisenach'), each=30),
ln_gdp_percapita=rep(sample.int(40, 30), 5),ln_gva_agr_perworker=rep(sample.int(45, 30), 5),
temperature=rep(sample.int(50, 30), 5), precipitation=rep(sample.int(60, 30), 5),
bin_1=rep(sample.int(32, 30), 5), bin_2=rep(sample.int(34, 30), 5), bin_3=rep(sample.int(36, 30), 5),
bin_4=rep(sample.int(38, 30), 5), bin_5=rep(sample.int(40, 30), 5), bin_6=rep(sample.int(42, 30), 5),
bin_7=rep(sample.int(44, 30), 5),bin_8=rep(sample.int(46, 30), 5), bin_9=rep(sample.int(48, 30), 5),
bin_10=rep(sample.int(50, 30), 5), bin_11=rep(sample.int(52, 30), 5))
```

**update**:

Note that every bin covers an equal temperature interval, except for the two extreme bins, and each bin gives the number of days that fall in the respective temperature interval.
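To make the bin definition concrete, here is a minimal sketch of how such day counts could be built from daily temperatures. The `daily_temp` vector is hypothetical (simulated here), and I am assuming right-closed intervals, which is simply the default of `cut()`; my actual data may close the intervals differently.

```r
set.seed(1)

# Hypothetical daily mean temperatures for one region-year, in degrees Celsius
daily_temp <- rnorm(365, mean = 9, sd = 8)

# 10 cut points from -6 to 21 in steps of 3, plus two open-ended extremes,
# give 11 intervals: (-Inf,-6], (-6,-3], ..., (18,21], (21,Inf)
breaks <- c(-Inf, seq(-6, 21, by = 3), Inf)

# Assign each day to a bin and count the days per bin
bin <- cut(daily_temp, breaks = breaks, labels = paste0("bin_", 1:11))
bin_counts <- table(bin)
bin_counts
```

Each region-year row in the panel would then hold one such vector of 11 counts.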

Basically, I want to fit a spline regression to my data (say, with one dependent variable such as `ln_gdp_percapita` and multiple independent variables such as `bin_1`, `bin_2`, …, `bin_11`).

**my attempt to run spline regression**:

Here is what I did so far to run spline regression on temperature time series data:

```
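# Smoothing spline of the log outcome on one temperature bin
fit_sp <- smooth.spline(dat$bin_6, dat$ln_gdp_percapita, nknots = 4)
# predict() on a smooth.spline fit returns a list with components x and y
pred <- predict(fit_sp, dat$bin_6)$y
summary(pred)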
```

but this is not what I want to do. So I also tried the following:

```
library(dplyr)
library(ggplot2)

# Natural cubic spline in bin_6 with one interior knot at 3
sp2 <- lm(ln_gva_agr_perworker ~ splines::ns(bin_6, knots = c(3)), data = dat)
df_ <- mutate(dat, smooth = fitted(sp2))
stats::predict(sp2)
ggplot(df_, aes(bin_6, ln_gva_agr_perworker)) + geom_line(aes(y = smooth))
```

but it didn't render the plot I want. My understanding of piecewise regression is that you locate the breakpoint and fit a regression to the observations on its left and on its right, which yields a piecewise linear trend. I did not get such a result, so perhaps I should focus on finding a more efficient workflow.
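For reference, the piecewise idea as I understand it can be sketched as a single regression with a hinge term. This is only an illustration on simulated data: the variables `x` and `y` and the breakpoint `k` are made up, and the breakpoint is assumed known rather than estimated. Note that the hinge form forces the two segments to join at `k`, which is not the same as fitting two completely separate regressions.

```r
set.seed(42)

# Simulated predictor and an assumed, known breakpoint (both hypothetical)
x <- runif(200, min = 0, max = 40)
k <- 20

# True slope is 0.05 left of k and 0.05 - 0.12 = -0.07 right of k
y <- 1 + 0.05 * x - 0.12 * pmax(x - k, 0) + rnorm(200, sd = 0.3)

# pmax(x - k, 0) is zero left of the breakpoint and x - k to its right,
# so its coefficient is the change in slope at k
fit_pw <- lm(y ~ x + pmax(x - k, 0))

slope_left  <- unname(coef(fit_pw)[2])
slope_right <- unname(coef(fit_pw)[2] + coef(fit_pw)[3])
c(left = slope_left, right = slope_right)
```

With real data the breakpoint would have to be chosen or estimated first, which is the part I am unsure about.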

In general, I want to see how the agriculture and industry sectors respond to daily temperature. A simple regression on the temperature-bin panel data is not sufficient to reach a conclusion, so I believe a restricted spline regression, a smoothing spline, or a baseline regression might produce better estimates.

**desired regression output metrics** (inspired by the appendix of a related paper):

Here is the general shape of the regression output I want:

```
bin    range             b_hi     b        b_low    st error   t_stat  p-value
bin1   (-23.87:-18.44)   -0.0129  -0.0306  -0.0483  0.0090257  -3.39   0.001
bin2   (-18.44:-13.02)   -0.0050  -0.0096  -0.0141  0.0023336  -4.1    0
bin3   (-13.02:-7.59)    -0.0040  -0.0057  -0.0075  0.0008904  -6.44   0
bin4   (-7.59:-2.17)     0.0030   0.0021   0.0011   0.000492   4.23    0
bin5   (-2.17:3.26)      -0.0007  -0.0012  -0.0018  0.0002781  -4.48   0
bin6   (3.26:8.69)       0.0000   0.0000   0.0000
bin7   (8.69:14.11)      0.0008   0.0001   -0.0005  0.0003502  0.41    0.681
bin8   (14.11:19.54)     0.0010   0.0000   -0.0010  0.0005107  0.06    0.956
bin9   (19.54:24.96)     0.0028   0.0016   0.0005   0.0005737  2.85    0.004
bin10  (24.96:30.39)     -0.0031  -0.0057  -0.0083  0.0013075  -4.36   0
```
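A table of this shape could, I think, be assembled from an ordinary `lm` fit roughly as follows. This is a sketch under several assumptions: the data frame `sim` is simulated (the bin counts are drawn so that they sum to 365 days per row), `bin_6` is left out as the reference bin to mirror the all-zero `bin6` row in the table, and the standard errors are the conventional ones from `lm`, not clustered ones.

```r
set.seed(1)
n <- 150

# Hypothetical panel stand-in: 11 day-count bins that sum to 365 in every row
bins <- as.data.frame(t(rmultinom(n, size = 365, prob = rep(1 / 11, 11))))
names(bins) <- paste0("bin_", 1:11)
sim <- cbind(ln_gdp_percapita = rnorm(n, mean = 10), bins)

# Omit bin_6 (reference bin): with all 11 bins plus an intercept the
# regressors would be perfectly collinear, since the counts sum to 365
rhs <- paste(setdiff(names(bins), "bin_6"), collapse = " + ")
fit <- lm(as.formula(paste("ln_gdp_percapita ~", rhs)), data = sim)

est <- coef(summary(fit))[-1, , drop = FALSE]  # drop the intercept row
ci  <- confint(fit, level = 0.95)[-1, , drop = FALSE]

out <- data.frame(
  b_hi     = ci[, 2],
  b        = est[, "Estimate"],
  b_low    = ci[, 1],
  st_error = est[, "Std. Error"],
  t_stat   = est[, "t value"],
  p_value  = est[, "Pr(>|t|)"],
  row.names = rownames(est)
)
print(round(out, 4))
```

Each coefficient is then read relative to the omitted `bin_6`. Replacing the conventional standard errors with clustered ones is the step this sketch does not cover.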

**desired scatter plot**:

Here is the scatter plot I want to achieve (this is just a simulated plot, inspired by a figure in a related paper):

In this plot, the black line with points is the estimated regression coefficient (from either the baseline or the restricted spline regression), and the dotted blue lines are the 95% confidence interval based on clustered standard errors.
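To show the layout I mean, here is a minimal base-graphics sketch that draws the `b`, `b_low` and `b_hi` columns of the table above (values copied from it, with the reference bin at zero). It uses base R rather than `ggplot2` only to keep it dependency-free; the axis labels are my own choice.

```r
# Coefficients and interval bounds copied from the desired output table
coefs <- data.frame(
  bin   = 1:10,
  b_hi  = c(-0.0129, -0.0050, -0.0040, 0.0030, -0.0007, 0,
            0.0008, 0.0010, 0.0028, -0.0031),
  b     = c(-0.0306, -0.0096, -0.0057, 0.0021, -0.0012, 0,
            0.0001, 0.0000, 0.0016, -0.0057),
  b_low = c(-0.0483, -0.0141, -0.0075, 0.0011, -0.0018, 0,
            -0.0005, -0.0010, 0.0005, -0.0083)
)

# Black line with points: estimated coefficients per bin
plot(coefs$bin, coefs$b, type = "b", pch = 16, col = "black",
     ylim = range(coefs$b_low, coefs$b_hi),
     xlab = "temperature bin", ylab = "estimated coefficient")

# Dotted blue lines: upper and lower bounds of the interval
lines(coefs$bin, coefs$b_hi,  lty = "dotted", col = "blue")
lines(coefs$bin, coefs$b_low, lty = "dotted", col = "blue")

# Zero line for reference
abline(h = 0, col = "grey")
```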

How do I accomplish my desired output with minimal code and efficiently? Could someone point me in the right direction? Any more thoughts? Thanks in advance!