*Bounty: 50*

I am trying to reproduce the results from chapter 5.2.2 of ESL which is about logistic regression using splines. The dataset is the african heart disease dataset (downloadable from the website following data -> South African Heart Disease data)

I take a shortcut compared to the book since I directly select the relevant data. I do not perform data selection based on AIC criterion.

Here is my code :

```
import pandas as pd
import patsy as patsy
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split
SAHeart = pd.read_csv(r"SAheart.data")
SAHeart.drop('row.names',axis = 1, inplace = True)
SAHeart = pd.get_dummies(SAHeart, columns = ['famhist'],drop_first=True) #convert categorial into binary
SAHeart = SAHeart.rename(columns={"famhist_Present": "famhist"})
SAHeart = SAHeart[['sbp','tobacco','ldl','famhist','obesity','age','chd']]
train, test = train_test_split(SAHeart, test_size=0.2)
trainResponse= train.chd
train.drop('chd',axis = 1, inplace = True)
test.drop('chd',axis = 1, inplace = True)
#train.drop('famhist',axis = 1, inplace = True)
#test.drop('famhist',axis = 1, inplace = True)
degFree = 4
data = pd.DataFrame()
for column in train:
if column == 'famhist':
y = patsy.dmatrix("bs(train[column], df=1, degree = 1)",
{"train[column]": train[column]}, return_type='dataframe')
y.columns = [train[column].name]*y.shape[1]
if column != 'famhist':
#start = SAHeart[column].min()
#end = SAHeart[column].max()
x = train[column]
#! knots below = inner knots (=Total - boundary)
#! for a natural cubic spline, df = number of TOTAL knots (inner + boundary)
y = patsy.dmatrix("cr(train[column], df=4)",
{"train[column]": train[column]}, return_type='dataframe')#patsy.cr(x, df=degFree)
y.columns = [train[column].name]*y.shape[1]
data = pd.concat([data,y.iloc[:,1:]],axis=1) #
coeff = 0.1*np.ones((1,4))
#plt.figure()
#plt.plot(x,coeff*y
dataTest = pd.DataFrame()
for column in test:
if column == 'famhist':
y = patsy.dmatrix("bs(test[column], df=1, degree = 1)",
{"train[column]": test[column]}, return_type='dataframe')
y.columns = [test[column].name]*y.shape[1]
if column != 'famhist':
#start = SAHeart[column].min()
#end = SAHeart[column].max()
x = test[column]
#! knots below = inner knots (=Total - boundary)
#! for a natural cubic spline, df = number of TOTAL knots (inner + boundary)
y = patsy.dmatrix("cr(test[column], df=4)",
{"test[column]": test[column]}, return_type='dataframe') #patsy.cr(x, df=degFree)
y.columns = [test[column].name]*y.shape[1]
dataTest = pd.concat([dataTest,y.iloc[:,1:]],axis=1)#natural cubic spline df = #knots, don't take intercept
#Add intercept term
data = pd.concat([pd.Series(np.ones((data.shape[0])),index=data.index),data],axis=1)
dataTest = pd.concat([pd.Series(np.ones((dataTest.shape[0])),index=dataTest.index),dataTest],axis=1)
logistic = linear_model.LogisticRegression(C=1.0)
logistic.fit(data, trainResponse)
prediction = logistic.predict(dataTest)
coeffs = logistic.coef_
#plot sbp related data
span = np.linspace(min(train['sbp']),max(train['sbp']),100).reshape(-1,1)
spanSpline = patsy.dmatrix("cr(span, df=4)",
{"span": span}, return_type='dataframe') #patsy.cr(span, df=degFree)
plt.figure()
plt.plot(span, np.dot(spanSpline, coeffs[0][0:degFree+1].T),marker=".")
```

But I don”t get the same figure for f_hat(sbp) as you can see from the picture below (up = my f_hat, below = ESL book f_hat). The picture is obtained by the last lines in the code (i.e. below the “plot sbp related data comment”)

Get this bounty!!!