#StackBounty: #machine-learning #python #splines Logistic regression using splines in python

Bounty: 50

I am trying to reproduce the results from Section 5.2.2 of ESL, which is about logistic regression using natural cubic splines. The dataset is the South African Heart Disease dataset (downloadable from the book's website by following Data -> South African Heart Disease data).

I take a shortcut compared to the book: I directly select the relevant variables and do not perform model selection based on the AIC criterion.
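
For reference, the model fitted in that section is additive on the logit scale, with each continuous predictor expanded in a natural cubic spline basis and the binary famhist entering linearly:

$$\operatorname{logit}\big[\Pr(\mathrm{chd}=1 \mid X)\big] = \theta_0 + h_1(X_1)^{T}\theta_1 + h_2(X_2)^{T}\theta_2 + \cdots + h_p(X_p)^{T}\theta_p$$

where each $h_j(X_j)$ is the vector of spline basis functions for predictor $X_j$ (four basis functions per continuous variable here, matching df = 4 in the code below).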

Here is my code:

import pandas as pd
import patsy
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split

SAHeart = pd.read_csv(r"SAheart.data")
SAHeart.drop('row.names',axis = 1, inplace = True)
SAHeart = pd.get_dummies(SAHeart, columns = ['famhist'], drop_first=True)  # convert categorical famhist into a 0/1 indicator
SAHeart = SAHeart.rename(columns={"famhist_Present": "famhist"})

SAHeart = SAHeart[['sbp','tobacco','ldl','famhist','obesity','age','chd']]

train, test = train_test_split(SAHeart, test_size=0.2)
trainResponse = train.chd

train.drop('chd',axis = 1, inplace = True)
test.drop('chd',axis = 1, inplace = True)

#train.drop('famhist',axis = 1, inplace = True)
#test.drop('famhist',axis = 1, inplace = True)

degFree = 4

data = pd.DataFrame()
for column in train:
    if column == 'famhist':
        # famhist is binary, so give it a trivial degree-1 basis (a single column)
        y = patsy.dmatrix("bs(train[column], df=1, degree=1)",
                          {"train[column]": train[column]}, return_type='dataframe')
        y.columns = [train[column].name]*y.shape[1]
    else:
        # natural cubic spline basis; for cr(), df is the number of basis functions,
        # i.e. the TOTAL number of knots (inner + boundary)
        y = patsy.dmatrix("cr(train[column], df=4)",
                          {"train[column]": train[column]}, return_type='dataframe')
        y.columns = [train[column].name]*y.shape[1]
    data = pd.concat([data, y.iloc[:, 1:]], axis=1)  # drop patsy's intercept column

dataTest = pd.DataFrame()
for column in test:
    if column == 'famhist':
        y = patsy.dmatrix("bs(test[column], df=1, degree=1)",
                          {"test[column]": test[column]}, return_type='dataframe')
        y.columns = [test[column].name]*y.shape[1]
    else:
        # natural cubic spline basis for the test data (df = total number of knots)
        y = patsy.dmatrix("cr(test[column], df=4)",
                          {"test[column]": test[column]}, return_type='dataframe')
        y.columns = [test[column].name]*y.shape[1]
    dataTest = pd.concat([dataTest, y.iloc[:, 1:]], axis=1)  # again drop patsy's intercept column

#Add intercept term
data = pd.concat([pd.Series(np.ones((data.shape[0])),index=data.index),data],axis=1)
dataTest = pd.concat([pd.Series(np.ones((dataTest.shape[0])),index=dataTest.index),dataTest],axis=1)

# Note: sklearn's LogisticRegression is L2-penalized by default (C is the inverse
# regularization strength) and it also fits its own intercept (fit_intercept=True)
# on top of the manual intercept column added above.
logistic = linear_model.LogisticRegression(C=1.0)
logistic.fit(data, trainResponse)
prediction = logistic.predict(dataTest)

coeffs = logistic.coef_

#plot sbp related data: evaluate the fitted spline for sbp on a grid
span = np.linspace(min(train['sbp']), max(train['sbp']), 100).reshape(-1, 1)
spanSpline = patsy.dmatrix("cr(span, df=4)",
                           {"span": span}, return_type='dataframe')
plt.figure()
# first degFree+1 coefficients = manual intercept column + the four sbp spline columns
plt.plot(span, np.dot(spanSpline, coeffs[0][0:degFree+1].T), marker=".")

But I don't get the same figure for f_hat(sbp), as you can see from the pictures below (top: my f_hat, bottom: the f_hat from the ESL book). My picture is produced by the last lines of the code (i.e. the lines below the "plot sbp related data" comment).

[Figure: my estimated f_hat(sbp) (top) and the corresponding plot from the ESL book (bottom)]
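
As a side note on the basis construction: patsy's spline transforms are stateful, so the knots chosen on the training data can be re-applied to new data by reusing the design_info of the training design matrix with patsy.build_design_matrices. A minimal sketch for the sbp term only, assuming a plain variable name x instead of the train[column] construction used in my code (the names sbp_basis, grid and grid_basis are just for illustration):

import numpy as np
import patsy

# build the natural cubic spline basis on the training values of sbp;
# the chosen knot locations are remembered in the design_info of the result
sbp_basis = patsy.dmatrix("cr(x, df=4)", {"x": train['sbp']},
                          return_type='dataframe')

# re-evaluate the basis with those same knots on a plotting grid
grid = np.linspace(train['sbp'].min(), train['sbp'].max(), 100)
grid_basis = patsy.build_design_matrices([sbp_basis.design_info],
                                         {"x": grid},
                                         return_type='dataframe')[0]

The same design_info can also be used to build the test-set columns.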

