#StackBounty: #survival #interpolation #time-varying-covariate Time varying covariates and Interpolation issue

Bounty: 50

Based on my reading on time-varying survival analysis, I am encountering two conflicting sets of advice with regard to time-varying covariates and interpolation.

  1. The first piece of advice is to avoid basing covariates on future events, which may introduce bias. As an example, suppose a subject has two lab measurements, 25 at time 0 and 50 at time 2. Using counting-process notation, the subject would be entered as two time intervals A and B: A. (0,2] 25 died = 0, and B. (2,5] 50 died = 1. Under one possible interpolation of the values, the subject would instead have 37.5 for A. According to the advice above, this may introduce bias (perhaps small?), because the value 37.5 depends on a measurement that lies in the future at the start of interval A.
  2. The second piece of advice is to go ahead and interpolate, and there are some creative methods, such as joint mixed models, which seem to do this.
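
To make the two encodings concrete, here is a minimal sketch of the single subject from the example in counting-process form. The column names (`lab_locf`, `lab_interp`) and the midpoint rule are illustrative assumptions, not a recommended method; the point is only that the interpolated value for interval A needs the measurement taken at time 2.

```python
import pandas as pd

# Subject from the example: lab = 25 at t=0, lab = 50 at t=2, death at t=5.
measurements = pd.DataFrame({"time": [0, 2], "lab": [25.0, 50.0]})
death_time = 5

# Each interval starts at a measurement and stops at the next one (or at death).
stops = measurements["time"].iloc[1:].tolist() + [death_time]
intervals = pd.DataFrame({
    "start": measurements["time"],
    "stop": stops,
    # Last observation carried forward: uses only information known at `start`.
    "lab_locf": measurements["lab"],
})

# Midpoint interpolation (assumed rule): averages the current and the NEXT
# measurement, so interval A's value depends on a future observation.
# The last interval has no next measurement, so it keeps its own value.
next_lab = measurements["lab"].shift(-1).fillna(measurements["lab"])
intervals["lab_interp"] = (measurements["lab"] + next_lab) / 2

# The event indicator is 1 only on the final interval.
intervals["died"] = [0] * (len(intervals) - 1) + [1]

print(intervals)
```

With this layout, interval A carries 25 under `lab_locf` and 37.5 under `lab_interp`, which is exactly the difference the two pieces of advice disagree about.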

Which advice should I take? If it depends, in what situations would it be appropriate to prefer one over the other?

Get this bounty!!!

#StackBounty: #python #matplotlib #scipy #interpolation Return z-value of contour from separate xy coordinate

Bounty: 100

I have a set of xy coordinates that generate a contour. In the code below, these coordinates come from groups A and B. I have also created a separate xy coordinate, taken from C1_X and C1_Y. However, this point isn’t used to create the contour itself; it is a separate xy coordinate.

Question: Is it possible to return the z-value at the C1_X, C1_Y coordinate?

It is similar to another question. The figure there displays what I’m hoping to return.

The contour below is normalised so values fall between -1 and 1. I’m hoping to return the z-value for C1_X and C1_Y, which is the white scatter point seen in the figure beneath the code.

I have attempted to return the z-value for this point using:

# Attempt at returning the z-value for C1 
f = RectBivariateSpline(X, Y, normPDF)
z = f(d['C1_X'], d['C1_Y']) 

But I’m returning an error: raise TypeError('x must be strictly increasing')
TypeError: x must be strictly increasing

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as sts
import matplotlib.animation as animation
import matplotlib.transforms as transforms
from mpl_toolkits.axes_grid1 import make_axes_locatable
from scipy.interpolate import RectBivariateSpline

DATA_LIMITS = [0, 15]

def datalimits(*data):
    return DATA_LIMITS 

def mvpdf(x, y, xlim, ylim, radius=1, velocity=0, scale=0, theta=0):
    X,Y = np.meshgrid(np.linspace(*xlim), np.linspace(*ylim))
    XY = np.stack([X, Y], 2)
    PDF = sts.multivariate_normal([x, y]).pdf(XY)
    return X, Y, PDF

def mvpdfs(xs, ys, xlim, ylim, radius=None, velocity=None, scale=None, theta=None):
    PDFs = []
    for i,(x,y) in enumerate(zip(xs,ys)):
        X, Y, PDF = mvpdf(x, y, xlim, ylim)
        # collect each point's PDF so the sum below is not taken over an empty list
        PDFs.append(PDF)

    return X, Y, np.sum(PDFs, axis=0)

''' Animate Plot '''

fig, ax = plt.subplots(figsize = (10,6))

line_a, = ax.plot([], [], 'o', c='red', alpha = 0.5, markersize=5,zorder=3)
line_b, = ax.plot([], [], 'o', c='blue', alpha = 0.5, markersize=5,zorder=3)

offset = lambda p: transforms.ScaledTranslation(p/82.,0, plt.gcf().dpi_scale_trans)
trans = plt.gca().transData

scat = ax.scatter([], [], s=5**2,marker='o', c='white', alpha = 1,zorder=3,transform=trans+offset(+2) )

cfs = None

def plotmvs(tdf, xlim=None, ylim=None, fig=fig, ax=ax):    
    global cfs  
    if cfs:
        for tp in cfs.collections:
            # remove the contours drawn for the previous frame
            tp.remove()
        cfs = None

    df = tdf[1]

    if xlim is None: xlim = datalimits(df['X'])
    if ylim is None: ylim = datalimits(df['Y'])

    PDFs = []

    # groups arrive in sorted order (A, B, C), matching the three artists
    for (group, gdf), group_line in zip(df.groupby('group'), (line_a, line_b, scat)):
        if group in ['A','B']:
            # move the group's markers, then accumulate its summed PDF
            group_line.set_data(gdf['X'].values, gdf['Y'].values)
            kwargs = {
                'radius': gdf['Radius'].values if 'Radius' in gdf else None,
                'velocity': gdf['Velocity'].values if 'Velocity' in gdf else None,
                'scale': gdf['Scaling'].values if 'Scaling' in gdf else None,
                'theta': gdf['Rotation'].values if 'Rotation' in gdf else None,
                'xlim': xlim,
                'ylim': ylim
            }
            X, Y, PDF = mvpdfs(gdf['X'].values, gdf['Y'].values, **kwargs)
            PDFs.append(PDF)

        elif group in ['C']:
            # plot white scatter point
            scat.set_offsets(gdf[['X','Y']].values)

    normPDF = (PDFs[0]-PDFs[1])/max(PDFs[0].max(),PDFs[1].max())

    # Attempt at returning the z-value for C1 
    f = RectBivariateSpline(X, Y, normPDF)
    z = f(d['C1_X'], d['C1_Y']) 

    cfs = ax.contourf(X, Y, normPDF, cmap='jet', alpha = 1, levels=np.linspace(-1,1,10),zorder=1)

    divider = make_axes_locatable(ax)
    cax = divider.append_axes("right", size="5%", pad=0.1)
    cbar = fig.colorbar(cfs, ax=ax, cax=cax)

    return  cfs.collections + [scat] + [line_a,line_b] 

n = 1
time = range(n)  

d = ({
    'A1_X' :    [3],
    'A1_Y' :    [6],
    'A2_X' :    [6],
    'A2_Y' :    [10],
    'B1_X' :    [12],
    'B1_Y' :    [2],
    'B2_X' :    [14],
    'B2_Y' :    [4],
    'C1_X' :    [4],
    'C1_Y' :    [6],
    'A1_Radius' :  [107],
    'A2_Radius' :  [95],  
    'B1_Radius' :  [250],
    'B2_Radius' :  [213],
    'A1_Scaling' : [7],
    'A2_Scaling' : [5],      
    'B1_Scaling' : [2],
    'B2_Scaling' : [4],                   
    'A1_Rotation' :    [0],
    'A2_Rotation' :    [0], 
    'B1_Rotation' :    [0],
    'B2_Rotation' :    [0],
    })

tuples = [((t, k.split('_')[0][0], int(k.split('_')[0][1:]), k.split('_')[1]), v[i])
    for k,v in d.items() for i,t in enumerate(time) ]

df = pd.Series(dict(tuples)).unstack(-1)
df.index.names = ['time', 'group', 'id']

interval_ms = 1000
delay_ms = 2000
ani = animation.FuncAnimation(fig, plotmvs,  frames=df.groupby('time'), interval=interval_ms, repeat_delay=delay_ms,)



I am hoping to return the contour value for this point. The intended output is either a list or a df that displays the normalised z-value (between -1 and 1) for C.

Upon visual inspection this would be approx 0.6 or 0.7
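
One possible reading of the error, sketched under assumptions: `RectBivariateSpline(x, y, z)` expects 1-D, strictly increasing `x` and `y` axes, whereas `X` and `Y` in the code above are the 2-D arrays returned by `np.meshgrid`. The standalone snippet below rebuilds the normalised surface without the animation (the helper `summed_pdf` and the names `x_axis`, `y_axis` are illustrative, not from the original code) and passes the 1-D axes instead.

```python
import numpy as np
import scipy.stats as sts
from scipy.interpolate import RectBivariateSpline

# 1-D, strictly increasing grid axes (np.linspace defaults to 50 samples).
x_axis = np.linspace(0, 15)
y_axis = np.linspace(0, 15)
X, Y = np.meshgrid(x_axis, y_axis)  # 2-D arrays of shape (len(y_axis), len(x_axis))
XY = np.stack([X, Y], 2)

def summed_pdf(points):
    """Sum of unit-covariance Gaussian densities centred on each point."""
    return sum(sts.multivariate_normal(p).pdf(XY) for p in points)

pdf_a = summed_pdf([(3, 6), (6, 10)])   # group A coordinates from `d`
pdf_b = summed_pdf([(12, 2), (14, 4)])  # group B coordinates from `d`
normPDF = (pdf_a - pdf_b) / max(pdf_a.max(), pdf_b.max())

# z must have shape (len(x), len(y)). meshgrid's default 'xy' indexing
# stores values as [row = y, column = x], hence the transpose.
f = RectBivariateSpline(x_axis, y_axis, normPDF.T)

c1_x, c1_y = 4, 6  # C1_X, C1_Y from `d`
z = float(f.ev(c1_x, c1_y))  # .ev evaluates at a single point, not on a grid
print(z)
```

The same idea should carry over to `plotmvs` by using `X[0, :]` and `Y[:, 0]` as the axes, though that has not been tested inside the animation.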


#StackBounty: #regression #nonparametric #splines #interpolation Estimating Spline curve by OLS. Is a good idea to fix the knots at Che…

Bounty: 100

I am writing my master’s degree thesis on a novel method for fixing knots in an adaptive way, and while reading the literature I’ve found many references to the so-called Chebyshev sites. These sites or points are basically the roots of the Chebyshev polynomials, and there’s a proof showing that the Lebesgue constant (a measure of how good a polynomial interpolation is) is very close to its lower bound if we put our knots at those sites. A Practical Guide to Splines (De Boor, 1972), for example, provides such a proof.

Anyway, I am not interested in polynomial interpolation with splines, but in regression using LASSO on a set of B-splines. In Numerical Methods in Economics (Judd, 1998) it’s stated that the results regarding the ‘optimality’ of the Chebyshev sites also hold for the regression case, but there is no proof showing that this is true.

I would like to know if it’s a good idea to estimate the regression using B-splines constructed over a knot sequence given by the Chebyshev sites, since in the OLS framework I am not interested in minimising the Lebesgue constant; rather, I want to minimise $\| \cdot \|_2^2$. In many articles I’ve found phrases like “we will set the knots at the Chebyshev sites, that are well known to be good…”, alluding to the results presented in De Boor but ignoring that those refer to the interpolation case.
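
To make the setup concrete, here is a minimal sketch that computes the Chebyshev sites on an interval and uses them as interior knots of a cubic B-spline fitted by ordinary least squares, next to equally spaced knots. The test function, noise level, and number of knots are arbitrary assumptions, and plain OLS via `scipy.interpolate.make_lsq_spline` stands in for the LASSO fit; a single comparison like this does not settle the question.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def chebyshev_sites(n, a=-1.0, b=1.0):
    """Roots of the degree-n Chebyshev polynomial T_n, mapped to [a, b], ascending."""
    k = np.arange(1, n + 1)
    roots = np.cos((2 * k - 1) * np.pi / (2 * n))
    return np.sort(0.5 * (a + b) + 0.5 * (b - a) * roots)

def lsq_bspline_rss(x, y, interior_knots, degree=3):
    """Residual sum of squares of an OLS B-spline fit with the given interior knots."""
    # Boundary knots are repeated degree + 1 times, as make_lsq_spline requires.
    t = np.concatenate([
        np.repeat(x[0], degree + 1),
        interior_knots,
        np.repeat(x[-1], degree + 1),
    ])
    spline = make_lsq_spline(x, y, t, k=degree)
    return float(np.sum((spline(x) - y) ** 2))

# Hypothetical data: a noisy Runge function on [-1, 1].
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 / (1.0 + 25.0 * x**2) + rng.normal(scale=0.05, size=x.size)

n_knots = 8
cheb_knots = chebyshev_sites(n_knots)
unif_knots = np.linspace(-1.0, 1.0, n_knots + 2)[1:-1]

rss_cheb = lsq_bspline_rss(x, y, cheb_knots)
rss_unif = lsq_bspline_rss(x, y, unif_knots)
print("RSS with Chebyshev knots:", rss_cheb)
print("RSS with uniform knots:  ", rss_unif)
```

Chebyshev sites cluster towards the ends of the interval, so which knot sequence gives the smaller residual sum of squares is likely to depend on where the target function varies most.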

If you could shed some light on the problem, I would be very grateful.
