#StackBounty: #r #regression #feature-selection How to code coverage probability for SIS screening in R

Bounty: 50

I have a high-dimensional multivariate regression model $(p \gg n:\ p = 10{,}000,\ n = 200)$

$$ Y = X^T \beta + \epsilon $$

which may be sparse, so I have tried some screening methods (SIS, ISIS, Lars…) and I want to compute some of the statistics below, which are defined in a paper I read here. The “coverage probability” is the proportion of simulated models $\hat{S}_{(k)}$ that contain every non-zero parameter of $T$, and is defined by:

CP = $\frac{1}{n}\sum_{k} 1_{\{T \subset \hat{S}_{(k)}\}}$

Say $T = (1, 2, 3, \dots, q)$ is the true model, of size $q = 8$ for example. Say also that $\hat{S}_{(k)} = \{ j : |\hat{\beta}_j| > 0 \}$ is the simulated model. Any advice on how I can code this?

My code so far is:

# Model T, which is of length 8 (i.e. only 8 values of beta are non-zero)
Y <- X %*% beta + rnorm(n)

# SIS model of length 38
library(SIS)
sismodel <- SIS(X, Y, family = 'gaussian')
# Estimated coefficients
beta_hat <- sismodel$coef.est
# Indices of the selected coefficients
path <- sismodel$ix

So I believe I need to find the index values (path) of $\beta$ that are the same as those of $\hat{\beta}$. Is this right? I’m thinking along the lines of using this function somehow, but I have come a bit unstuck. Any advice on how to proceed would be helpful!

beta_hat = rep(0,d)
beta_hat[setdiff(path,beta)] =1 
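
One way to read the CP formula is as the fraction of simulation runs whose selected index set contains every index in $T$. Below is a minimal sketch under that reading, not a definitive implementation. The names `coverage_probability`, `true_idx` and `selected_sets` are hypothetical, and the index vectors are made up for illustration rather than produced by SIS. In the setup above, each element of `selected_sets` would presumably be `sismodel$ix` from one simulated data set. Note also that `setdiff(path, beta)` compares index values with the contents of `beta`, which is probably not what is wanted if `beta` holds coefficient values.

```r
# Coverage probability: share of replications whose selected set contains T.
# true_idx      : integer vector of indices of the truly non-zero coefficients
# selected_sets : list with one integer vector of selected indices per replication
coverage_probability <- function(true_idx, selected_sets) {
  covered <- vapply(
    selected_sets,
    function(s) all(true_idx %in% s),  # indicator 1{T subset of S_k}
    logical(1)
  )
  mean(covered)
}

# Illustrative (made-up) selections for four replications, with T = 1:8
true_idx <- 1:8
selected_sets <- list(
  c(1:8, 15, 40),    # contains all of T
  c(1:7, 22),        # misses index 8
  c(8:1, 100, 250),  # contains all of T; order does not matter
  c(2, 3, 5)         # misses most of T
)

cp <- coverage_probability(true_idx, selected_sets)
print(cp)
```

Here the average is taken over the number of replications (the length of `selected_sets`), which is an assumption about what the $\frac{1}{n}$ in the formula denotes.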

