*Bounty: 50*

If I know that a multivariate dataset has a piecewise-linear data generating process with known knots (or breakpoints), then what is the appropriate kernel function to use in Kernel-PCA?

For example, given $n = 1, …, N$ observations and $j = 1, …, J$ variables, assume the true data generating process:

\begin{equation}
X_{n,j} = \alpha_{1,j} F_{n} I_{F_n \leq 0} + \alpha_{2,j} F_{n} I_{F_n > 0} + e_{n,j}
\end{equation}

where $I$ is an indicator function. That is, the true data generating process is piecewise linear in a single factor $F_n$, with a knot at $0$ and slopes $\alpha_{1,j}, \alpha_{2,j}$. Assuming I want to use Kernel-PCA, is there a known most efficient kernel function to use?
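For concreteness, here is a minimal simulation of this data generating process; the values of $N$, $J$, the slopes, and the noise scale are illustrative choices, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)
N, J = 500, 5                              # illustrative sample size and dimension

F = rng.normal(size=N)                     # latent factor F_n, knot at 0
alpha1 = rng.normal(size=J)                # slopes alpha_{1,j} for F_n <= 0
alpha2 = rng.normal(size=J)                # slopes alpha_{2,j} for F_n > 0
e = 0.1 * rng.normal(size=(N, J))          # idiosyncratic noise e_{n,j}

# X_{n,j} = alpha_{1,j} F_n 1{F_n <= 0} + alpha_{2,j} F_n 1{F_n > 0} + e_{n,j}
X = (np.outer(F * (F <= 0), alpha1)
     + np.outer(F * (F > 0), alpha2)
     + e)
print(X.shape)  # (500, 5)
```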

My guess is that the answer is probably related to the standard hinge function for piecewise-linear regression, i.e. $f(x) = \max(x, 0)$, and so the appropriate kernel function for Kernel-PCA might be some combination of the inner products of the centred variables and their hinge functions. But how would I derive this?
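To make the guess concrete, one way to turn a feature map into a kernel is to set $k(x, y) = \langle \phi(x), \phi(y) \rangle$ explicitly. The sketch below stacks the centred variables with their hinge transforms and feeds the resulting Gram matrix to scikit-learn's `KernelPCA` via `kernel="precomputed"`; this is one illustrative construction, not a derivation of the optimal kernel for this DGP:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def hinge_feature_map(X):
    """Map each centred variable x to (x, max(x, 0)).

    Doubling the columns with hinge terms is one way to encode the
    basis of piecewise-linear regression with a knot at 0.
    """
    Xc = X - X.mean(axis=0)
    return np.hstack([Xc, np.maximum(Xc, 0.0)])

def hinge_kernel(X):
    """Gram matrix K[i, j] = <phi(x_i), phi(x_j)> for the hinge map."""
    Phi = hinge_feature_map(X)
    return Phi @ Phi.T

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))              # placeholder data

K = hinge_kernel(X)                        # symmetric PSD by construction
kpca = KernelPCA(n_components=2, kernel="precomputed")
scores = kpca.fit_transform(K)
print(scores.shape)  # (200, 2)
```

Because $K = \Phi \Phi^\top$, it is symmetric and positive semi-definite by construction, so it is a valid kernel; whether it is the *most efficient* one for this DGP is exactly what I am asking.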

More generally, is there a standard methodology for converting a known true data generating process into the appropriate corresponding kernel function?

Any pointers to good reading on this material would also be greatly appreciated.