*Bounty: 50*

I am trying to implement a quantile regression forest (https://www.jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf).

However, I have difficulty understanding how the quantiles are computed. I will summarize the relevant part of the paper and then explain exactly what I don't understand.

Let $(X_i, Y_i)$, $i = 1, \dots, n$, be $n$ independent observations. A tree $T$ parametrized with a realization $\theta$ of a random variable $\Theta$ is denoted by $T(\theta)$.

- Grow $k$ trees $T(\theta_t)$, $t = 1, \dots, k$, as in random forests. However, for every leaf of every tree, take note of all observations in this leaf, not just their average.
- For a given $X = x$, drop $x$ down all trees. Compute the weight $\omega_i(x, \theta_t)$ of observation $i \in \{1, \dots, n\}$ for every tree as in (4). Compute weight $\omega_i(x)$ for every observation $i \in \{1, \dots, n\}$ as an average over $\omega_i(x, \theta_t)$, $t = 1, \dots, k$, as in (5).
- Compute the estimate of the distribution function as in (6) for all $y \in \mathbb{R}$.

Equations (4), (5), and (6) are given below.

$$ \omega_i(x, \theta_t) = \frac{ \mathbf{1}\{ X_i \in R(x, \theta_t) \} }{ \#\{ j : X_j \in R(x, \theta_t) \} } \qquad (4) $$

$$ \omega_i(x) = k^{-1} \sum_{t=1}^{k} \omega_i(x, \theta_t) \qquad (5) $$

$$ \hat{F}(y \mid X = x) = \sum_{i=1}^{n} \omega_i(x) \, \mathbf{1}\{ Y_i \leq y \} \qquad (6) $$

Here, $R(x, \theta_t)$ denotes the rectangular area corresponding to the unique leaf of the tree $T(\theta_t)$ that $x$ belongs to.
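To make the question concrete, here is a minimal pure-Python sketch of what the steps and equations (4) to (6) appear to describe. It is an illustration under stated assumptions, not the paper's reference implementation. The function names (`forest_weights`, `conditional_cdf`, `conditional_quantile`), the input layout (`train_leaves[t][i]` is the leaf id of training observation $i$ in tree $t$, and `x_leaves[t]` is the leaf id that $x$ falls into in tree $t$), and the toy data are all hypothetical. The quantile is taken as the smallest observed $Y_i$ whose estimated distribution function value reaches the requested level, which is one common way to invert a step function.

```python
def forest_weights(train_leaves, x_leaves):
    """Return omega_i(x) for every training observation i.

    Assumed (hypothetical) layout:
      train_leaves[t][i] -> leaf id of training observation i in tree t
      x_leaves[t]        -> leaf id of the query point x in tree t
    """
    k = len(train_leaves)        # number of trees
    n = len(train_leaves[0])     # number of training observations
    weights = [0.0] * n
    for leaves_t, leaf_x in zip(train_leaves, x_leaves):
        # Observations that share the leaf R(x, theta_t) with x.
        members = [i for i, leaf in enumerate(leaves_t) if leaf == leaf_x]
        for i in members:
            # Eq. (4): 1 / (leaf size) for members, 0 for everyone else.
            # Eq. (5): the extra factor 1 / k averages over the k trees.
            weights[i] += 1.0 / (len(members) * k)
    return weights


def conditional_cdf(weights, y_train, y):
    """Eq. (6): sum of the weights of observations with Y_i <= y."""
    return sum(w for w, y_i in zip(weights, y_train) if y_i <= y)


def conditional_quantile(weights, y_train, alpha):
    """Smallest observed Y_i whose estimated CDF value is >= alpha."""
    cumulative = 0.0
    for y_i, w in sorted(zip(y_train, weights)):
        cumulative += w
        if cumulative >= alpha:
            return y_i
    # Guard against floating-point rounding when alpha is very close to 1.
    return max(y_train)


# Hypothetical toy data: 4 training observations, 2 trees.
y_train = [1.0, 2.0, 3.0, 4.0]
train_leaves = [[0, 0, 1, 1],   # leaf ids in tree 1
                [0, 1, 1, 1]]   # leaf ids in tree 2
x_leaves = [0, 1]               # leaves that x reaches in each tree

weights = forest_weights(train_leaves, x_leaves)
median = conditional_quantile(weights, y_train, 0.5)
```

In this sketch, the observations stored in each leaf enter only through the leaf membership used in `forest_weights`. The responses $Y_i$ themselves are never averaged per leaf, only reweighted in `conditional_cdf`.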

I can compute (4) and (5), but I don't understand how to compute (6) and then estimate quantiles from it. I also don't see where the full set of observations stored in each leaf (first step of the algorithm) is used.

Can someone help me understand this algorithm? Any help would be appreciated.