#StackBounty: #python #regression #convnet #keras #audio-recognition Optimizing CNN network

Bounty: 50

I am currently trying to recreate the results of this paper, in which they do feature extraction from a spectrogram of log mel-filter energies.

Since the paper doesn’t state what kind of features it extracts, I am currently trying to extract features and match them to MFCC features. The paper describes a technique called LWS (limited weight sharing), in which the frequency axis of the spectrogram is divided into sections, and the sections do not share weights with each other.

So I’ve divided my input image into 13 sections, such that 1 output feature is produced from each (6,3,3) input image: 6 for the number of rows, 3 because each column represents the [static, delta, delta_delta] data of the given log mel-filter energy, and the last 3 for the color channels.

If I had used 13 filterbanks, each (1,3,3) matrix would map to one feature, but that seemed a bit too good to be true, so I decided to use 78 filterbanks and divide them into 13 sections, so that one feature is extracted from each matrix of size (6,3,3).
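To make the sectioning concrete, here is a minimal sketch of the limited-weight-sharing split described above (the 78-to-13 split and the array layout are my reading of the question, not something taken from the paper):

```python
import numpy as np

# Hypothetical spectrogram: 78 mel-filterbank rows, 3 columns
# ([static, delta, delta_delta]) and 3 color channels.
spectrogram = np.random.rand(78, 3, 3)

# Limited weight sharing: cut the frequency axis into 13 sections of
# 6 rows each; each (6, 3, 3) section keeps its own filter weights
# and yields one feature.
sections = spectrogram.reshape(13, 6, 3, 3)

print(sections.shape)     # (13, 6, 3, 3)
print(sections[0].shape)  # (6, 3, 3): the input to one LWS section
```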

I am training the network with this model structure:

from keras.models import Sequential
from keras.layers import Convolution2D, Dense, Flatten, MaxPooling2D, ZeroPadding2D

def create_model(init_mode='normal', activation_mode='softsign',
                 optimizer_mode='Adamax', activation_mode_conv='softsign'):
    model = Sequential()

    model.add(ZeroPadding2D((6, 4), input_shape=(6, 3, 3)))
    model.add(Convolution2D(32, 3, 3, activation=activation_mode_conv))
    print(model.output_shape)
    model.add(Convolution2D(32, 3, 3, activation=activation_mode_conv))
    print(model.output_shape)
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 1)))
    print(model.output_shape)
    model.add(Convolution2D(64, 3, 3, activation=activation_mode_conv))
    print(model.output_shape)
    model.add(Convolution2D(64, 3, 3, activation=activation_mode_conv))
    print(model.output_shape)
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 1)))
    model.add(Flatten())
    print(model.output_shape)
    model.add(Dense(output_dim=32, init=init_mode, activation=activation_mode))
    model.add(Dense(output_dim=13, init=init_mode, activation=activation_mode))
    model.add(Dense(output_dim=1, init=init_mode, activation=activation_mode))
    model.add(Dense(output_dim=1, init=init_mode, activation=activation_mode))
    # print(model.summary())
    model.compile(loss='mean_squared_error', optimizer=optimizer_mode)

    return model

For some reason this model keeps giving me very bad results: I keep getting a loss of around 216, which is nearly 3 times the data range.

I did a grid search to find out which parameters (activation function, init_mode, epochs and batch_size) would be best; those are the ones chosen in the function above (even though there wasn’t much change in the outcome).

What can I do to get better results?
Is the CNN poorly designed?


Get this bounty!!!

#StackBounty: #regression #clustering #chi-squared #fitting #hierarchical-clustering Clustering categorical features based on fit

Bounty: 50

I have a set of data. For our purposes, let’s simplify it to one independent numerical variable, x, and one dependent numerical variable, y. The goal is to train on the data to determine the parameters in the model; for simplicity, assume y = m*x + b. I could then predict new y values when new x values are given. Pretty standard.

The tricky part is that I have another feature dimension in my data set. If I could hypothesize two clusters which would each fit to their own line I might get a better prediction.

To restate: y1 = m1*x1 + b1 and y2 = m2*x2 + b2, where the data is split into two groups, should fit the data better than y = m*x + b when all the data is fit together. This would then improve my ability to predict in the future. The problem is that even if I knew the groups, I would not know what metric to use for “better”.

It would seem that the fit would always improve (R^2 would always increase) because I am adding parameters, so this would lead to overfitting. Should I use chi^2/ndf instead? I feel like this is something that must be well understood; I am just missing something about how to balance the number of clusters/models I should split my training data into.
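One standard way to make “better” precise is a penalized fit criterion such as BIC, which rewards lower residual error but charges for the extra parameters of a second line. A sketch on synthetic data (the grouping, noise level and parameter count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data drawn from two different lines.
x = rng.uniform(0, 10, 200)
group = x > 5
y = np.where(group, 3 * x - 4, 0.5 * x + 1) + rng.normal(0, 0.3, 200)

def rss_line(x, y):
    """Residual sum of squares of a least-squares line fit."""
    coeffs = np.polyfit(x, y, 1)
    return np.sum((y - np.polyval(coeffs, x)) ** 2)

def bic(rss, n, k):
    """Gaussian BIC: smaller is better; k parameters are penalized."""
    return n * np.log(rss / n) + k * np.log(n)

n = len(x)
bic_one = bic(rss_line(x, y), n, 2)                    # one line: 2 params
bic_two = bic(rss_line(x[group], y[group])
              + rss_line(x[~group], y[~group]), n, 4)  # two lines: 4 params

print(bic_two < bic_one)  # True: the split is worth its extra parameters here
```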



#StackBounty: #regression #self-study #multiple-regression #econometrics #instrumental-variables Is Just-Identified 2SLS Median-Unbiased?

Bounty: 50

In Mostly Harmless Econometrics: An Empiricist’s Companion (Angrist and Pischke, 2009: page 209) I read the following:

(…) In fact, just-identified 2SLS (say, the simple Wald estimator) is approximately unbiased. This is hard to show formally because just-identified 2SLS has no moments (i.e., the sampling distribution has fat tails). Nevertheless, even with weak instruments, just-identified 2SLS is approximately centered where it should be. We therefore say that just-identified 2SLS is median-unbiased. (…)

Though the authors say that just-identified 2SLS is median-unbiased, they neither prove it nor provide a reference to a proof. On page 213 they mention the proposition again, but with no reference to a proof. Also, I can find no motivation for the proposition in their lecture notes on instrumental variables from MIT, page 22.

The reason may be that the proposition is false, since they reject it in a note on their blog. However, they write that just-identified 2SLS is approximately median-unbiased. They motivate this using a small Monte Carlo experiment, but provide no analytical proof or closed-form expression for the error term associated with the approximation. Anyhow, this was the authors’ reply to Professor Gary Solon of Michigan State University, who commented that just-identified 2SLS is not median-unbiased.

Question 1: How do you prove that just-identified 2SLS is not median-unbiased as Gary Solon argues?

Question 2: How do you prove that just-identified 2SLS is approximately median-unbiased, as Angrist and Pischke argue?

For Question 1 I am looking for a counterexample. For Question 2 I am (primarily) looking for a proof or a reference to a proof.

I am also looking for a formal definition of median-unbiased in this context. I understand the concept as follows: An estimator $\hat{\theta}(X_{1:n})$ of $\theta$ based on some set $X_{1:n}$ of $n$ random variables is median-unbiased for $\theta$ if and only if the distribution of $\hat{\theta}(X_{1:n})$ has median $\theta$.


Notes

  1. In a just-identified model the number of endogenous regressors is equal to the number of instruments.

  2. The framework describing a just-identified instrumental variables model may be expressed as follows: The causal model of interest and the first-stage equation are $$\begin{cases}
    Y&=X\beta+W\gamma+u \\
    X&=Z\delta+W\zeta+v
    \end{cases}\tag{1}$$ where $X$ is a $k\times n+1$ matrix describing $k$ endogenous regressors, and where the instrumental variables are described by a $k\times n+1$ matrix $Z$. Here $W$ just describes some number of control variables (e.g., added to improve precision); and $u$ and $v$ are error terms.

  3. We estimate $\beta$ in $(1)$ using 2SLS: Firstly, regress $X$ on $Z$ controlling for $W$ and acquire the predicted values $\hat{X}$; this is called the first stage. Secondly, regress $Y$ on $\hat{X}$ controlling for $W$; this is called the second stage. The estimated coefficient on $\hat{X}$ in the second stage is our 2SLS estimate of $\beta$.

  4. In the simplest case we have the model $$y_i=\alpha+\beta x_i+u_i$$ and instrument the endogenous regressor $x_i$ with $z_i$. In this case, the 2SLS estimate of $\beta$ is $$\hat{\beta}^{\text{2SLS}}=\frac{s_{ZY}}{s_{ZX}}\tag{2},$$ where $s_{AB}$ denotes the sample covariance between $A$ and $B$. We may simplify $(2)$: $$\hat{\beta}^{\text{2SLS}}=\frac{\sum_i(y_i-\bar{y})z_i}{\sum_i(x_i-\bar{x})z_i}=\beta+\frac{\sum_i(u_i-\bar{u})z_i}{\sum_i(x_i-\bar{x})z_i}\tag{3}$$ where $\bar{y}=\sum_iy_i/n$, $\bar{x}=\sum_i x_i/n$ and $\bar{u}=\sum_i u_i/n$, where $n$ is the number of observations.

  5. I did a literature search using the terms “just-identified” and “median-unbiased” to find references answering Questions 1 and 2 (see above). I found none. All articles I found (see below) cite Angrist and Pischke (2009: pages 209, 213) when stating that just-identified 2SLS is median-unbiased.

    • Jakiela, P., Miguel, E., & Te Velde, V. L. (2015). You’ve earned it: estimating the impact of human capital on social preferences. Experimental Economics, 18(3), 385-407.
    • An, W. (2015). Instrumental variables estimates of peer effects in social networks. Social Science Research, 50, 382-394.
    • Vermeulen, W., & Van Ommeren, J. (2009). Does land use planning shape regional economies? A simultaneous analysis of housing supply, internal migration and local employment growth in the Netherlands. Journal of Housing Economics, 18(4), 294-310.
    • Aidt, T. S., & Leon, G. (2016). The democratic window of opportunity: Evidence from riots in Sub-Saharan Africa. Journal of Conflict Resolution, 60(4), 694-717.



#StackBounty: #regression #weighted-regression #weighted-sampling #weighted-data #genetic-algorithms Can someone point me towards resea…

Bounty: 100

I am working on fitness case importance for symbolic regression and found the paper “Step-wise Adaptation of Weights for Symbolic Regression with Genetic Programming”, which talks about weighting fitness cases to give importance to points that are not yet solved, in order to boost performance and to get GP out of local optima.
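As I understand the SAW idea (my paraphrase, with made-up threshold and step size), the weight update itself is tiny: after each period, every fitness case the current best individual still gets wrong has its weight bumped up, so hard cases gradually dominate the fitness function.

```python
import numpy as np

def saw_update(weights, errors, threshold=0.1, step=1.0):
    """Step-wise adaptation of weights: raise the weight of every
    fitness case whose error is still above the solve threshold."""
    return weights + step * (errors > threshold)

weights = np.ones(5)
errors = np.array([0.0, 0.5, 0.05, 0.9, 0.2])
print(saw_update(weights, errors))  # cases 1, 3 and 4 now weigh 2.0
```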

This publication is quite old and I am looking for newer work on fitness case importance, but I am not able to find any such publication. Instead I find publications about sampling based on random selection in different ways.

So, can someone point me towards research relevant to importance or weighting of data points, like the SAW (step-wise adaptation of weights) technique?

Thank you.



#StackBounty: #r #regression Segmented regression with quadratic polynomial and a straight line

Bounty: 50

I am trying to implement segmented regression as in this example: Segmented Regression, Breakpoint analysis.

Now, how can I implement it in such a way that the second part is a quadratic polynomial, with everything else remaining the same?

I tried changing Z = ~poly(DistanceMeters, 2), however it didn’t work.

Also, how can I get equations like

part 1: a1*x + b1
part 2: a2*x**2 + b2*x + c1
part 3: a3*x + b3

There are similar questions like this one, however they don’t explain it using the segmented function.
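For illustration, here is what the line / quadratic / line model looks like when the breakpoints are simply assumed known (in Python rather than R, and fitting each segment separately; R’s segmented() would estimate the breakpoints as well):

```python
import numpy as np

rng = np.random.default_rng(1)
b1, b2 = 10.0, 20.0  # assumed (not estimated) breakpoints

x = np.linspace(0, 30, 300)
y = np.piecewise(
    x,
    [x < b1, (x >= b1) & (x < b2), x >= b2],
    [lambda t: 2 * t + 1,                  # part 1: a1*x + b1
     lambda t: 0.5 * t**2 - 8 * t + 51,    # part 2: a2*x**2 + b2*x + c1
     lambda t: 3 * t + 31],                # part 3: a3*x + b3
) + rng.normal(0, 0.5, x.size)

# Per-part equations: fit each segment with its own polynomial degree.
seg1 = np.polyfit(x[x < b1], y[x < b1], 1)
seg2 = np.polyfit(x[(x >= b1) & (x < b2)], y[(x >= b1) & (x < b2)], 2)
seg3 = np.polyfit(x[x >= b2], y[x >= b2], 1)
print(seg1, seg2, seg3)  # coefficients recover the generating equations
```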



#StackBounty: #regression #poisson #gbm #xgboost Poisson deviance (xgboost vs gbm vs regression)

Bounty: 100

I would like to know which deviance expression is used for Poisson regression by the xgboost tool (extreme gradient boosting).

According to the source code, the evaluation function is:

    struct EvalPoissonNegLogLik : public EvalEWiseBase {
      const char *Name() const override {
        return "poisson-nloglik";
      }
      inline bst_float EvalRow(bst_float y, bst_float py) const {
        const bst_float eps = 1e-16f;
        if (py < eps) py = eps;
        return common::LogGamma(y + 1.0f) + py - std::log(py) * y;
      }
    };

So the deviance (in R) should be something like:

    poisson_deviance <- function(y, py, eps) {
      mean(LogGamma(y + 1.0) + pmax(py, eps) - log(pmax(py, eps)) * y)
    }

I have two questions here:

1) How do I translate LogGamma to R? I found several links by googling ‘loggamma’, and it seems each language means something different by this term.

2) What to do with exposures? I know we need to set them on the xgbMatrix using:

    setinfo(xgbMatrix, "base_margin", log(exposure))

But in the code of EvalPoissonNegLogLik I never see the offset again, so what I deduced is that all we need is to add log(exposure) to the predictors:

    poisson_deviance <- function(y, py, exposure, eps) {
      mean(LogGamma(y + 1.0) + pmax(py + log(exposure), eps) -
           log(pmax(py + log(exposure), eps)) * y)
    }

The deviance formula used by the gbm gradient boosting R package for Poisson regression is:

    poisson_deviance <- function(y, py) { mean(y * py - exp(py)) }

(with py capped at eps too), as you can see on the last page of this document.

Are gbm and xgboost using the same error for Poisson regression? This expression of the deviance seems different from what xgboost uses.

Finally, the deviance formula for Poisson regression according to B.5.3 here should be:

    2 * mean(y * log(y / py) - (y - py))

which is yet another formula.
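One observation that may help: per row, xgboost’s poisson-nloglik and the B.5.3 deviance differ only by a factor of 2 and by terms involving y alone, so minimizing one minimizes the other. A quick numerical check (in Python, with math.lgamma playing the role of LogGamma):

```python
import math

def xgb_nloglik(y, py):
    # xgboost's poisson-nloglik for one observation
    return math.lgamma(y + 1.0) + py - math.log(py) * y

def half_deviance(y, py):
    # one half of the B.5.3 Poisson deviance for one observation (y > 0)
    return y * math.log(y / py) - (y - py)

y = 3.0
diffs = [xgb_nloglik(y, py) - half_deviance(y, py)
         for py in (0.5, 1.0, 2.0, 7.5)]
print(diffs)  # the same constant for every prediction py
```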

I would appreciate any help in understanding why gbm and xgboost each use a different deviance formulation.



#StackBounty: How are Regression and Classification different when modeling time-series data?

Bounty: 100

I am attempting to predict a binomial outcome based on a set of continuous and binomial predictors in a time series, e.g.

Y/N buy car ~ cost +Y/N get loan + future gas price predictions

If I took the output of a classifier and of a regression equation, both based on the same training dataset, and used it to decide whether or not to buy a car in a specific case, is there any reason a classification algorithm would be better or worse than regression?

This seems to provide a general rule of thumb, but I would really like a highly specific answer, e.g. “Classification algorithms do this better because…”

  1. Does either approach work better for time-series data?
  2. Does either approach handle erratic data better?



#StackBounty: Driver analyses for ordinal dependent and binary independent variables

Bounty: 50

I would like to find out which variables (from a set of 30 binary variables) have the most impact on an ordinal satisfaction measurement (ranging from 1 – not happy at all – to 4 – absolutely happy).
Unfortunately, most of the binary independent variables are (highly) correlated.

There are about 20 different shops to sell the product and I also want to check if different customer-types have different drivers.

My dataset looks like this (with D1 to D30 being the dichotomous independent Variables):

Data structure

I wanted to use a hierarchical regression, but I don’t think it is appropriate for an ordinal dependent variable. Another problem might be the high correlation between the binary independent variables.

Now I have read about random forest classification, but I am not sure if this is the right way to go.

Do you have any suggestions for a proper method for my problem?
And more generally, are there any methods to deal with high correlation among binary predictors?
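For the last question, a simple first screen for association among binary predictors is the phi coefficient, which is just the Pearson correlation of two 0/1 variables; a sketch with made-up data (D2 is deliberately built as a noisy copy of D1):

```python
import numpy as np

rng = np.random.default_rng(7)

D1 = rng.integers(0, 2, 1000)
D2 = np.where(rng.random(1000) < 0.9, D1, 1 - D1)  # mostly agrees with D1
D3 = rng.integers(0, 2, 1000)                      # independent of D1

def phi(a, b):
    """Phi coefficient = Pearson correlation of two binary variables."""
    return np.corrcoef(a, b)[0, 1]

print(phi(D1, D2))  # high: D1 and D2 are redundant
print(phi(D1, D3))  # near zero
```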

