#StackBounty: #machine-learning #matrix-decomposition #tensor CP decomposition for tensor factorization

Bounty: 50

I am trying to understand CP decomposition for a three-way tensor. Let’s take a tensor with dimensions $I \times J \times K$. When we apply CP decomposition, it decomposes the tensor into a sum of rank-1 tensors:

$\sum_{r=1}^{R} a_r \circ b_r \circ c_r$

My question: each dimension is different, so how do the rank-1 terms sum up to the original $I \times J \times K$ tensor? Also, how is $R$ found? Is it randomly chosen?
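For intuition about the shapes, here is a minimal NumPy sketch (the sizes and factor matrices are made up for illustration). Each term $a_r \circ b_r \circ c_r$ is itself a full $I \times J \times K$ rank-1 tensor, so the $R$ terms are summed elementwise:

```python
import numpy as np

I, J, K, R = 4, 5, 6, 3  # example sizes; R is the CP rank

# Hypothetical factor matrices: column r holds the vectors a_r, b_r, c_r.
A = np.random.rand(I, R)
B = np.random.rand(J, R)
C = np.random.rand(K, R)

# Each outer product a_r o b_r o c_r is a full I x J x K tensor,
# so the R terms can be added elementwise.
X = sum(np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r]) for r in range(R))

# Equivalent single contraction over the shared rank index r.
X2 = np.einsum('ir,jr,kr->ijk', A, B, C)
print(X.shape, np.allclose(X, X2))  # (4, 5, 6) True
```

As for choosing $R$: determining the exact rank of a tensor is NP-hard in general, so in practice $R$ is treated as a hyperparameter – one fits the decomposition for several candidate ranks and picks one based on reconstruction error or a downstream criterion, rather than choosing it randomly.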

Thanks in advance.


Get this bounty!!!

#StackBounty: #machine-learning #confidence-interval #p-value #conditional-probability Finding a confidence factor for a calculation?

Bounty: 50

I have a population in which some have an event A and some others don’t. Event A is actually my target class. I also have a set of variables/features for my population which I can use in a modeling (supervised learning) setting. Let’s say one of the features/variables is age. What I’d like to find is the impact of age on event A in a very intuitive way. Assume my population size is 2000, and 100 of them have event A while the rest don’t. I somehow came up with a cutoff point for age, e.g. less than 40 years old versus greater than 40 years old. Here is the distribution of the population:

                  Have event A       Don't have event A
less than 40              20                   100
greater than 40           80                   1800

To show the impact of age on event A, I compute the following ratio: P(have event A | age less than 40) / P(have event A | age greater than 40)
= (20/120) / (80/1880) ≈ 3.92

However, I’d like to attach something like a p-value to this calculation. How can I do that?
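One standard option is to treat this as a 2×2 contingency table and test for independence. A sketch with SciPy (a chi-squared test, plus Fisher's exact test, which is often preferred with small cell counts):

```python
from scipy.stats import chi2_contingency, fisher_exact

# Rows: age < 40, age > 40; columns: have event A, don't have event A.
table = [[20, 100],
         [80, 1800]]

chi2, p_chi2, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)

print(f"chi-squared p-value:  {p_chi2:.2e}")
print(f"Fisher exact p-value: {p_fisher:.2e} (odds ratio {odds_ratio:.2f})")
```

The resulting p-value tests whether event A and the age split are independent; it complements the ratio above rather than replacing it.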


Get this bounty!!!

#StackBounty: #r #machine-learning #classification #caret comparing caret models with mean or median?

Bounty: 50

I am using caret to evaluate the classification performance of several models on a small dataset (190 obs) with two classes and just a handful of features.

When I inspect the train() object for one of the models, I get what look to be the mean metric values (ROC, Sens, and Spec).

Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 171, 171, 171, 171, 171, 171, ... 
Resampling results across tuning parameters:

  nIter  method         ROC        Sens       Spec
   50    Adaboost.M1    0.8866667  0.9866667  0.58
   50    Real adaboost  0.5566667  0.9844444  0.50
  100    Adaboost.M1    0.8844444  0.9877778  0.58
  100    Real adaboost  0.5738889  0.9833333  0.52
  150    Adaboost.M1    0.8800000  0.9877778  0.60
  150    Real adaboost  0.5994444  0.9833333  0.52

When I use the resamples() function and put all of the models in a list, I get the means again, but also the median values. (other model results omitted for clarity)

Models: RF, GBM, SVM, ADABOOST, C5, NB 
Number of resamples: 50 

ROC 
            Min. 1st Qu. Median   Mean 3rd Qu. Max. NA's
ADABOOST 0.25000  0.8958 0.9444 0.8867       1    1    0

Sens 
           Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
ADABOOST 0.8889  1.0000 1.0000 0.9867  1.0000 1.0000    0

Spec 
         Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
ADABOOST    0       0      1 0.58       1    1    0

The bwplot() function appears to display the median values as the point estimates.

[image: bwplot of the resampling results]

It seems to me like the train() output wants me to evaluate the models based on the means, while bwplot() focuses on the medians. My first thought was that the median would be the better summary given such spread.

Which would you use, and why?
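To make the difference concrete, here is a small NumPy sketch with made-up resample values mimicking the ADABOOST ROC row above, where a single very low resample drags the mean well below the median:

```python
import numpy as np

# Hypothetical ROC values for 50 resamples: mostly strong, one bad fold.
roc = np.concatenate([[0.25], np.full(30, 0.94), np.full(19, 0.97)])

print(f"mean:   {roc.mean():.4f}")      # pulled down by the single 0.25 outlier
print(f"median: {np.median(roc):.4f}")  # unaffected by the one bad resample
```

With only ~19 held-out observations per fold, single-resample metrics are very noisy, which is exactly the situation where a few extreme folds can dominate the mean while leaving the median untouched.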


Get this bounty!!!

#StackBounty: #machine-learning #classification Can linearly non-separable data be learned using polynomial logistic regression?

Bounty: 50

I know that polynomial logistic regression can easily learn typical data like that in the following image:
[image 1]

I was wondering whether the following two datasets can also be learned using polynomial logistic regression.


[image 2]



[image 3]
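As an illustrative sketch in scikit-learn (with synthetic stand-in data, since the actual datasets are only shown as images): a polynomial feature expansion lets plain logistic regression fit boundaries that are non-linear in the original coordinates, e.g. concentric rings:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in: two linearly non-separable classes (concentric circles).
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)

# The degree-2 expansion adds x1^2, x2^2 and x1*x2, so a circular boundary
# becomes a linear one in the expanded feature space.
model = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
model.fit(X, y)
print(f"training accuracy: {model.score(X, y):.3f}")
```

Whether a particular dataset is learnable this way comes down to whether its true decision boundary can be approximated by a polynomial of manageable degree; heavily intertwined classes may require a high degree, a kernel method, or a different model altogether.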


Get this bounty!!!

#StackBounty: #machine-learning #neural-networks #feature-selection #prediction #feature-construction Feeding clusters to neural network

Bounty: 100

I have labeled GPS location data (lat, lon) for determining whether a trip is of a certain type. The location data consists of start and end points, given as lat,lon coordinates.
A trip is labeled as bicycle or car, and this is what I’m trying to predict based on a person’s previous location habits.

For example, the route from (lat1,lon1) to (lat2,lon2) has previously always been traveled by car, whereas (lat1,lon1) to (lat3,lon3) is usually traveled by bicycle. Hour-of-day data is also available, which I think could also be used for prediction. The city in which the coordinates are collected is small, so distance would not be a good indicator of the trip type.

I have tried feeding the start location, end location, and hour of day, together with the label, into an NN, but without results. I guess this is because lon,lat coordinates are only useful as a pair and not as independent parameters?
Now I’m thinking of using the DBSCAN algorithm to cluster nearby points together, which would also make the system more robust to GPS inaccuracies.

But I’m unsure about the next step. Could the clusters be fed to a neural network?

(A4,B7) -> Bicycle

(A4,B8) -> Car

(B7,A4) -> Car

Will the network be able to detect patterns in the data? Or is a neural network a bad idea here, and should I instead go for an alternative approach?
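A minimal sketch of the cluster-then-classify idea in scikit-learn (the data here is randomly generated as a stand-in, and eps/min_samples would need tuning for real GPS noise):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder

# Hypothetical trips: [start_lat, start_lon, end_lat, end_lon], hour, label.
n = 200
trips = np.random.rand(n, 4)            # stand-in coordinates
hours = np.random.randint(0, 24, n)
labels = np.random.randint(0, 2, n)     # 0 = bicycle, 1 = car

# Cluster start and end points jointly so both share one cluster vocabulary.
points = np.vstack([trips[:, :2], trips[:, 2:]])
cluster_ids = DBSCAN(eps=0.05, min_samples=3).fit_predict(points)
start_c, end_c = cluster_ids[:n], cluster_ids[n:]

# One-hot encode (start cluster, end cluster, hour) so the network sees
# categorical place identities instead of raw coordinate magnitudes.
features = np.column_stack([start_c, end_c, hours])
X = OneHotEncoder(handle_unknown="ignore").fit_transform(features)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X, labels)
print(f"training accuracy: {clf.score(X, labels):.3f}")
```

With one-hot cluster IDs, the network only has to memorize (origin, destination, hour) patterns rather than infer spatial structure from raw lat/lon values, which matches the intuition that coordinates are only meaningful as pairs.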


Get this bounty!!!

#StackBounty: #machine-learning #self-study #neural-networks #optimization Stochastic gradient descent for neural networks with tied weights

Bounty: 50

For the neural network depicted below, I want to calculate the derivative of the error with respect to $w_{tied}$, which we get if we tie the weights $w_1$ and $w_4$ together. Tying the weights together helps reduce overfitting, since we reduce the number of parameters. The hidden units $h_1, h_2$ are logistic, the output neuron $f$ is a linear unit, and we are using the squared error cost function $E = (y - f)^2$.

I know that the solution is

$\frac{\partial E}{\partial w_{tied}} = \frac{\partial E}{\partial f}\left(\frac{\partial f}{\partial h_1}\frac{\partial h_1}{\partial w_{tied}}+\frac{\partial f}{\partial h_2}\frac{\partial h_2}{\partial w_{tied}}\right)$

From that it follows that

$\frac{\partial E}{\partial w_{tied}} = -2(y-f)\left(u_1 h_1(1-h_1)(-x_1) + u_2 h_2(1-h_2)(-x_2)\right)$

Question: How do I write the stochastic gradient descent algorithm for $w_{tied}$ now? In stochastic gradient descent we randomly pick one data sample to optimize on. Here, do we instead randomly pick a weight to optimize? So we wouldn’t be optimizing the entire weight vector $\vec{w}$ at once, but for a random $i$ we would optimize $w_i$ by feeding all possible data samples $x$ into the gradient?

Also, I realize that in the official solutions it is “$x_1$” and “$x_2$” in the gradient above, not “$-x_1$” and “$-x_2$” – why is that?
Thanks

[image: network diagram]
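A sketch of the SGD loop in NumPy (the architecture is my assumption from the question: $h_i = \sigma(w_{tied} x_i)$ and $f = u_1 h_1 + u_2 h_2$). The key point: SGD picks a random sample per step, not a random weight – every step updates all weights using that one sample’s gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical training data: rows are (x1, x2), with targets y.
X = np.random.randn(100, 2)
y = np.random.randn(100)

w_tied = 0.1               # shared weight into both hidden units (ties w1, w4)
u = np.array([0.1, 0.1])   # hidden-to-output weights
lr = 0.01

for step in range(1000):
    i = np.random.randint(len(X))             # SGD: pick ONE random sample
    x1, x2 = X[i]
    h = sigmoid(w_tied * np.array([x1, x2]))  # logistic hidden units
    f = u @ h                                 # linear output
    dE_df = -2.0 * (y[i] - f)                 # dE/df for E = (y - f)^2

    # Tied weight: sum the gradient contributions through both paths.
    grad_w = dE_df * (u[0] * h[0] * (1 - h[0]) * x1
                      + u[1] * h[1] * (1 - h[1]) * x2)
    grad_u = dE_df * h

    w_tied -= lr * grad_w   # update ALL weights from this one sample
    u -= lr * grad_u
```

Note that with the assumed pre-activation $h_i = \sigma(+w_{tied} x_i)$, the chain rule yields $+x_i$ in the gradient, which may be why the official solution has “$x_1$” and “$x_2$” rather than the negated versions.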


Get this bounty!!!

#StackBounty: #machine-learning #neural-networks #normal-distribution #optimization Step-by-step construction of an RBF neural network

Bounty: 50

I would like to solve the following task as an exercise. The data shown in the image below is given, and each output $Y_j$ of the network is defined as

$Y_j = \sum_{i=1}^{N} w_{ij} \exp\left(-\frac{\|x-\mu_i\|^2}{2\sigma_i^2}\right)$

where $i$ runs over the hidden neurons. The task is to draw an RBF network that perfectly classifies the data, with suitable means, covariances, and weights. In a second step, a given point has to be classified with the worked-out model.

My ideas: We have two nodes in the input layer, one for each dimension. The hidden layer has as many neurons as there are training samples. Each of these computes the activation given by the exponential above. The output layer has three nodes, as there are three classes. I would determine $\Sigma$ and $\mu$ as follows: for each class with training samples $x_i$, $\mu_j$ is just the centroid of the class’s training samples, $\sigma = \frac{1}{m}\sum_i \|x_i-\mu\|$, and $\Sigma = \sigma I_d$.

Questions: That does not seem to be correct, however – if I have the same $\Sigma$ and $\mu$ for all hidden nodes of the same class, each new test input would produce the same activation for all of these nodes. So what exactly are the particular $\mu_i$ and $\sigma_i$? Also, for perfect classification, I would set all the weights belonging to the correct class to 1 and all the others to zero – would that make sense?

[image: training data]
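For intuition, here is a sketch of the construction in NumPy (the toy data and the common width $\sigma$ are my own assumptions): one hidden unit per training sample with $\mu_i = x_i$, and $w_{ij} = 1$ exactly when sample $i$ belongs to class $j$:

```python
import numpy as np

# Hypothetical 2-D training data with three classes.
X_train = np.array([[0., 0.], [0., 1.], [3., 3.], [3., 4.], [6., 0.], [7., 0.]])
classes = np.array([0, 0, 1, 1, 2, 2])
n_classes = 3
sigma = 1.0  # assumed common width; could also be set per hidden unit

# One hidden unit per training sample (mu_i = x_i); the weight matrix has
# one-hot rows: w[i, j] = 1 iff training sample i belongs to class j.
W = np.eye(n_classes)[classes]              # shape (N, 3)

def rbf_outputs(x):
    # Activation of each hidden unit for input x, then Y_j = sum_i w_ij * act_i.
    act = np.exp(-np.sum((X_train - x) ** 2, axis=1) / (2 * sigma ** 2))
    return act @ W

x_test = np.array([3.2, 3.5])
Y = rbf_outputs(x_test)
print(Y, "-> class", np.argmax(Y))          # largest output wins
```

Because every training point is its own center, different test inputs activate different hidden units, which resolves the concern about identical activations; each $\sigma_i$ simply controls how far the corresponding center’s influence reaches.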


Get this bounty!!!

#StackBounty: #machine-learning #statistical-significance Statistical test for random machine learning classifier algorithms

Bounty: 50

Many optimization algorithms, such as stochastic gradient descent (SGD), are based on random processes.
Assume that I am using a classifier that relies on SGD.
What statistical tests are required to validate the results, e.g., the accuracy?

Is running the algorithm $n$ times and taking the average of all $n$ accuracies enough?

Is there any other standard method that gives statistical significance for the classification in such a scenario?
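One common sketch (NumPy/SciPy; not the only valid protocol): run the classifier over $n$ random seeds and report a confidence interval for the mean accuracy rather than the mean alone, and use a paired test when comparing two stochastic classifiers on the same splits:

```python
import numpy as np
from scipy import stats

# Hypothetical accuracies from n runs with different random seeds.
acc = np.array([0.81, 0.83, 0.80, 0.84, 0.82, 0.79, 0.83, 0.82, 0.81, 0.85])
n = len(acc)

# 95% CI for the mean accuracy (t-distribution: small n, unknown variance).
ci_low, ci_high = stats.t.interval(0.95, df=n - 1,
                                   loc=acc.mean(), scale=stats.sem(acc))
print(f"accuracy: {acc.mean():.3f}, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")

# Comparing two stochastic classifiers run on the same seeds/splits:
acc_other = np.array([0.82, 0.84, 0.81, 0.85, 0.83, 0.80, 0.84, 0.83, 0.82, 0.86])
t_stat, p_value = stats.ttest_rel(acc, acc_other)
print(f"paired t-test p-value: {p_value:.3f}")
```

More formal protocols from the literature (e.g. McNemar’s test, or Dietterich’s 5x2-cv paired t-test) additionally account for variance from the train/test split itself, not just the optimizer’s randomness.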


Get this bounty!!!

#StackBounty: #machine-learning #classification #clustering What algorithms are available to cluster sequences of data?

Bounty: 50

I have a data set containing points through time, generated by multiple Markov processes (each point in time contains $N$ points). I know the statistical nature of the Markov processes (the same for all), but my task is to determine which points go together, i.e. which come from the same process. Are there developed algorithms that address this type of problem? I should say that my more general problem has missing data and an unknown number of processes, but I’d be interested in approaches to the “easy” version too, where there are no missing points and $N$ is known.
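This resembles the data-association step in multi-target tracking. As a sketch under an assumed model (Gaussian random-walk Markov processes; the helper link_tracks is hypothetical), points in consecutive time steps can be linked by solving an assignment problem that maximizes the transition likelihood, e.g. with the Hungarian algorithm in SciPy:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_tracks(frames, step_std):
    """Frame-to-frame association for N points per time step.

    frames: array of shape (T, N, d); step_std: assumed random-walk step scale.
    Returns tracks where tracks[t, i] is the index in frame t of process i.
    """
    T, N, _ = frames.shape
    tracks = np.zeros((T, N), dtype=int)
    tracks[0] = np.arange(N)
    for t in range(1, T):
        prev = frames[t - 1][tracks[t - 1]]
        # Cost = negative log-likelihood of a Gaussian step, i.e. the squared
        # distance between candidate point pairs, scaled by the step variance.
        cost = np.sum((prev[:, None, :] - frames[t][None, :, :]) ** 2, axis=2)
        cost /= 2 * step_std ** 2
        _, col = linear_sum_assignment(cost)  # optimal one-to-one matching
        tracks[t] = col
    return tracks

# Demo: 3 random-walk processes observed for 20 steps, shuffled at each step.
rng = np.random.default_rng(0)
walks = np.cumsum(rng.normal(0, 0.1, (20, 3, 2)), axis=0)
walks += rng.normal(0, 5, (1, 3, 2))          # well-separated starting points
perm = np.array([rng.permutation(3) for _ in range(20)])
observed = np.take_along_axis(walks, perm[:, :, None], axis=1)
print(link_tracks(observed, step_std=0.1)[:5])
```

For the harder version with missing points and an unknown number of processes, the same likelihood view leads to multiple-hypothesis tracking and probabilistic data-association methods from the tracking literature.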


Get this bounty!!!