#StackBounty: #machine-learning #python #neural-network #deep-learning #word2vec Image Embeddings – Negative Sampling and Imbalanced Cl…

Bounty: 50

I am using the negative sampling approach from Word2Vec to train image embeddings. From what I have read, we create several negative examples for every positive example.

Question: Why do we use an imbalanced dataset here? Presumably we run into the usual problem, where the algorithm learns to predict the negative label in order to minimise the cost function. I understand that the aim is to extract the embeddings rather than to use the model for prediction, but what is the benefit of having imbalanced classes here?
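
To make the setup concrete, here is a minimal sketch of how such a training set might be assembled, with several sampled negatives per observed positive pair. The function name, the image identifiers and the uniform sampling are assumptions for illustration only; Word2Vec itself draws negatives from a smoothed unigram distribution rather than uniformly.

```python
import random

def make_training_pairs(positive_pairs, candidates, num_negatives=5, seed=0):
    """Return (anchor, other, label) triples: label 1 for observed pairs, 0 for sampled ones."""
    rng = random.Random(seed)
    observed = set(positive_pairs)
    examples = []
    for anchor, context in positive_pairs:
        examples.append((anchor, context, 1))
        drawn = 0
        while drawn < num_negatives:
            negative = rng.choice(candidates)
            # Skip draws that are the anchor itself or a known positive pair
            if negative == anchor or (anchor, negative) in observed:
                continue
            examples.append((anchor, negative, 0))
            drawn += 1
    return examples

# Hypothetical image identifiers, not taken from any real dataset
positives = [("img_a", "img_b"), ("img_c", "img_d")]
all_images = ["img_a", "img_b", "img_c", "img_d", "img_e", "img_f", "img_g"]
pairs = make_training_pairs(positives, all_images, num_negatives=3)
```

With `num_negatives=3`, each positive pair contributes four rows, so the 1:3 class ratio is built in by construction rather than being a property of the raw data.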


Get this bounty!!!

#StackBounty: #machine-learning #boosting #adaboost Understanding AdaBoost

Bounty: 50

It seems that on every iteration we increase the weights of the misclassified points, so that subsequent classifiers focus on them more. This would imply that these classifiers are somewhat specialized for the region that was previously misclassified. However, the weights of the classifiers are not functions of the region they apply to. In other words, how do subsequent classifiers that focus on misclassified points avoid introducing errors on points that were previously classified correctly? They contribute to those points as well in the overall weighted sum, so how do we make sure we are not going in circles, breaking earlier correct decisions as we fix wrong ones? How do we ensure that we keep making progress?
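
For reference, the reweighting step described above can be sketched as a single round of discrete AdaBoost with labels in {-1, +1}. This assumes the standard textbook formulation; the function name and the toy labels are illustrative only.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost round: return the classifier weight and the new sample weights."""
    # Weighted error of the current weak classifier (assumed to be strictly between 0 and 1)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Classifier weight: a single global number, larger when the weighted error is smaller
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified points (t * p == -1) are scaled up, correct ones are scaled down
    unnormalised = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(unnormalised)
    return alpha, [w / total for w in unnormalised]

# Toy example: five points with uniform weights, only the last one misclassified
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
alpha, new_weights = adaboost_round([0.2] * 5, y_true, y_pred)
```

In this toy round the single misclassified point ends up carrying half of the total weight, while `alpha` remains one number for the whole classifier, which is exactly the tension the question is about.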



#StackBounty: #machine-learning #method-comparison Taxonomy/overview of machine learning techniques

Bounty: 100

My question: I’m looking for a taxonomy/bestiary/overview of machine learning techniques. I would like to learn 1) how these methods relate to each other, and 2) the relative costs and benefits (and perhaps typical applications) of the different approaches.


Background: I’m trained in statistics, and have a reasonably clear mental map of how a range of statistical techniques relate to each other. Understanding the costs and benefits of different techniques obviously makes it easier to select the best one to apply in different situations.

I’ve recently used random forests and gradient boosting machines for regression-type problems. I’m unfamiliar with the broader machine learning field but realise that there are a large number of techniques out there; I would like to develop a similar mental map or taxonomy of these. It’s most important for me to understand the highest levels in the technique taxonomy, but I also recognise that there are major lower-level developments in some areas I should be aware of (e.g. neural networks seem to have a huge number of sub-classes).

I’m not looking for in-depth explanations of each method – though references would be great – but instead a framework I can use to focus my learning efforts in a more informed way.

This question, focussed on statistical techniques, is similar in that the goal is to understand relationships between methods. But I’m looking for more than a ‘cheat sheet’. I’d like to understand each technique at least at a basic level, and not just follow a set of rules on a flow chart.

I realise this is a broad question and perhaps not ideal for CV. If I can refine this in some way, or if it is more appropriate on a different SE site, please let me know.



#StackBounty: #machine-learning #time-series #distance Alternative distance to Dynamic Time Warping

Bounty: 50

I am comparing time series using Dynamic Time Warping (DTW). However, DTW is not a true distance but a distance-like quantity, since it does not guarantee that the triangle inequality holds.

Reminder: d: M × M → R is a distance if, for all x, y, z in M:

1 - d(x,y) ≥ 0, and d(x,y) = 0 if and only if x = y
2 - It is symmetric: d(x,y) = d(y,x)
3 - Triangle inequality: d(x,z) ≤ d(x,y) + d(y,z)
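
As a concrete illustration of the third condition failing, here is a minimal sketch of textbook DTW with the absolute difference as local cost, applied to three short series. The function names and the example series are chosen for illustration and do not come from any particular package.

```python
def dtw(a, b):
    """Classic DTW cost between two 1-D sequences, using |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def satisfies_triangle(d, x, y, z):
    """Check condition 3 for one specific triple."""
    return d(x, z) <= d(x, y) + d(y, z)

x, y, z = [0, 0, 0], [0, 1], [1, 1, 1]
violated = not satisfies_triangle(dtw, x, y, z)
```

Here `dtw(x, z)` is 3 while `dtw(x, y)` and `dtw(y, z)` are both 1, so the triangle inequality is violated for this triple. Condition 1 fails as well, since `dtw([0], [0, 0])` is 0 for two different series.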

Is there an equivalent measure that satisfies the conditions of a distance in the mathematical sense? Obviously, I am not looking for the Euclidean distance, but for one that allows a proper classification of my series in a later clustering step.
If so, is there a solid implementation in an R or Python package?



#StackBounty: #matlab #machine-learning #computer-vision #conv-neural-network #feature-extraction Late fusion for the CNN features

Bounty: 50

I am working on early and late fusion of CNN features. I have taken features from multiple layers of a CNN. For early fusion, I captured the features of three different layers and concatenated them horizontally: F = [F1' F2' F3']; For late fusion, I was reading this paper. The authors say to perform supervised learning twice, but I could not understand how.

For example, this is the image taken from the above-mentioned paper.
The first image has three different features. For the first supervised learning stage, let’s say the label is 1 in a 4-class image set and the output is, for example, [1 1 3], i.e. the third classifier gives the wrong result.
My question is: is the multimodal feature concatenation then something like [1 1 3], with the label 1 for a class 1 image?
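
The two layouts discussed above can be sketched in NumPy as follows. The feature dimensions and random values are hypothetical, rows are assumed to be images, and the second part only encodes the reading proposed in the question; whether the paper feeds predicted labels or per-class scores into the second stage is not established here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images = 8

# Hypothetical per-layer features: one row per image, one column per feature
F1 = rng.normal(size=(n_images, 64))
F2 = rng.normal(size=(n_images, 128))
F3 = rng.normal(size=(n_images, 256))

# Early fusion: concatenate the three feature blocks side by side,
# the NumPy counterpart of F = [F1' F2' F3'] when rows are images
F_early = np.hstack([F1, F2, F3])

# Late fusion under the question's reading: the first-stage predictions of the
# three classifiers form the second-stage feature vector, and the true class
# of the image is the second-stage label
stage1_predictions = np.array([[1, 1, 3]])
stage2_features = stage1_predictions
stage2_labels = np.array([1])
```

Under this reading, the second classifier would be trained on rows like `stage2_features` with targets like `stage2_labels`, one row per training image.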




#StackBounty: #azure #machine-learning #analytics #azure-machine-learning Predicting a user’s next action based on current day and time

Bounty: 200

I’m using Microsoft Azure Machine Learning Studio to try an experiment where I use previous analytics captured about a user (at a time, on a day) to try and predict their next action (based on day and time) so that I can adjust the UI accordingly. So if a user normally visits a certain page every Thursday at 1pm, then I would like to predict that behaviour.

Warning – I am a complete novice with ML, but have watched quite a few videos and worked through tutorials like the movie recommendations example.

I have a CSV dataset with the columns userid, action, datetime and would like to train a Matchbox recommendation model, which, from my research, appears to be the best model to use. However, I can’t see a way to use the date/time in the training. The idea is that if I could pass in a userid and a date, the recommendation model should be able to give me a probable result for what that user is most likely to do.
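
For illustration, one way the datetime column could be turned into explicit day and hour features outside the Studio is sketched below. The sample rows, the timestamp format and the function name are assumptions, and whether the Matchbox recommender can consume such features is not established here.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the userid,action,datetime layout described above
sample_csv = """userid,action,datetime
u1,view_reports,2018-03-01 13:00:00
u1,view_reports,2018-03-08 13:05:00
u2,edit_profile,2018-03-03 09:30:00
"""

def add_time_features(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Replace the raw timestamp with a day-of-week index and an hour of day."""
    out = []
    for row in rows:
        ts = datetime.strptime(row["datetime"], fmt)
        out.append({
            "userid": row["userid"],
            "action": row["action"],
            "day_of_week": ts.weekday(),  # Monday = 0, ..., Sunday = 6
            "hour": ts.hour,
        })
    return out

rows = list(csv.DictReader(io.StringIO(sample_csv)))
features = add_time_features(rows)
```

In this sample, both `u1` rows fall on a Thursday (`day_of_week == 3`) at hour 13, which is the kind of regularity described in the Thursday 1pm example.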

I get results from the predictive endpoint, but the training endpoint gives the following error:

{
    "error": {
        "code": "ModuleExecutionError",
        "message": "Module execution encountered an error.",
        "details": [
            {
                "code": "18",
                "target": "Train Matchbox Recommender",
                "message": "Error 0018: Training dataset of user-item-rating triples contains invalid data."
            }
        ]
    }
}

Here is a link to a public version of the experiment

Any help would be appreciated.

Thanks.




#StackBounty: #machine-learning #logistic #predictive-models #application #domain-adaptation Testing the scope of application of a logi…

Bounty: 50

My aim is to assess whether I can apply a logistic regression that was fitted on a sample A (where I have explanatory variables and the outcomes) to a different sample B where I don’t know the outcomes yet.

Later, once I have the outcomes, the performance measure would be discriminatory power.

What statistical approaches can I use here? Is this a known and studied problem in statistics? Can we apply methods from the dataset shift literature here?

Can we argue from some sort of similarity between A and B, for example by assuming that the joint distribution of $(X,y)$, or better the conditional distribution of the target $y$ given the features $X$, is the same on A and B?
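
As a rough sketch of what comparing A and B could look like before any outcomes are known, the snippet below computes a two-sample Kolmogorov-Smirnov statistic for a single feature. The feature values are made up for illustration. Note that this only looks at the marginal distribution of one feature; it says nothing about whether the distribution of $y$ given $X$ is the same, which cannot be checked without outcomes on B.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        value = min(a[i], b[j])
        # Advance both empirical CDFs past the current value, including ties
        while i < len(a) and a[i] <= value:
            i += 1
        while j < len(b) and b[j] <= value:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

# Hypothetical values of one explanatory variable in samples A and B
feature_in_a = [23, 35, 41, 47, 52, 58, 63]
feature_in_b = [44, 51, 57, 62, 66, 70, 75]
shift_score = ks_statistic(feature_in_a, feature_in_b)
```

A value near 0 suggests similar marginals for that feature and a value near 1 suggests almost no overlap; repeating this per feature gives a first, coarse picture of covariate shift between the samples.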



#StackBounty: #machine-learning #logistic #t-test #svm #feature-selection How do I use weight vector of SVM and logistic regression for…

Bounty: 50

I have trained an SVM and a logistic regression classifier on my dataset for binary classification. Both classifiers provide a weight vector whose size equals the number of features. I can use this weight vector to select the 10 most important features. To do that, I turned the weights into t-scores using a permutation test: I ran 1000 permutations of the class labels and computed the weight vector for each permutation. Finally, I subtracted the mean of the permuted weights from the real weights and divided by the standard deviation of the permuted weights. This gives me the t-scores.
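
The permutation procedure described above can be sketched as follows. To keep the example self-contained, an ordinary least-squares fit on ±1 labels stands in for the SVM or logistic regression weights; that substitution, the synthetic data and all names are assumptions for illustration.

```python
import numpy as np

def fit_weights(X, y):
    """Stand-in linear model (least squares); replace with the real SVM or LR fit."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def permutation_scores(X, y, n_permutations=1000, seed=0):
    """Standardise the real weights against weights obtained with shuffled labels."""
    rng = np.random.default_rng(seed)
    real = fit_weights(X, y)
    permuted = np.array(
        [fit_weights(X, rng.permutation(y)) for _ in range(n_permutations)]
    )
    # (real weight - mean of permuted weights) / std of permuted weights, per feature
    return (real - permuted.mean(axis=0)) / permuted.std(axis=0, ddof=1)

# Synthetic data: only the first of four features carries signal
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 4))
y = np.sign(2.0 * X[:, 0] + 0.1 * data_rng.normal(size=200))
scores = permutation_scores(X, y, n_permutations=200)
```

On this synthetic data the first feature should receive by far the largest score in absolute value, while the remaining scores stay close to what shuffled labels produce.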

Should I use the absolute values of the t-scores, i.e. select the 10 features with the highest absolute values? Suppose the features have the following t-scores:

feature 1: 1.3
feature 2: -1.7
feature 3: 1.1
feature 4: -0.5

If I select the 2 most important features by highest absolute value, features 1 and 2 would win. If I use the signed values instead, features 1 and 3 would win.
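
The two selection rules can be compared directly on the four example scores; the helper name `top_k` is made up for this sketch.

```python
t_scores = {"feature 1": 1.3, "feature 2": -1.7, "feature 3": 1.1, "feature 4": -0.5}

def top_k(scores, k, use_absolute):
    """Return the k feature names with the largest (absolute or signed) score."""
    if use_absolute:
        key = lambda name: abs(scores[name])
    else:
        key = lambda name: scores[name]
    return sorted(scores, key=key, reverse=True)[:k]

by_absolute = top_k(t_scores, 2, use_absolute=True)
by_signed = top_k(t_scores, 2, use_absolute=False)
```

Ranking by absolute value picks features 2 and 1, while ranking by signed value picks features 1 and 3, matching the two outcomes described above.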

Second, as I have read, this only works for an SVM with a linear kernel, not with an RBF kernel. For a non-linear kernel, the weights are somehow no longer linear. What exactly is the reason that the weight vector cannot be used to determine the importance of features in a non-linear kernel SVM?



#StackBounty: #machine-learning #inference #terminology What is the Difference between Inductive Reasoning and Statistical Inference?

Bounty: 50

In my seminar work I used the following sentence:

Overfitting stands out as the most important aspect of machine learning and statistics.

Here, I want to replace “statistics” with either Inductive Reasoning or Statistical Inference. The terminology is a bit confusing and I am not sure which one to take.

Could someone clarify the difference between the two (if possible from a machine learning perspective), so that I know which one to pick?



#StackBounty: #machine-learning #python #splines Logistic regression using splines in python

Bounty: 50

I am trying to reproduce the results from section 5.2.2 of ESL, which covers logistic regression using splines. The dataset is the South African heart disease dataset (downloadable from the website under data -> South African Heart Disease data).

I take a shortcut compared to the book: I select the relevant variables directly and do not perform selection based on the AIC criterion.

Here is my code:

import pandas as pd
import patsy
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.model_selection import train_test_split

SAHeart = pd.read_csv(r"SAheart.data")
SAHeart.drop('row.names',axis = 1, inplace = True)
SAHeart = pd.get_dummies(SAHeart, columns = ['famhist'],drop_first=True) #convert categorical into binary
SAHeart = SAHeart.rename(columns={"famhist_Present": "famhist"})

SAHeart = SAHeart[['sbp','tobacco','ldl','famhist','obesity','age','chd']]

train, test = train_test_split(SAHeart, test_size=0.2)
trainResponse= train.chd

train.drop('chd',axis = 1, inplace = True)
test.drop('chd',axis = 1, inplace = True)

#train.drop('famhist',axis = 1, inplace = True)
#test.drop('famhist',axis = 1, inplace = True)

degFree = 4

data = pd.DataFrame()
for column in train:
    if column == 'famhist':
        y = patsy.dmatrix("bs(train[column], df=1, degree = 1)",
        {"train[column]": train[column]}, return_type='dataframe')
        y.columns = [train[column].name]*y.shape[1]
    if column != 'famhist':
        #start = SAHeart[column].min()
        #end = SAHeart[column].max()
        x = train[column]
        #! knots below = inner knots (=Total - boundary)
        #! for a natural cubic spline, df = number of TOTAL knots (inner + boundary)
        y = patsy.dmatrix("cr(train[column], df=4)",
        {"train[column]": train[column]}, return_type='dataframe')#patsy.cr(x, df=degFree)
        y.columns = [train[column].name]*y.shape[1]
    data = pd.concat([data,y.iloc[:,1:]],axis=1) #
    coeff = 0.1*np.ones((1,4))
        #plt.figure()
        #plt.plot(x,coeff*y

dataTest = pd.DataFrame()
for column in test:
    if column == 'famhist':
        y = patsy.dmatrix("bs(test[column], df=1, degree = 1)",
        {"test[column]": test[column]}, return_type='dataframe')
        y.columns = [test[column].name]*y.shape[1]

    if column != 'famhist':
        #start = SAHeart[column].min()
        #end = SAHeart[column].max()
        x = test[column]

        #! knots below = inner knots (=Total - boundary)
        #! for a natural cubic spline, df = number of TOTAL knots (inner + boundary)
        y = patsy.dmatrix("cr(test[column], df=4)",
                          {"test[column]": test[column]}, return_type='dataframe') #patsy.cr(x, df=degFree)
        y.columns = [test[column].name]*y.shape[1]
    dataTest = pd.concat([dataTest,y.iloc[:,1:]],axis=1)#natural cubic spline df = #knots, don't take intercept

#Add intercept term
data = pd.concat([pd.Series(np.ones((data.shape[0])),index=data.index),data],axis=1)
dataTest = pd.concat([pd.Series(np.ones((dataTest.shape[0])),index=dataTest.index),dataTest],axis=1)

logistic = linear_model.LogisticRegression(C=1.0)
logistic.fit(data, trainResponse)
prediction = logistic.predict(dataTest)

coeffs = logistic.coef_

#plot sbp related data
span = np.linspace(min(train['sbp']),max(train['sbp']),100).reshape(-1,1)
spanSpline = patsy.dmatrix("cr(span, df=4)",
                          {"span": span}, return_type='dataframe') #patsy.cr(span, df=degFree)
plt.figure()
plt.plot(span, np.dot(spanSpline, coeffs[0][0:degFree+1].T),marker=".")

But I don’t get the same figure for f_hat(sbp), as you can see from the pictures below (top = my f_hat, bottom = the f_hat from the ESL book). My picture is produced by the last lines of the code (i.e. those below the “plot sbp related data” comment).

MyAnswer
ESL_answer

