## #StackBounty: #machine-learning #sampling #predictive-models #oversampling How to interpret results of a predictive model when an exter…

### Bounty: 50

I have a prediction task, in which I use DecisionTreeRegressor of scikit-learn to predict a target label, which is about a certain user behaviour in a web platform (and it has a range of 0-4). The features are generated based on users’ other activities in the web platform.

I have separate training and test sets. The training set is from the 2nd week activities, and the test is from the 4th week activities of the users. So, I want to train a model using the 2nd week data, and test it on the 3rd week. In both sets, the target labels are imbalanced. The reason for the imbalance is that users are encouraged to participate at a certain level, which is 3 times. Thus, in both sets there is an accumulation at 3. For example, the number of users with 3-times participation is 400, whereas the number of users with 1-participation is 65 and the number with 0-participation is 55.

To obtain balanced target labels in the training set, we oversampled it to have equal numbers at each participation level (e.g., 0-participation: 250, 1-participation: 250, 2-participation: 250, 3-participation: 250, 4-participation: 250). Just to explore, when we split this training set into train and test parts, the prediction results are very good (mean absolute error is around 0.20); see Figure 1.
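A minimal sketch of the per-level random oversampling described above, assuming `sklearn.utils.resample` (the question does not say which tool was used) and a made-up dataset in which the counts for levels 2 and 4 are invented. It also makes one consequence visible: if the split happens after oversampling, copies of the same user can land in both the training and the validation part, which can make a validation error such as the 0.20 MAE look better than the error on genuinely unseen users.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample


def oversample_levels(df, label_col, n_per_level, random_state=0):
    """Resample every label level with replacement to exactly n_per_level rows.

    Levels with more than n_per_level rows are effectively downsampled.
    """
    parts = [
        resample(group, replace=True, n_samples=n_per_level, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    return pd.concat(parts)


# Made-up data, one row per user. Counts for levels 0, 1 and 3 follow the
# question; the counts for levels 2 and 4 are invented for illustration.
counts = {0: 55, 1: 65, 2: 80, 3: 400, 4: 40}
rng = np.random.default_rng(0)
labels = np.concatenate([np.full(c, level) for level, c in counts.items()])
df = pd.DataFrame({
    "user_id": np.arange(len(labels)),
    "feature": rng.normal(size=len(labels)),
    "label": labels,
})

# (a) Oversample first, then split: copies of one user can end up on both sides.
balanced = oversample_levels(df, "label", 250)
train_a, valid_a = train_test_split(balanced, test_size=0.3, random_state=0)
shared_a = len(set(train_a["user_id"]) & set(valid_a["user_id"]))

# (b) Split first, then oversample only the training part: validation users stay unseen.
train_raw, valid_b = train_test_split(df, test_size=0.3, random_state=0, stratify=df["label"])
train_b = oversample_levels(train_raw, "label", 250)
shared_b = len(set(train_b["user_id"]) & set(valid_b["user_id"]))

print(f"users in both parts: oversample-then-split={shared_a}, split-then-oversample={shared_b}")
```

Variant (b) keeps the validation rows at their natural, imbalanced distribution, which is closer to how the week-4 test set looks.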

After training the model on the whole training set, we made predictions on the test set (which is itself imbalanced), and the results do not seem as promising (mean absolute error is around 0.55); see Figure 2. When I oversample the test set as well, the prediction performance worsens (MAE increases to 0.80); see Figure 3.

The figures tell the story:

Figure 1

Figure 2

Figure 3

At this point I do not know how to proceed. So perhaps I should just go with the results in Figure 2 and discuss the effect of the external factor (being required to participate 3 times) on user behaviour. My reasoning is that, regardless of their different activity patterns (which were used to generate the features), users may participate in an activity simply because they are required to. I wonder what would be a good approach to understanding these results. This is going to be for an academic work.

Get this bounty!!!

## #StackBounty: #machine-learning #neural-networks #lags #state-space-models How to determine appropriate lagged features for learning sy…

### Bounty: 50

In much of the machine learning literature, the systems being modelled are instantaneous: inputs -> outputs, with no notion of impact from past values.

In some systems, inputs from previous time-steps are relevant, e.g. because the system has internal states/storage. For example, in a hydrological model, you have inputs (rain, sun, wind), and outputs (streamflow), but you also have surface- and soil-storage at various depths. In a physically-based model, you might model those states as discrete buckets, with inflow, out-flow, evaporation, leakage, etc. all according to physical laws.

If you want to model streamflow in a purely empirical sense, e.g. with a neural network, you could just create an instantaneous model, and you’d get OK first-approximation results (and actually in land surface modelling, you could easily do better than a physically based model…). But you would be missing a lot of relevant information: streamflow is inherently lagged relative to rainfall, for instance.

One way to get around this would be to include lagged variants of input features, e.g. if your data is hourly, then include rain over the last 2 days and rain over the last month. These inputs do improve model results in my experience, but choosing the appropriate lags is basically a matter of experience and trial and error. There is a huge array of possible lagged variables to include (straight lagged data, lagged averages, exponential moving windows, etc.; multiple variables, with interactions, and often with high covariances). I guess a grid search for the best model is theoretically possible, but it would be prohibitively expensive.
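A minimal pandas sketch of the lagged variants mentioned above (straight lags, trailing averages and exponentially weighted averages) for an hourly series. The column name `rain`, the helper `add_lag_features` and the particular lags, window lengths and half-lives are hypothetical choices for illustration, not a recommendation of which lags to use.

```python
import numpy as np
import pandas as pd


def add_lag_features(df, col, lags=(), windows=(), halflives=()):
    """Return a copy of df with lagged variants of df[col] appended as new columns."""
    out = df.copy()
    s = df[col]
    for k in lags:
        # Straight lag: the value k steps ago.
        out[f"{col}_lag{k}"] = s.shift(k)
    for w in windows:
        # Trailing mean over the last w steps, including the current one.
        out[f"{col}_mean{w}"] = s.rolling(w, min_periods=w).mean()
    for h in halflives:
        # Exponentially weighted mean with a half-life of h steps.
        out[f"{col}_ewm{h}"] = s.ewm(halflife=h, adjust=False).mean()
    return out


# Made-up hourly rainfall: 60 days of mostly dry hours with occasional showers.
rng = np.random.default_rng(0)
hours = 24 * 60
rain = rng.exponential(1.0, hours) * (rng.random(hours) < 0.1)
hourly = pd.DataFrame({"rain": rain})

features = add_lag_features(
    hourly, "rain",
    lags=(1, 24),            # one hour ago, one day ago
    windows=(48, 720),       # last 2 days, last 30 days
    halflives=(24, 24 * 7),  # fast and slow storage-like proxies
)
print(features.dropna().shape)
```

With `adjust=False` the exponentially weighted column follows s_t = (1 - α) s_{t-1} + α x_t, so it can be read as a single leaky bucket that keeps a fixed fraction of its content each step; the half-life sets how slowly that bucket drains.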

I’m wondering a) if there is a reasonable, cheapish, and relatively objective way to select the best lags to include from the almost infinite choices, or b) if there is a better way of representing storage pools in a purely empirical machine-learning model.


## #StackBounty: #machine-learning #feature-selection #scikit-learn Being able to detect the important features sklearn.make_classificatio…

### Bounty: 50

I am trying to learn about feature selection, and I thought using make_classification in sklearn would be helpful. I’m confused though because the number of informative features I’m able to find isn’t as many as expected.

I am using SelectKBest to determine the number of features, and the ones it selects (via either chi2 or f_classif) correlate well with the features that turn out to be useful when training a RandomForestClassifier or any other classifier.

By adding repeated features and seeing which ones get repeated, I have been able to determine that the informative features generated by make_classification are the first n features (n = the intended number of informative features).

However in many cases, the number of actually helpful features is less than the intended informative. (I have noticed the number of clusters has an impact.) For instance, n_informative might be 3, but I’m only able to see that one is useful via SelectKBest or actually training a classifier.

So my two questions are:

1.) How can I detect the importance of the features make_classification is intending to be important?

2.) What distinguishes the important features chi2/f_classif are able to detect from the important features they are unable to detect?
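A small sketch of one plausible mechanism behind question 2, under the assumption (not verified here against make_classification's internals) that with several clusters per class the class means along some informative features end up nearly identical. chi2 and f_classif score each feature on its own, from per-class sums or means, so a feature that only matters in combination with another one, as in an XOR pattern, can score like noise, while a tree ensemble that splits on both features still uses it. The data below is synthetic and made up; it is not generated by make_classification.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
n = 2000

# Two informative features: the class depends only on whether their signs differ (XOR).
x0 = rng.choice([-1.0, 1.0], size=n) + rng.normal(0, 0.3, size=n)
x1 = rng.choice([-1.0, 1.0], size=n) + rng.normal(0, 0.3, size=n)
noise = rng.normal(0, 1, size=n)  # a pure-noise feature for comparison
y = ((x0 > 0) ^ (x1 > 0)).astype(int)
X = np.column_stack([x0, x1, noise])

# Univariate test: compares the per-class means of each column separately.
f_scores, p_values = f_classif(X, y)

# Tree ensemble considering all features at each split, so it can use the interaction.
rf = RandomForestClassifier(n_estimators=200, max_features=None, random_state=0).fit(X, y)

for name, f, imp in zip(["x0", "x1", "noise"], f_scores, rf.feature_importances_):
    print(f"{name:>5}  F={f:8.2f}  RF importance={imp:.3f}")
```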

The code I am using (output is below):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
import pandas as pd
import numpy as np

np.random.seed(10)


def illustrate(n_informative, n_clusters_per_class):
    # shuffle=False keeps the informative features in the first n_informative columns
    data_set = make_classification(n_samples=500,
                                   n_features=10,
                                   n_informative=n_informative,
                                   n_redundant=0,
                                   n_repeated=0,
                                   n_classes=2,
                                   n_clusters_per_class=n_clusters_per_class,
                                   weights=None,
                                   flip_y=0.0,
                                   class_sep=1.0,
                                   hypercube=True,
                                   shift=0.0,
                                   scale=1.0,
                                   shuffle=False,
                                   random_state=6)

    X, Y = pd.DataFrame(data_set[0]), pd.Series(data_set[1], name='class')
    # chi2 requires non-negative inputs, so shift every feature above zero
    X = X + abs(X.min().min())
    sel1 = SelectKBest(k=1)  # default score_func is f_classif
    sel1.fit(X, Y)
    sel2 = SelectKBest(chi2, k=1)
    sel2.fit(X, Y)
    res = pd.concat([pd.Series(sel1.scores_, name='f_classif_score'),
                     pd.Series(sel1.pvalues_, name='f_classif_p_value'),
                     pd.Series(sel2.scores_, name='chi2_score'),
                     pd.Series(sel2.pvalues_, name='chi2_pvalue')],
                    axis=1).sort_values('f_classif_score', ascending=False)
    print(res)


for n_informative in [1, 2, 3, 4]:
    # range(1, n_informative) is empty for n_informative=1, so output starts at 2
    for n_clusters_per_class in range(1, n_informative):
        print('Informative Features: {} Clusters Per Class : {}'.format(
            n_informative, n_clusters_per_class))
        illustrate(n_informative, n_clusters_per_class)
```


Output of Above Code:

```text
Informative Features: 2 Clusters Per Class : 1
f_classif_score  f_classif_p_value  chi2_score   chi2_pvalue
0      1016.973810      2.130399e-122  134.325167  4.638173e-31
1       772.724765      2.300631e-103  146.799731  8.679832e-34
5         4.078865       4.395792e-02    1.105015  2.931682e-01
8         1.979141       1.601046e-01    0.554276  4.565756e-01
7         1.374163       2.416583e-01    0.372371  5.417147e-01
3         0.443690       5.056552e-01    0.113065  7.366816e-01
4         0.197154       6.572205e-01    0.060201  8.061782e-01
9         0.186371       6.661408e-01    0.056129  8.127227e-01
6         0.169497       6.807367e-01    0.050526  8.221512e-01
2         0.054381       8.157042e-01    0.016877  8.966354e-01
Informative Features: 3 Clusters Per Class : 1
f_classif_score  f_classif_p_value  chi2_score   chi2_pvalue
0       687.446137       7.661852e-96  162.798076  2.769074e-37
2       568.414329       2.215744e-84  175.119185  5.638711e-40
9         4.233500       4.015367e-02    1.353756  2.446226e-01
4         2.181651       1.402967e-01    0.649694  4.202221e-01
6         0.416503       5.189845e-01    0.127764  7.207621e-01
5         0.250830       6.167129e-01    0.067124  7.955711e-01
7         0.225946       6.347547e-01    0.068300  7.938284e-01
3         0.210548       6.465381e-01    0.065311  7.982908e-01
8         0.149100       6.995618e-01    0.046806  8.287169e-01
1         0.011565       9.144025e-01    0.003235  9.546456e-01
Informative Features: 3 Clusters Per Class : 2
f_classif_score  f_classif_p_value  chi2_score   chi2_pvalue
2       812.090540      1.144207e-106  150.031081  1.706735e-34
0       106.629707       8.813981e-23   31.707663  1.792137e-08
7         3.907313       4.862763e-02    1.165847  2.802561e-01
5         1.941582       1.641185e-01    0.634154  4.258357e-01
9         1.456108       2.281233e-01    0.449901  5.023821e-01
6         1.010343       3.153089e-01    0.317138  5.733325e-01
3         0.918498       3.383347e-01    0.278306  5.978138e-01
4         0.892927       3.451437e-01    0.285967  5.928169e-01
1         0.206608       6.496370e-01    0.098889  7.531666e-01
8         0.106946       7.437854e-01    0.029129  8.644814e-01
Informative Features: 4 Clusters Per Class : 1
f_classif_score  f_classif_p_value  chi2_score   chi2_pvalue
2       823.390874      1.344646e-107  126.561785  2.316755e-29
5         4.964055       2.632530e-02    1.234543  2.665253e-01
4         2.088944       1.489976e-01    0.511490  4.744944e-01
3         2.048932       1.529403e-01    0.812675  3.673306e-01
9         1.234054       2.671562e-01    0.254791  6.137213e-01
1         0.315991       5.742796e-01    0.041092  8.393598e-01
6         0.043817       8.342805e-01    0.010935  9.167180e-01
8         0.033963       8.538599e-01    0.007824  9.295150e-01
7         0.012199       9.120972e-01    0.002627  9.591195e-01
0         0.002108       9.634011e-01    0.000199  9.887401e-01
Informative Features: 4 Clusters Per Class : 2
f_classif_score  f_classif_p_value  chi2_score   chi2_pvalue
2        59.446089       6.882444e-14   20.306324  6.598215e-06
3        45.413607       4.422173e-11   25.331602  4.827347e-07
6         4.355442       3.739881e-02    0.965005  3.259291e-01
7         2.444909       1.185419e-01    0.490491  4.837084e-01
9         1.508166       2.199992e-01    0.366551  5.448901e-01
5         1.438351       2.309767e-01    0.303560  5.816592e-01
1         0.956231       3.286131e-01    0.176588  6.743222e-01
8         0.886270       3.469467e-01    0.215632  6.423882e-01
4         0.175559       6.753984e-01    0.042743  8.362091e-01
0         0.064596       7.994786e-01    0.025981  8.719465e-01
Informative Features: 4 Clusters Per Class : 3
f_classif_score  f_classif_p_value  chi2_score  chi2_pvalue
0        37.608756       1.762369e-09   15.340979     0.000090
3        35.104866       5.834908e-09   17.716788     0.000026
5         7.474495       6.480748e-03    1.632879     0.201305
8         6.424434       1.156120e-02    1.636956     0.200744
6         0.566897       4.518503e-01    0.130881     0.717521
4         0.225665       6.349655e-01    0.057623     0.810293
7         0.149020       6.996387e-01    0.031846     0.858367
2         0.033591       8.546550e-01    0.015237     0.901759
1         0.028674       8.656032e-01    0.011647     0.914058
9         0.004558       9.461984e-01    0.001164     0.972785
```



## #HackerRank: Computing the Correlation

### Problem

You are given the scores of N students in three different subjects – Mathematics, Physics and Chemistry; all of which have been graded on a scale of 0 to 100. Your task is to compute the Pearson product-moment correlation coefficient between the scores of different pairs of subjects (Mathematics and Physics, Physics and Chemistry, Mathematics and Chemistry) based on this data. This data is based on the records of the CBSE K-12 Examination – a national school leaving examination in India, for the year 2013.

Pearson product-moment correlation coefficient

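This is a measure of linear correlation described well on this Wikipedia page. The formula, in brief, is given by:

$$
r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}}
$$

where x and y denote the two vectors between which the correlation is to be measured, and $\bar{x}$ and $\bar{y}$ are their means.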

Input Format

The first row contains an integer N.
This is followed by N rows containing three tab-space (‘\t’) separated integers, M P C corresponding to a candidate’s scores in Mathematics, Physics and Chemistry respectively.
Each row corresponds to the scores attained by a unique candidate in these three subjects.

Input Constraints

1 <= N <= 5 x 10^5
0 <= M, P, C <= 100

Output Format

The output should contain three lines, with correlation coefficients computed
and rounded off correct to exactly 2 decimal places.
The first line should contain the correlation coefficient between Mathematics and Physics scores.
The second line should contain the correlation coefficient between Physics and Chemistry scores.
The third line should contain the correlation coefficient between Chemistry and Mathematics scores.

So, your output should look like this (these values are only for explanatory purposes):

```
0.12
0.13
0.95
```


Test Cases

There is one sample test case with scores obtained in Mathematics, Physics and Chemistry by 20 students. The hidden test case contains the scores obtained by all the candidates who appeared for the examination and took all three tests (Mathematics, Physics and Chemistry).
Think: How can you efficiently compute the correlation coefficients within the given time constraints, while handling the scores of nearly 400k students?

Sample Input

```
20
73  72  76
48  67  76
95  92  95
95  95  96
33  59  79
47  58  74
98  95  97
91  94  97
95  84  90
93  83  90
70  70  78
85  79  91
33  67  76
47  73  90
95  87  95
84  86  95
43  63  75
95  92  100
54  80  87
72  76  90
```


Sample Output

```
0.89
0.92
0.81
```


There is no special library support available for this challenge.
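A minimal Python sketch that respects the no-library constraint: it reads all of standard input at once and computes each coefficient from running integer sums (N, Σx, Σy, Σx², Σy², Σxy), using the equivalent form r = (NΣxy − ΣxΣy) / √((NΣx² − (Σx)²)(NΣy² − (Σy)²)). The work is linear in N, which should be comfortable for roughly 400k rows. The fallback of printing 0.00 for a zero-variance subject is an assumption; the problem statement does not say what to print in that case.

```python
import math
import sys


def pearson(xs, ys):
    """Pearson r from one pass of integer sums (exact until the final sqrt)."""
    n = len(xs)
    sx = sy = sxx = syy = sxy = 0
    for x, y in zip(xs, ys):
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    # Assumption: report 0.0 when one subject has zero variance (r is undefined).
    return num / den if den else 0.0


def solve(text):
    """Parse the whole input and return the three required output lines."""
    tokens = text.split()  # handles tabs, spaces and newlines alike
    n = int(tokens[0])
    vals = list(map(int, tokens[1:1 + 3 * n]))
    m, p, c = vals[0::3], vals[1::3], vals[2::3]
    # Order required by the output format: M-P, P-C, C-M.
    pairs = [(m, p), (p, c), (c, m)]
    return "\n".join("{:.2f}".format(pearson(a, b)) for a, b in pairs)


if __name__ == "__main__":
    print(solve(sys.stdin.read()))
```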

## What is the difference between linear regression on y with x and x with y?

The Pearson correlation coefficient of x and y is the same whether you compute pearson(x, y) or pearson(y, x). This suggests that a linear regression of y given x and one of x given y should be the same, but that’s not the case.

The best way to think about this is to imagine a scatter plot of points with y on the vertical axis and x represented by the horizontal axis. Given this framework, you see a cloud of points, which may be vaguely circular, or may be elongated into an ellipse. What you are trying to do in regression is find what might be called the ‘line of best fit’. However, while this seems straightforward, we need to figure out what we mean by ‘best’, and that means we must define what it would be for a line to be good, or for one line to be better than another, etc. Specifically, we must stipulate a loss function. A loss function gives us a way to say how ‘bad’ something is, and thus, when we minimize that, we make our line as ‘good’ as possible, or find the ‘best’ line.

Traditionally, when we conduct a regression analysis, we find estimates of the slope and intercept so as to minimize the sum of squared errors (SSE). These are defined as follows:

$$
SSE = \sum_{i=1}^{N}\left(y_i - (\hat\beta_0 + \hat\beta_1 x_i)\right)^2
$$

In terms of our scatter plot, this means we are minimizing the sum of the squared vertical distances between the observed data points and the line.

On the other hand, it is perfectly reasonable to regress x onto y, but in that case, we would put x on the vertical axis, and so on. If we kept our plot as is (with x on the horizontal axis), regressing x onto y (again, using a slightly adapted version of the above equation with x and y switched) means that we would be minimizing the sum of the squared horizontal distances between the observed data points and the line. This sounds very similar, but is not quite the same thing. (The way to recognize this is to do it both ways, and then algebraically convert one set of parameter estimates into the terms of the other. Comparing the first model with the rearranged version of the second model, it becomes easy to see that they are not the same.)

Note that neither way would produce the same line we would intuitively draw if someone handed us a piece of graph paper with points plotted on it. In that case, we would draw a line straight through the center, but minimizing the vertical distance yields a line that is slightly flatter (i.e., with a shallower slope), whereas minimizing the horizontal distance yields a line that is slightly steeper.
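A quick numerical check of the two paragraphs above, using NumPy's `np.polyfit` on made-up, positively correlated data: fitting y on x and x on y gives two different lines, and after rewriting the x-on-y fit in terms of y its slope is steeper. The product of the two fitted slopes equals r², so the lines only coincide when |r| = 1.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.6 * x + rng.normal(scale=0.8, size=500)  # made-up, positively correlated data

# y on x: minimizes squared vertical distances.
b_yx, a_yx = np.polyfit(x, y, 1)
# x on y: minimizes squared horizontal distances (x = a_xy + b_xy * y).
b_xy, a_xy = np.polyfit(y, x, 1)

# Rewrite x = a_xy + b_xy * y as y = -a_xy / b_xy + (1 / b_xy) * x.
slope_xy_in_yx_terms = 1.0 / b_xy
r = np.corrcoef(x, y)[0, 1]

print(f"y on x slope:            {b_yx:.3f}  (flatter)")
print(f"x on y slope, rewritten: {slope_xy_in_yx_terms:.3f}  (steeper)")
print(f"product of slopes {b_yx * b_xy:.4f} vs r^2 {r ** 2:.4f}")
```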

A correlation is symmetrical: x is as correlated with y as y is with x. The Pearson product-moment correlation can be understood within a regression context, however. The correlation coefficient, r, is the slope of the regression line when both variables have been standardized first. That is, you first subtract the mean from each observation, and then divide the differences by the standard deviation. The cloud of data points will now be centered on the origin, and the slope would be the same whether you regressed y onto x, or x onto y.

Now, why does this matter? Using our traditional loss function, we are saying that all of the error is in only one of the variables (viz., y). That is, we are saying that x is measured without error and constitutes the set of values we care about, but that y has sampling error. This is very different from saying the converse. This was important in an interesting historical episode: In the late 70’s and early 80’s in the US, the case was made that there was discrimination against women in the workplace, and this was backed up with regression analyses showing that women with equal backgrounds (e.g., qualifications, experience, etc.) were paid, on average, less than men. Critics (or just people who were extra thorough) reasoned that if this was true, women who were paid equally with men would have to be more highly qualified, but when this was checked, it was found that although the results were ‘significant’ when assessed the one way, they were not ‘significant’ when checked the other way, which threw everyone involved into a tizzy. See here for a famous paper that tried to clear the issue up.

The formula for the slope of a simple regression line is a consequence of the loss function that has been adopted. If you are using the standard Ordinary Least Squares loss function (noted above), you can derive the formula for the slope that you see in every intro textbook. This formula can be presented in various forms, one of which I call the ‘intuitive’ formula for the slope. Consider this form for both the situation where you are regressing y on x and where you are regressing x on y:

$$
\text{Regressing } y \text{ on } x:\quad \hat\beta_1 = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
\qquad\qquad
\text{Regressing } x \text{ on } y:\quad \hat\beta_1 = \frac{\operatorname{Cov}(y, x)}{\operatorname{Var}(y)}
$$

Now, I hope it’s obvious that these would not be the same unless Var(x) equals Var(y). If the variances are equal (e.g., because you standardized the variables first), then so are the standard deviations, and thus the variances would both also equal SD(x)SD(y). In this case, $\hat\beta_1$ would equal Pearson’s r, which is the same either way by virtue of the principle of commutativity:

$$
\hat\beta_1 = \frac{\operatorname{Cov}(x, y)}{SD(x)\,SD(y)} = \frac{\operatorname{Cov}(y, x)}{SD(y)\,SD(x)} = r
$$
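A short NumPy sketch of the slope formula above on made-up data with unequal variances: the helper `ols_slope` (a hypothetical name) computes Cov(u, v) / Var(u) with the same ddof for both terms. On the raw variables the two directions give different slopes; after standardizing both variables, both slopes equal Pearson's r.

```python
import numpy as np


def ols_slope(u, v):
    """OLS slope of v regressed on u: Cov(u, v) / Var(u), both with ddof=1."""
    return np.cov(u, v)[0, 1] / np.var(u, ddof=1)


rng = np.random.default_rng(2)
x = rng.normal(loc=50, scale=10, size=300)  # made-up data, Var(x) != Var(y)
y = 2.0 * x + rng.normal(scale=15, size=300)

print("raw slopes:", ols_slope(x, y), ols_slope(y, x))  # these differ

zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
r = np.corrcoef(x, y)[0, 1]
print("standardized slopes:", ols_slope(zx, zy), ols_slope(zy, zx), "r:", r)
```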

Source

## #StackBounty: #machine-learning How to represent time based periodic data for use in ML

### Bounty: 50

I have a dataset in the following format:

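Number of Machines

| CustId | Month0 | Month-1 | Month-2 | Month-3 | Month-4 |
|--------|--------|---------|---------|---------|---------|
| abc    | 23     | 26      | 29      | 0       | 0       |
| def    | 53     | 26      | 22      | 22      | 12      |
| ghi    | 11     | 26      | 150     | 120     | 10      |

Size of data protected

| CustId | Month0 | Month-1 | Month-2 | Month-3 | Month-4 |
|--------|--------|---------|---------|---------|---------|
| abc    | 23     | 26      | 29      | 0       | 0       |
| def    | 53     | 26      | 22      | 22      | 12      |
| ghi    | 11     | 26      | 150     | 120     | 10      |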


The data is month-over-month (MoM) data. For simplicity I have used the same numbers in both tables, but for a given CustId, data will be present in both tables. Similarly, there are tables for other parameters as well.

I want to use machine learning for some classification. What is the best way to serialize this MoM data for different parameters? Is there any standard practice for this?
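A minimal pandas sketch of two common layouts, assuming each parameter table is a DataFrame indexed by CustId; the month columns and values come from the question, while the parameter names `machines` and `size_protected` are made up. The wide layout gives one fixed-length feature vector per customer, which most scikit-learn classifiers expect; the long layout keeps one row per customer, parameter and month, which is closer to what sequence models or time-series feature extractors usually consume. Which one fits depends on the model, so this is a starting point rather than a standard practice.

```python
import pandas as pd

months = ["Month0", "Month-1", "Month-2", "Month-3", "Month-4"]
rows = {
    "abc": [23, 26, 29, 0, 0],
    "def": [53, 26, 22, 22, 12],
    "ghi": [11, 26, 150, 120, 10],
}
machines = pd.DataFrame.from_dict(rows, orient="index", columns=months)
machines.index.name = "CustId"
size_protected = machines.copy()  # same numbers, as in the question

tables = {"machines": machines, "size_protected": size_protected}

# Wide layout: one row per customer, one column per (parameter, month).
wide = pd.concat(tables, axis=1)
wide.columns = [f"{param}_{month}" for param, month in wide.columns]

# Long layout: one row per (parameter, customer, month).
long_df = (
    pd.concat(tables, names=["parameter", "CustId"])
    .reset_index()
    .melt(id_vars=["parameter", "CustId"], var_name="month", value_name="value")
)

print(wide)
print(long_df.head())
```

Derived columns such as month-over-month differences (for example `wide["machines_Month0"] - wide["machines_Month-1"]`) can be added to the wide table in the same way.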


## #StackBounty: #machine-learning #neural-networks #conv-neural-network #artificial-intelligence Best ANN Architecture for high-energy ph…

### Bounty: 100

First off, a disclaimer: I’m not sure if this is the right Stack Exchange for this question, but I’m not aware of a machine learning specific SE.

I am doing research into characterising particle jets in high-energy physics. I am trying to use image recognition techniques, in particular convolutional neural networks, to classify jets into two classes.

These classes can be distinguished by the following features:

• Sudden ‘jump’ in the number of hits between layers of a detector
• Radius of concentration of hits
• Energy deposited in each layer

I am using 123x123x4 images. Each pixel in each channel represents a level of energy deposited in a layer of the detector. I am concerned that it may even be impossible to do this in a deep-learning approach, as there are typically only 150-300 pixels filled in each image.

I would like to use a ConvNet to classify the two different types of jet. However, I am not sure what architecture to use.

There are other variables that might be of importance in classification, and I would like to be able to include these also (probably in the dense layer immediately before the output).

```
          _________     ________     _________     ________     _______     _____
         | Conv    |   | Max    |   | Conv    |   | Max    |   |       |   |     |
Image -->| Layer 1 |-->| Pool 1 |-->| Layer 2 |-->| Pool 2 |-->|       |   |     |
         |_________|   |________|   |_________|   |________|   | Dense |   | Out |
                                                               | Layer |-->|_____|
Other --------------------------------------------------------->|       |
Data                                                           |       |
                                                               |_______|
```
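Before choosing an architecture, it can help to check how the image size flows through the diagram above. The pure-Python sketch below applies the standard output-size formula for convolution and pooling layers, floor((n + 2p − k) / s) + 1. The kernel sizes, strides, padding, filter counts and the five extra Other Data variables are all hypothetical, not values from the question. It also computes the filled-pixel fraction implied by 150-300 hits per 123x123 image, which is under 2%.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def dense_input_size(image_size, in_channels, layers, n_other):
    """Return (final spatial size, number of inputs to the Dense layer)."""
    size, channels = image_size, in_channels
    for kind, p in layers:
        size = conv_out(size, p["kernel"], p.get("stride", 1), p.get("padding", 0))
        if kind == "conv":
            channels = p["filters"]  # pooling keeps the channel count
    # Flattened conv features plus the extra "Other Data" variables.
    return size, size * size * channels + n_other


# Hypothetical settings for Conv 1 -> Pool 1 -> Conv 2 -> Pool 2 in the diagram.
layers = [
    ("conv", {"kernel": 5, "padding": 2, "filters": 32}),
    ("pool", {"kernel": 2, "stride": 2}),
    ("conv", {"kernel": 3, "padding": 1, "filters": 64}),
    ("pool", {"kernel": 2, "stride": 2}),
]
final_size, dense_in = dense_input_size(123, 4, layers, n_other=5)
fill_fraction = 300 / (123 * 123)  # upper end of the 150-300 filled pixels per image

print(f"feature map {final_size}x{final_size}, Dense inputs: {dense_in}")
print(f"at most {fill_fraction:.1%} of pixels are non-zero")
```

Editing the entries in `layers` and re-running shows how quickly the Dense layer's input size grows or shrinks with each choice.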


Are there any suggestions for architectures I should try?


## #StackBounty: #machine-learning #classification #multinomial #matching Multi-label classification: Predict product category

### Bounty: 50

I want to predict to which product category a product belongs. A total of 400k products need to be translated from the old (less refined) to the new product category tree. (E.g. alarm clock used to fall under ‘Electronics’ and will now belong to ‘Alarm clocks’.) So far 36k products have already been partly allocated to ~400 (out of 800) new product categories. The filling rate ranges from 1% to 95%.

Product data (among others) contains the variables name, description, price, dimensions, color and the old label. The idea was to construct features out of the unstructured variables through tokenisation -> TF-IDF.

Proposed Approach:

1. Train one multi-label prediction model (e.g. Ridge classification + stratified CV) on the labeled data. Then predict the category only for the subset that, based on the old product tree, contains all possible products (e.g. predict whether unlabelled ‘Electronics’ products are ‘Alarm clocks’).
2. Based on the predicted probability, present to a content manager the unlabelled product that, if labelled, would result in the highest information gain.
3. Propose to what extent the remaining 400 categories should be filled (e.g. 60%) and which products to label first.
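A minimal scikit-learn sketch of steps 1 and 2, with toy product names and categories made up for illustration. It assumes one category per product (multi-class) and uses `RidgeClassifier` on TF-IDF features, as proposed. `RidgeClassifier` has no `predict_proba`, so instead of a true information-gain criterion the sketch ranks unlabelled products by the margin between their two highest decision scores; this margin-based uncertainty sampling is a common active-learning heuristic, swapped in here as a simpler proxy.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import RidgeClassifier

# Made-up product names; in practice name, description, etc. would be combined.
labeled_text = [
    "digital alarm clock with snooze", "analog alarm clock loud bell",
    "bluetooth speaker portable", "wireless speaker waterproof",
    "usb charging cable 1m", "braided charging cable usb-c",
]
labels = ["Alarm clocks", "Alarm clocks", "Speakers", "Speakers", "Cables", "Cables"]
# Unlabelled products from the same old branch (e.g. 'Electronics').
unlabeled_text = ["alarm clock radio", "speaker cable 2m", "bluetooth alarm clock speaker"]

vec = TfidfVectorizer()
X_lab = vec.fit_transform(labeled_text)
X_unl = vec.transform(unlabeled_text)

# Step 1: one classifier over all new categories.
clf = RidgeClassifier(alpha=1.0).fit(X_lab, labels)
predicted = clf.predict(X_unl)

# Step 2 (proxy): a small gap between the two best category scores means the
# model is torn between categories, so that product is shown to a person first.
scores = clf.decision_function(X_unl)  # shape (n_unlabeled, n_categories)
top_two = np.sort(scores, axis=1)[:, -2:]
margin = top_two[:, 1] - top_two[:, 0]
ask_first = np.argsort(margin)  # most ambiguous products first

for i in ask_first:
    print(f"{unlabeled_text[i]!r}: predicted {predicted[i]!r}, margin {margin[i]:.3f}")
```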

What would your preferred approach be?


## #StackBounty: #machine-learning #mathematical-statistics #optimization How to recommend an attribute value for optimum output

### Bounty: 50

I have a set of attributes A (a continuous value), B and C, and the result is X, where X is also a continuous value. I have a data set and I can train a model with that data. At a certain point I have to determine the value of attribute A that gives the optimum X value while the other attributes are given. So I have to recommend a value for attribute A to obtain the optimum X value. Can this problem be modeled using recommender systems, and if so, how? If not, what is the correct way of modeling this problem?
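One common alternative to a recommender system, sketched here as an assumption rather than the definitive modeling choice: fit a regression model for X from (A, B, C), then, for the given B and C, evaluate the model over a grid of candidate A values and recommend the one with the highest predicted X. The data, the `RandomForestRegressor` choice and the helper `recommend_a` are all hypothetical, and B and C are treated as continuous numbers for simplicity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 1000
A = rng.uniform(0, 6, n)
B = rng.uniform(0, 1, n)
C = rng.uniform(0, 1, n)
# Made-up ground truth: X peaks at A = 3 whatever B and C are.
X_target = -(A - 3.0) ** 2 + 2 * B + C + rng.normal(0, 0.1, n)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(np.column_stack([A, B, C]), X_target)


def recommend_a(model, b, c, a_grid):
    """Return the A value from a_grid with the highest predicted X, and that prediction."""
    candidates = np.column_stack([a_grid, np.full_like(a_grid, b), np.full_like(a_grid, c)])
    preds = model.predict(candidates)
    best = int(np.argmax(preds))
    return a_grid[best], preds[best]


best_a, best_x = recommend_a(model, b=0.5, c=0.5, a_grid=np.linspace(0, 6, 121))
print(f"recommended A = {best_a:.2f}, predicted X = {best_x:.2f}")
```

The recommendation is only as reliable as the model within the range of A values seen in training, and with observational data it reflects association rather than the effect of actually setting A.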
