#StackBounty: #sql-server #query-performance #sql-server-2014 #optimization #cardinality-estimates Changing database compatibility from…

Bounty: 50

Just seeking expert/practical advice from a DBA point of view: one of our application databases running on SQL Server 2014 was left, after migration, at the old database compatibility level, i.e. 100 (SQL Server 2008).

From the DEV point of view, all the testing has been done; they don't see much difference and want to move to prod based on their testing.

In our testing, for certain processes where we saw slowness, such as in stored procedures, we found the part of the statement that was slow and added a QUERYTRACEON query hint, something like below, while keeping compatibility at 120. This keeps performance stable:

SELECT  [AddressID],
    [AddressLine1],
    [AddressLine2]
FROM Person.[Address]
WHERE [StateProvinceID] = 9 AND
    [City] = 'Burbank'
OPTION (QUERYTRACEON 9481);
GO
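
Trace flag 9481 forces the legacy (pre-2014) cardinality estimator for that statement even under compatibility level 120. For testing, a session-level sketch of the same thing (requires elevated rights, so it is not a production fix):

DBCC TRACEON (9481);    -- legacy cardinality estimator for this session only
-- ... run the slow procedure ...
DBCC TRACEOFF (9481);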

UPDATE - Editing the question based on more findings:

We actually found things getting worse for a table that calls a scalar function within a computed column.

Below is how that column looks:

CATCH_WAY AS ([dbo].[fn_functionf1]([Col1])) PERSISTED NOT NULL

and the part of the query where it goes wrong looks somewhat like this:

DELETE t2
   OUTPUT deleted.col1,
          deleted.col2,
          deleted.col3
   INTO #temp1
FROM #temp2 t2
INNER JOIN dbo.table1 tb1 ON tb1.CATCH_WAY = [dbo].[fn_functionf1](t2.[Col1])
AND t2.[col2] = tb1.[col2]
AND t2.[col3] = tb1.[col3]
AND ISNULL(t2.[col4], '') = ISNULL(tb1.[col4], '')

I know the function is being called and is slow, but the problem is: under the current compatibility level, i.e. 100, it runs OK-ish slow, but when changed to 120 it gets ~100x slower.
What is happening?
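
In case it matters, one workaround I am experimenting with (a sketch; it materializes the scalar function's result into the temp table once, so the join compares stored values instead of calling the function per row; the column type is assumed to match CATCH_WAY):

ALTER TABLE #temp2 ADD CATCH_WAY_VALUE varchar(100) NULL;  -- type assumed to match CATCH_WAY

UPDATE t2
SET t2.CATCH_WAY_VALUE = [dbo].[fn_functionf1](t2.[Col1])
FROM #temp2 t2;

-- the join condition then becomes: ON tb1.CATCH_WAY = t2.CATCH_WAY_VALUE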


Get this bounty!!!

#StackBounty: #optimization #multiarmed-bandit How is Regret defined for combinatorial optimization problems?

Bounty: 50

I have a combinatorial optimization problem where I'm trying to find the global minimum (many local minima exist).

In principle, my agent can choose to be anywhere in the state space at any given step, but it doesn't know where the minimum is, so it will have to take some educated guesses (hill-climbing).

As far as I understand Regret, it is meant to compute your "deficit" against the best possible steps you could have taken in hindsight. However, it seems to me that in the combinatorial optimization case, the "best" course of action in hindsight is to immediately move to the global minimum, or at least to remove all the steps that didn't bring you closer to the global minimum.
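
For concreteness, the cumulative-regret definition I have in mind from the bandit literature, stated for minimization (here $x_t$ is the state visited at step $t$ and $\mathcal X$ is the state space):

$$R_T = \sum_{t=1}^{T} f(x_t) - T \cdot \min_{x \in \mathcal X} f(x)$$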

This seems to be quite different from how I understand Regret in multi-armed bandits. Is Regret ill-defined in this setting, or is my understanding lacking?


Get this bounty!!!

#StackBounty: #optimization #process-scheduling #planning Scheduling of process manufacturing with setup times

Bounty: 100

Overview

Process manufacturing is (in contrast to discrete manufacturing) focused on the production of continuous goods such as oil. The planning is typically solvable by means of Linear Programming; some constraints require moving to MILP.

Problem Formulation

The problem consists of

  • Sequence of consecutive time intervals $t \in \{1,\dots,n_t\}$, each with start and end $(s_t,e_t)$ and length $l_t=e_t-s_t$. Consecutive means $e_{t}=s_{t+1}$ for all $t \in \{1,\dots,n_t-1\}$.
  • List of types of goods being produced: $j \in \{1,\dots,n_j\}$
  • Demand for each type of good per time interval, $d_{j,t}$.
  • List of production lines $i \in \{1,\dots,n_i\}$
  • Availability of production lines per time interval, $a_{i,t}$. $a_{i,t}$ is binary – available or not.
  • Manufacturing speed per production line per type of goods, $v_{i,j}$.
  • Setup time of a production line from one type of goods to another, $u_{i,j,j'}$.
  • Price for using a production line (leasing-based), counted per minute: $c_{i}$

The goal is to plan the production lines so that the demand is covered and the leasing price is minimal.

Notes:

  • The setup time can be shorter than, longer than, or equal to the length of the intervals.
  • It is acceptable that a production line does not work the whole time interval if the supply has been completed sooner.
  • The setup for the production of another good can start at any time, not necessarily at the beginning of an interval.

Example

There are two production lines, i.e., $n_i = 2$ and there are two types of goods, i.e. $n_j=2$.

We have two intervals, i.e. $n_t=2$, each with a length of 1 hour. Say one starts at 1 pm, the second at 2 pm.

The demand is:

  • $d_{1,1}=1.1$
  • $d_{1,2}=1$
  • $d_{2,1}=0.5$
  • $d_{2,2}=1$

The costs of running the production lines are:

  • $c_{1} =c_{2} = 1$ USD/minute

All possible setup times are twelve minutes, i.e.:

  • $u_{i,j,j'}=0.2$ for all $i,j,j'$ where $j \neq j'$.

The speeds are:

  • $v_{1,1}=1.1$
  • $v_{1,2}=1.5$
  • $v_{2,1}=1$
  • $v_{2,2}=1$

Obviously, the demand is met for a total cost of $4$ if the first line produces the first type of goods in both intervals and the second line produces the second type.

However, it might be tempting to switch them after the first interval. If no setup time were needed, the cost would be $1+1+1+0.5/1.5=3.33$, which is better. However, this is not possible because of the setup time of the second production line.

Question

What is the algorithm to schedule this manufacturing process optimally?

An answer is welcome even if it only outlines the way and the approach (MILP, SAT, CSP, …).

Ideas so far

  • If the length of the intervals were fixed, say 1 hour, and the setup time were defined in terms of these units, say 2 hours, then it might be solvable by SAT/CSP (a MILP sketch of this simplification follows below the list).
  • An idea is to use an evolutionary algorithm consisting of a sequence of activities with mutations (add an activity, delete an activity, prolong an activity) and crossover (mix two plans in a random way).
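
To make the first idea concrete, here is a minimal MILP sketch of the discretized variant (my own simplifying assumptions: unit-length intervals, full availability, a full interval charged whenever a line is assigned, and a changeover between consecutive intervals simply forbidden to stand in for the setup time; it uses the PuLP package, and all variable names are mine):

import pulp

# data from the example above; cost per used interval = 1 USD/min * 60 min
n_i, n_j, n_t = 2, 2, 2
d = {(1, 1): 1.1, (1, 2): 1.0, (2, 1): 0.5, (2, 2): 1.0}   # demand d[j, t]
v = {(1, 1): 1.1, (1, 2): 1.5, (2, 1): 1.0, (2, 2): 1.0}   # speed v[i, j]
c = {1: 60.0, 2: 60.0}                                      # leasing cost c[i] per interval

I = range(1, n_i + 1)
J = range(1, n_j + 1)
T = range(1, n_t + 1)
prob = pulp.LpProblem("scheduling", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (I, J, T), cat="Binary")     # line i makes good j in interval t
q = pulp.LpVariable.dicts("q", (I, J, T), lowBound=0)       # quantity produced

for i in I:
    for t in T:
        prob += pulp.lpSum(x[i][j][t] for j in J) <= 1      # at most one good per line and interval
        for j in J:
            prob += q[i][j][t] <= v[i, j] * x[i][j][t]      # speed limit, zero if unassigned
for j in J:
    for t in T:
        prob += pulp.lpSum(q[i][j][t] for i in I) >= d[j, t]  # cover the demand per interval
for i in I:
    for t in range(1, n_t):
        for j in J:
            for jp in J:
                if j != jp:
                    # crude setup rule: switching goods between consecutive intervals is forbidden
                    prob += x[i][j][t] + x[i][jp][t + 1] <= 1

# objective: pay the leasing cost for every interval in which a line is assigned
prob += pulp.lpSum(c[i] * x[i][j][t] for i in I for j in J for t in T)
prob.solve()
print(pulp.value(prob.objective))   # 240.0 = 4 used line-intervals * 60 USD

Paying only for the minutes actually used and allowing setups shorter than an interval would require continuous start/end time variables (big-M constraints), which is where the MILP becomes genuinely harder.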


Get this bounty!!!

#StackBounty: #java #algorithm #optimization #data-structures #greedy Gridland Metro HackerRank

Bounty: 50

I was solving this question on HackerRank. I have gone through the whole discussions section and tried all the suggested test cases with the expected results. I think I might be making some silly mistake in the code, as I'm sure I've considered every scenario in the implementation. Could you please help me out by pointing out any mistake in my code?

public static void main(String[] args) throws IOException {
        BufferedReader bf = new BufferedReader(new InputStreamReader(System.in));
        String strNum[] = bf.readLine().split("\\s+");
        double n = Double.parseDouble(strNum[0]);
        double m = Double.parseDouble(strNum[1]);
        double k = Double.parseDouble(strNum[2]);

        // map: row number -> (track start column -> track end column)
        Map<Double, TreeMap<Double, Double>> map = new HashMap<>();
        while (k > 0) {
            strNum = bf.readLine().split("\\s+");
            double r = Double.parseDouble(strNum[0]);
            double c1 = Double.parseDouble(strNum[1]);
            double c2 = Double.parseDouble(strNum[2]);
            TreeMap<Double, Double> innerMap = map.get(r);
            if (innerMap != null) {
                innerMap.put(c1, c2);
            } else {
                innerMap = new TreeMap<Double, Double>();
                innerMap.put(c1, c2);
                map.put(r, innerMap);
            }
            k--;
        }
        // rows without any track contribute all m cells
        double count = (n - map.size()) * m;
        for (Map.Entry<Double, TreeMap<Double, Double>> e : map.entrySet()) {
            TreeMap<Double, Double> innerMap = e.getValue();
            double start = innerMap.firstKey();
            double end = innerMap.firstEntry().getValue();
            for (Map.Entry<Double, Double> e2 : innerMap.entrySet()) {
                double x = e2.getKey();
                double y = e2.getValue();
                if (y > end) {
                    if (x > end) {
                        count += ((x - end) - 1);
                    }
                    end = y;
                }

            }
            count += (m - (end - start + 1));
        }
        System.out.println(String.format("%.0f", count));
    }

25/31 test cases are failing. Any help is much appreciated.


Get this bounty!!!

#StackBounty: #r #optimization Best approach to find optimal solution to linear equation by group in R

Bounty: 50

I am currently modeling a pricing and discount system in R.

My data frame looks as follows:

df = structure(
  list(
    Customers = structure(
      c(1L, 1L, 1L, 2L, 2L, 2L),
      .Label = c("A", "B"),
      class = "factor"
    ),
    Products = structure(
      c(1L,
        2L, 3L, 1L, 2L, 3L),
      .Label = c("P1", "P2", "P3"),
      class = "factor"
    ),
    Old_Price = c(5, 2, 10, 7, 4, 8),
    New_Price = c(6, 3, 9,
                  6, 3, 9)
  ),
  class = "data.frame",
  row.names = c(NA,-6L)
)

There are several customers who buy different products with an "Old Price" and a "New Price". I now want to identify one discount parameter (a real number from -1.0 to 1.0) per customer that minimizes the difference between the Old Price and the New Price.
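
To make "minimizes the difference" precise, my assumption is squared error: using the same relationship as the Individual_Discount below, $\text{Old}_k \approx \text{New}_k(1+\delta_c)$ over the products $k$ of customer $c$, the least-squares optimum has a closed form:

$$\delta_c = \arg\min_{\delta} \sum_k \bigl(\text{New}_k(1+\delta) - \text{Old}_k\bigr)^2 = \frac{\sum_k \text{New}_k(\text{Old}_k - \text{New}_k)}{\sum_k \text{New}_k^2}$$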

Because I do not know much about optimization, my current approach would be the following, which seems horribly inefficient and might not lead to the optimal solution anyway:

library(dplyr)

df %>%
    mutate(Individual_Discount = (Old_Price - New_Price) / New_Price) %>% # identify the optimal discount per row
    group_by(Customers) %>%
    mutate(Optimal_Discount = mean(Individual_Discount)) # average the individual discounts to approximate the customer discount

What's the best approach to solving a case like this, and how can I implement it in R?


Get this bounty!!!

#StackBounty: #optimization #bayesian-optimization #recursive-model Optimization problem with recursive functions

Bounty: 150

I could imagine that the following is a standard optimization problem, but nevertheless I have no clue how to solve it specifically: which approach and algorithm, and what computing power would I need?

Please let me state the problem; then every hint on how to approach it is welcome!

Given a recursive formula that gives the value of a function $f$ at time $t+1$ as a function $F$ of its values at all previous times $t, t-1,\dots,0$ and of a large number $n$ (say $n=50$) of parameters $a_i \in [0,1]$, which may vary over time:

$$f(0) = f_0$$
$$f(t+1) = F\big(f(t), f(t-1),\dots, f(0);\, a_1(t),a_2(t),\dots,a_n(t)\big)$$

Given also a cumulative cost function $c$ with

$$c(0) = 0$$

$$c(t+1) = c(t) + C\big(a_1(t),a_2(t),\dots,a_n(t)\big)$$

Finally, an explicit threshold function $\vartheta(t)$ is given.

The problem is (for a given time $T$):

For which choice of parameters $a_1(t),a_2(t),\dots,a_n(t)$, $t < T$,
does it hold that

  • $f(t) \leq \vartheta(t)$ for all $t \leq T$

  • $c(T)$ is minimal with value $c_{\text{min}}$

I.e. for any other choice of parameters, either $f(t) > \vartheta(t)$ for some $t \leq T$ or $c(T) > c_{\text{min}}$.

I would be more than happy not only with truly optimal solutions, but also with almost-optimal ones (with a high probability that they are nearly optimal).


Maybe the problem becomes significantly easier to solve when we don't try to minimize $c(T)$ but instead fix the minimal cost $c_{\text{min}}$ and look for solutions (= choices of parameters) with

  • $f(t) \leq \vartheta(t)$ for all $t \leq T$
  • $c(T) \leq c_{\text{min}}$


Edit: Maybe the problem becomes easier when taking into account that $F$ in fact only depends on $f(t)$, $f(t-1)$, and $\sum_{\tau = 0}^t f(\tau)$.
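
That observation also means the whole history can be summarized by the finite-dimensional state $(f(t), f(t-1), \sum_{\tau \le t} f(\tau))$, so dynamic programming or direct search over parameter sequences becomes thinkable. As a trivial baseline for "almost optimal with high probability", a random-search sketch (F, C and theta are placeholders for the maps above; all names are mine):

import numpy as np

def evaluate(A, f0, F, C, theta):
    # A is a (T, n) array: row t holds the parameters a_1(t), ..., a_n(t)
    T = A.shape[0]
    f_hist = [f0]
    cost = 0.0
    for t in range(T):
        f_next = F(f_hist, A[t])      # f(t+1) as a function of the whole history
        cost += C(A[t])               # c(t+1) = c(t) + C(a(t))
        if f_next > theta(t + 1):     # threshold constraint violated -> infeasible
            return np.inf
        f_hist.append(f_next)
    return cost

def random_search(T, n, f0, F, C, theta, iters=10000, seed=0):
    rng = np.random.default_rng(seed)
    best_A, best_cost = None, np.inf
    for _ in range(iters):
        A = rng.uniform(0.0, 1.0, size=(T, n))   # a_i(t) drawn from [0, 1]
        cost = evaluate(A, f0, F, C, theta)
        if cost < best_cost:
            best_A, best_cost = A, cost
    return best_A, best_cost

Smarter search (the cross-entropy method, or Bayesian optimization over a low-dimensional parametrization of $t \mapsto a_i(t)$) would simply replace the uniform sampling.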


Get this bounty!!!

#StackBounty: #xgboost #machine-learning-model #optimization #objective-function Optimising for Brier objective function directly gives…

Bounty: 50

I am training an XGBoost model, and as I care most about the resulting probabilities, not the classification itself, I have chosen the Brier score as the metric for my model, so that the probabilities are well calibrated. I tuned my hyperparameters using GridSearchCV with brier_score_loss as the metric. Here's an example of a tuning step:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)

model = XGBClassifier(learning_rate=0.1, n_estimators=200, gamma=0, subsample=0.8, colsample_bytree=0.8, scale_pos_weight=1, verbosity=1, seed=0)
parameters = {'max_depth': [3, 5, 7], 
              'min_child_weight': [1, 3, 5]}
gs = GridSearchCV(model, parameters, scoring='brier_score_loss', n_jobs=1, cv=cv)
gs_results = gs.fit(X_train, y_train)

Finally, I train my main model with the chosen hyperparameters in two ways:

optimising for the custom objective – brier – using a custom brier_error function as the metric:

model1 = XGBClassifier(objective=brier, learning_rate=0.02, n_estimators=2000, max_depth=5, 
                      min_child_weight=1, gamma=0.3, reg_lambda=20, subsample=1, colsample_bytree=0.6, 
                          scale_pos_weight=1, seed=0, disable_default_eval_metric=1)
model1.fit(X_train, y_train, eval_metric=brier_error, eval_set=[(X_train, y_train), (X_test, y_test)],
          early_stopping_rounds=100)
y_proba1 = model1.predict_proba(X_test)[:, 1]
brier_score_loss(y_test, y_proba1) # 0.005439
roc_auc_score(y_test, y_proba1) # 0.8567
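
(For completeness, since brier and brier_error are not shown above: a typical pair looks roughly like the sketch below, a simplified version that may differ from the exact definitions used; the objective returns the gradient and hessian of (sigmoid(margin) - y)^2, and the metric reports the mean squared error on the probability scale.)

import numpy as np

def brier(y_true, y_pred):
    # custom objective: y_pred are raw margin scores in the sklearn wrapper
    p = 1.0 / (1.0 + np.exp(-y_pred))
    grad = 2.0 * (p - y_true) * p * (1.0 - p)
    hess = 2.0 * p * (1.0 - p) * (p * (1.0 - p) + (p - y_true) * (1.0 - 2.0 * p))
    return grad, hess

def brier_error(y_pred, dtrain):
    # custom eval metric: Brier score of the sigmoid-transformed margins
    y_true = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-y_pred))
    return 'brier', float(np.mean((p - y_true) ** 2))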

optimising for the default binary:logistic with auc as the evaluation metric:

model2 = XGBClassifier(learning_rate=0.02, n_estimators=2000, max_depth=5, 
                      min_child_weight=1, gamma=0.3, reg_lambda=20, subsample=1, colsample_bytree=0.6, 
                          scale_pos_weight=1, seed=0, disable_default_eval_metric=1)
model2.fit(X_train, y_train, eval_metric='auc', eval_set=[(X_train, y_train), (X_test, y_test)],
          early_stopping_rounds=100)
y_proba2 = model2.predict_proba(X_test)[:, 1]
brier_score_loss(y_test, y_proba2) # 0.004914
roc_auc_score(y_test, y_proba2) # 0.8721

I would expect the Brier score to be lower for model1, since we optimise directly for it, but apparently that is not the case (see the results above). What does this tell me? Is optimising Brier somehow harder? Should I use more boosting rounds? (Although these were found using grid search with brier_score_loss…) Is it somehow explainable by the data distribution? (E.g. can such an issue occur in the event of unbalanced classes, or something like that?) I have no idea where this situation comes from, but probably there is a reason behind it.


Get this bounty!!!

#StackBounty: #algorithms #optimization #dynamic-programming How can this Line-Breaking algorithm consider spaces as having width diffe…

Bounty: 50

The Divide & Conquer Algorithm for Line-Breaking described here is given below, both in Python and in Dart (which is similar to Java/C#).

Line-breaking is also known as “line wrap”, “word wrap”, or “paragraph formation”, and this algorithm is used for achieving minimum raggedness.

This algorithm works, but it considers each space as having exactly width = 1.0.

My Question:

How can I modify this algorithm so that it ignores spaces? In other words, make it consider spaces as having width 0.0? (or it would also work for me if I could define any width I wanted for the spaces, including 0.0).

Python Implementation:

def divide(text, width):
    words = text.split()
    count = len(words)
    offsets = [0]
    for w in words:
        offsets.append(offsets[-1] + len(w))

    minima = [0] + [10 ** 20] * count
    breaks = [0] * (count + 1)

    def cost(i, j):
        # width of a line holding words i..j-1: word lengths plus one unit per gap
        w = offsets[j] - offsets[i] + j - i - 1
        if w > width:
            # overfull line: effectively infinite penalty
            return 10 ** 10
        return minima[i] + (width - w) ** 2

    def search(i0, j0, i1, j1):
        stack = [(i0, j0, i1, j1)]
        while stack:
            i0, j0, i1, j1 = stack.pop()
            if j0 < j1:
                j = (j0 + j1) // 2
                for i in range(i0, i1):
                    c = cost(i, j)
                    if c <= minima[j]:
                        minima[j] = c
                        breaks[j] = i
                stack.append((breaks[j], j+1, i1, j1))
                stack.append((i0, j0, breaks[j]+1, j))

    n = count + 1
    i = 0
    offset = 0
    while True:
        r = min(n, 2 ** (i + 1))
        edge = 2 ** i + offset
        search(0 + offset, edge, edge, r + offset)
        x = minima[r - 1 + offset]
        for j in range(2 ** i, r - 1):
            y = cost(j + offset, r - 1 + offset)
            if y <= x:
                n -= j
                i = 0
                offset += j
                break
        else:
            if r == n:
                break
            i = i + 1

    lines = []
    j = count
    while j > 0:
        i = breaks[j]
        lines.append(' '.join(words[i:j]))
        j = i
    lines.reverse()
    return lines
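
As far as I can tell, the implicit space width enters only through the j - i - 1 term in cost above (each gap between consecutive words costs exactly 1.0). A sketch of the change I have in mind, with the width parametrized (space_width = 0.0 would ignore spaces; 1.0 restores the original behaviour), in case it helps frame the question:

def cost(i, j):
    # space_width is a new parameter; 0.0 ignores spaces entirely
    w = offsets[j] - offsets[i] + (j - i - 1) * space_width
    if w > width:
        return 10 ** 10
    return minima[i] + (width - w) ** 2

I am not sure whether any other part of the algorithm relies on the 1.0 spacing, which is essentially what I am asking.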

Dart implementation:

class MinimumRaggedness {

  /// Given some [boxWidths], break it into the smallest possible number
  /// of lines such as each line has width not larger than [maxWidth].
  /// It also minimizes the difference between width of each line,
  /// achieving a "balanced" result.
  /// Spacing between boxes is 1.0.
  static List<List<int>> divide(List<num> boxWidths, num maxWidth) {

    int count = boxWidths.length;
    List<num> offsets = [0];

    for (num boxWidth in boxWidths) {
      offsets.add(offsets.last + min(boxWidth, maxWidth));
    }

    List<num> minimum = [0]..addAll(List<num>.filled(count, 9223372036854775807));
    List<int> breaks = List<int>.filled(count + 1, 0);

    num cost(int i, int j) {
      num width = offsets[j] - offsets[i] + j - i - 1;
      if (width > maxWidth)
        return 9223372036854775806;
      else
        return minimum[i] + pow(maxWidth - width, 2);
    }

    void search(int i0, int j0, int i1, int j1) {
      Queue<List<int>> stack = Queue()..add([i0, j0, i1, j1]);

      while (stack.isNotEmpty) {
        List<int> info = stack.removeLast();
        i0 = info[0];
        j0 = info[1];
        i1 = info[2];
        j1 = info[3];

        if (j0 < j1) {
          int j = (j0 + j1) ~/ 2;

          for (int i = i0; i < i1; i++) {
            num c = cost(i, j);
            if (c <= minimum[j]) {
              minimum[j] = c;
              breaks[j] = i;
            }
          }

          stack.add([breaks[j], j + 1, i1, j1]);
          stack.add([i0, j0, breaks[j] + 1, j]);
        }
      }
    }

    int n = count + 1;
    int i = 0;
    int offset = 0;

    while (true) {
      int r = min(n, pow(2, i + 1));
      int edge = pow(2, i) + offset;
      search(0 + offset, edge, edge, r + offset);
      num x = minimum[r - 1 + offset];

      bool flag = true;
      for (int j = pow(2, i); j < r - 1; j++) {
        num y = cost(j + offset, r - 1 + offset);
        if (y <= x) {
          n -= j;
          i = 0;
          offset += j;
          flag = false;
          break;
        }
      }

      if (flag) {
        if (r == n) break;
        i = i + 1;
      }
    }

    int j = count;

    List<List<int>> indexes = [];

    while (j > 0) {
      int i = breaks[j];
      indexes.add(List<int>.generate(j - i, (index) => index + i));
      j = i;
    }

    return indexes.reversed.toList();
  }
}


Get this bounty!!!

#StackBounty: #sampling #optimization #monte-carlo #minimum #numerical-integration Minimize asymptotic variance of fintely many estimates

Bounty: 50

Let

  • $(E,\mathcal E,\lambda)$ be a $\sigma$-finite measure space;
  • $f:E\to[0,\infty)^3$ be a bounded Bochner integrable function on $(E,\mathcal E,\lambda)$ and $p:=\alpha_1f_1+\alpha_2f_2+\alpha_3f_3$ for some $\alpha_1,\alpha_2,\alpha_3\ge0$ with $\alpha_1+\alpha_2+\alpha_3=1$ and $$c:=\int p\:{\rm d}\lambda\in(0,\infty)\tag1$$
  • $\mu:=p\lambda$
  • $I$ be a finite nonempty set
  • $r_i$ be a probability density on $(E,\mathcal E,\lambda)$ with $$E_1:=\{p>0\}\subseteq\{r_i>0\}\tag2$$ for $i\in I$
  • $w_i:E\to[0,1]$ be $\mathcal E$-measurable for $i\in I$ with $$\{p>0\}\subseteq\left\{\sum_{i\in I}w_i=1\right\}\tag3$$

I’m running the Metropolis-Hastings algorithm with target distribution $\mu$ and use the generated chain to estimate $\lambda g$ for $\mathcal E$-measurable $g:E\to[0,\infty)^3$ with $\{p=0\}\subseteq\{g=0\}$ and $\lambda|g|<\infty$. The asymptotic variance of my estimate is given by $$\sigma^2(g)=\sum_{i\in I}(\mu w_i)\int_{E_1}\frac{\left|g-\frac pc\lambda g\right|^2}{r_i}\:{\rm d}\lambda.\tag4$$ Given a finite system $\mathcal H\subseteq\mathcal E$ with $\lambda(H)\in(0,\infty)$ for all $H\in\mathcal H$, I want to choose the $w_i$ such that $$\max_{H\in\mathcal H}\sigma^2(1_Hf)\tag5$$ is as small as possible.

It’s easy to see that, for fixed $g$, $(4)$ is minimized by choosing $w_i\equiv 1$ for the $i\in I$ which minimizes $\int_{E_1}\frac{\left|g-\frac pc\lambda g\right|^2}{r_i}\:{\rm d}\lambda$ and $w_j\equiv 0$ for all other $j\in I\setminus\{i\}$.

So, my idea is to bound $(5)$ by something which doesn’t depend on $\mathcal H$ anymore. Maybe we can bound it by the “variance” of $f$, using the fact that $\operatorname E[X]$ minimizes $x\mapsto\operatorname E\left[\left|x-X\right|^2\right]$.

I’m not sure if this is sensible, but maybe it’s easier to consider $\sigma^2(g_H)$ instead of $\sigma^2(1_Hf)$, where $g_H:=1_H\left(f-\frac{\lambda(1_Hf)}{\lambda(H)}\right)$. By the aforementioned idea we obtain $$\sigma^2(g_H)\le\int_{E_1}\frac{|f|^2}{r_i}\:{\rm d}\lambda-\frac1{\theta_i}\left|\int_{E_1}\frac f{r_i}\:{\rm d}\lambda\right|^2\tag6,$$ where $\theta_i=\int_{E_1}\frac1{r_i}\:{\rm d}\lambda$. Now we could estimate the right-hand side of $(6)$ using Monte Carlo integration and compute the index $i$ for which this estimate is minimal. Is this a good idea or can we do better?
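
For what it’s worth, here is a sketch of how I would estimate the right-hand side of $(6)$ by importance sampling with draws from $r_i$ (illustrative names; it assumes, for simplicity, that $r_i$ is supported on $E_1$, so the integrals over $E_1$ become plain expectations under $r_i$):

import numpy as np

def bound_estimate(sample_r_i, f, r_i, n=100000):
    # With X ~ r_i we have int_{E_1} h / r_i dlambda = E[h(X) / r_i(X)^2]
    X = sample_r_i(n)                  # n draws from the density r_i
    w = 1.0 / r_i(X) ** 2              # importance weights 1 / r_i^2
    fx = f(X)                          # shape (n, 3), f maps into [0, inf)^3
    term = np.mean(np.sum(fx ** 2, axis=1) * w)   # int |f|^2 / r_i dlambda
    vec = np.mean(fx * w[:, None], axis=0)        # int f / r_i dlambda, componentwise
    theta = np.mean(w)                            # theta_i = int 1 / r_i dlambda
    return term - np.dot(vec, vec) / theta

# pick the index with the smallest estimated bound:
# best_i = min(I, key=lambda i: bound_estimate(sample_r[i], f, r[i]))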


Get this bounty!!!

#StackBounty: #mysql #optimization #explain MYSQL Select query stuck at "Sending data"

Bounty: 50

I have a SELECT query involving about 1M records; I'm working on a Magento 1.9 database.

SELECT IF(sup_ap.is_percent = 1, TRUNCATE(mt.value + (mt.value * sup_ap.pricing_value / 100), 4),
          mt.value + SUM(sup_ap.pricing_value)) AS `value`,
       75                                       AS `attribute_id`,
       `supl`.`product_id`                      AS `entity_id`,
       `cs`.`store_id`
FROM `catalog_product_entity_decimal` AS `mt`
         LEFT JOIN `catalog_product_super_attribute` AS `sup_a` ON mt.entity_id = product_id
         INNER JOIN `catalog_product_super_attribute_pricing` AS `sup_ap`
                    ON sup_ap.product_super_attribute_id = sup_a.product_super_attribute_id
         INNER JOIN `catalog_product_super_link` AS `supl` ON mt.entity_id = supl.parent_id
         INNER JOIN `catalog_product_entity_int` AS `pint`
                    ON pint.entity_id = supl.product_id and pint.attribute_id = sup_a.attribute_id and
                       pint.value = sup_ap.value_index
         INNER JOIN `core_store` AS `cs` ON cs.website_id = sup_ap.website_id
WHERE (mt.entity_id in (select product_id from catalog_product_super_attribute))
  AND (mt.attribute_id = '75')
GROUP BY `entity_id`, `cs`.`store_id`
LIMIT 500

My Explain:

+------+-------------+---------------------------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------+---------+------------------------------------+------+----------------------------------------------+
| id   | select_type | table                           | type   | possible_keys                                                                                                                                                  | key                                                            | key_len | ref                                | rows | Extra                                        |
+------+-------------+---------------------------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------+---------+------------------------------------+------+----------------------------------------------+
|    1 | PRIMARY     | cs                              | index  | IDX_CORE_STORE_WEBSITE_ID                                                                                                                                      | IDX_CORE_STORE_WEBSITE_ID                                      | 2       | NULL                               |    7 | Using index; Using temporary; Using filesort |
|    1 | PRIMARY     | sup_ap                          | ref    | UNQ_CAT_PRD_SPR_ATTR_PRICING_PRD_SPR_ATTR_ID_VAL_IDX_WS_ID,IDX_CAT_PRD_SPR_ATTR_PRICING_PRD_SPR_ATTR_ID,IDX_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRICING_WEBSITE_ID | IDX_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRICING_WEBSITE_ID         | 2       | cs.website_id                      |   11 |                                              |
|    1 | PRIMARY     | sup_a                           | eq_ref | PRIMARY,UNQ_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRODUCT_ID_ATTRIBUTE_ID,IDX_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRODUCT_ID                                             | PRIMARY                                                        | 4       | sup_ap.product_super_attribute_id  |    1 |                                              |
|    1 | PRIMARY     | mt                              | ref    | UNQ_CAT_PRD_ENTT_DEC_ENTT_ID_ATTR_ID_STORE_ID,IDX_CATALOG_PRODUCT_ENTITY_DECIMAL_ENTITY_ID,IDX_CATALOG_PRODUCT_ENTITY_DECIMAL_ATTRIBUTE_ID                     | UNQ_CAT_PRD_ENTT_DEC_ENTT_ID_ATTR_ID_STORE_ID                  | 6       | sup_a.product_id,const             |    1 |                                              |
|    1 | PRIMARY     | catalog_product_super_attribute | ref    | UNQ_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRODUCT_ID_ATTRIBUTE_ID,IDX_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRODUCT_ID                                                     | UNQ_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRODUCT_ID_ATTRIBUTE_ID    | 4       | sup_a.product_id                   |    1 | Using index; FirstMatch(mt)                  |
|    1 | PRIMARY     | supl                            | ref    | UNQ_CATALOG_PRODUCT_SUPER_LINK_PRODUCT_ID_PARENT_ID,IDX_CATALOG_PRODUCT_SUPER_LINK_PARENT_ID,IDX_CATALOG_PRODUCT_SUPER_LINK_PRODUCT_ID                         | IDX_CATALOG_PRODUCT_SUPER_LINK_PARENT_ID                       | 4       | sup_a.product_id                   |    4 |                                              |
|    1 | PRIMARY     | pint                            | ref    | UNQ_CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,IDX_CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,IDX_CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID            | UNQ_CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID | 6       | supl.product_id,sup_a.attribute_id |    1 | Using where                                  |
+------+-------------+---------------------------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------+---------+------------------------------------+------+----------------------------------------------+

I have no experience optimizing a SELECT query that is stuck at "Sending data". I tried updating the query to this:

SELECT IF(sup_ap.is_percent = 1, TRUNCATE(mt.value + (mt.value * sup_ap.pricing_value / 100), 4),
          mt.value + SUM(sup_ap.pricing_value)) AS `value`,
       75                                       AS `attribute_id`,
       `supl`.`product_id`                      AS `entity_id`,
       `cs`.`store_id`
FROM (select entity_id, `value` from `catalog_product_entity_decimal` where attribute_id = '75') AS `mt`
         LEFT JOIN `catalog_product_super_attribute` AS `sup_a` ON mt.entity_id = product_id
         INNER JOIN `catalog_product_super_attribute_pricing` AS `sup_ap`
                    ON sup_ap.product_super_attribute_id = sup_a.product_super_attribute_id
         INNER JOIN `catalog_product_super_link` AS `supl` ON mt.entity_id = supl.parent_id
         INNER JOIN `catalog_product_entity_int` AS `pint`
                    ON pint.entity_id = supl.product_id and pint.attribute_id = sup_a.attribute_id and
                       pint.value = sup_ap.value_index
         INNER JOIN `core_store` AS `cs` ON cs.website_id = sup_ap.website_id
WHERE (sup_a.product_id is not null)
GROUP BY `entity_id`, `cs`.`store_id`
LIMIT 500;

New Explain:

+------+-------------+--------------------------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------+---------+------------------------------------+------+----------------------------------------------+
| id   | select_type | table                          | type   | possible_keys                                                                                                                                                  | key                                                            | key_len | ref                                | rows | Extra                                        |
+------+-------------+--------------------------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------+---------+------------------------------------+------+----------------------------------------------+
|    1 | SIMPLE      | cs                             | index  | IDX_CORE_STORE_WEBSITE_ID                                                                                                                                      | IDX_CORE_STORE_WEBSITE_ID                                      | 2       | NULL                               |    7 | Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | sup_ap                         | ref    | UNQ_CAT_PRD_SPR_ATTR_PRICING_PRD_SPR_ATTR_ID_VAL_IDX_WS_ID,IDX_CAT_PRD_SPR_ATTR_PRICING_PRD_SPR_ATTR_ID,IDX_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRICING_WEBSITE_ID | IDX_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRICING_WEBSITE_ID         | 2       | cs.website_id                      |   11 |                                              |
|    1 | SIMPLE      | sup_a                          | eq_ref | PRIMARY,UNQ_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRODUCT_ID_ATTRIBUTE_ID,IDX_CATALOG_PRODUCT_SUPER_ATTRIBUTE_PRODUCT_ID                                             | PRIMARY                                                        | 4       | sup_ap.product_super_attribute_id  |    1 | Using where                                  |
|    1 | SIMPLE      | catalog_product_entity_decimal | ref    | UNQ_CAT_PRD_ENTT_DEC_ENTT_ID_ATTR_ID_STORE_ID,IDX_CATALOG_PRODUCT_ENTITY_DECIMAL_ENTITY_ID,IDX_CATALOG_PRODUCT_ENTITY_DECIMAL_ATTRIBUTE_ID                     | UNQ_CAT_PRD_ENTT_DEC_ENTT_ID_ATTR_ID_STORE_ID                  | 6       | sup_a.product_id,const             |    1 |                                              |
|    1 | SIMPLE      | supl                           | ref    | UNQ_CATALOG_PRODUCT_SUPER_LINK_PRODUCT_ID_PARENT_ID,IDX_CATALOG_PRODUCT_SUPER_LINK_PARENT_ID,IDX_CATALOG_PRODUCT_SUPER_LINK_PRODUCT_ID                         | IDX_CATALOG_PRODUCT_SUPER_LINK_PARENT_ID                       | 4       | sup_a.product_id                   |    4 |                                              |
|    1 | SIMPLE      | pint                           | ref    | UNQ_CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,IDX_CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,IDX_CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID            | UNQ_CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID | 6       | supl.product_id,sup_a.attribute_id |    1 | Using where                                  |
+------+-------------+--------------------------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------+---------+------------------------------------+------+----------------------------------------------+
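
For anyone trying to see where the time actually goes, per-stage timings can be captured with session profiling (a sketch; SHOW PROFILE is deprecated but still available on the MySQL 5.x versions Magento 1.9 typically runs on, and the query ID below is illustrative):

SET profiling = 1;
-- ... run the SELECT above ...
SHOW PROFILES;               -- lists recent statements with their query IDs
SHOW PROFILE FOR QUERY 1;    -- per-stage breakdown; "Sending data" covers row reading and processing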

I would be extremely grateful if you could take a moment to look at the issue above.


Get this bounty!!!