#StackBounty: #neural-network #deep-learning #mathematics How propagate the error delta in backpropagation in convolutional neural netw…

Bounty: 50

My CNN has the following structure:

  • Output neurons: 10
  • Input matrix (I): 28×28
  • Convolutional layer (C): 3
    feature maps with a 5×5 kernel (output dimension is 3x24x24)
  • Max pooling layer (MP): size 2×2 (ouput dimension is 3x12x12)
  • Fully connected layer (FC): 432×10 (3*12*12=432 max pooling layer flattened and vectorized)

After making the forward pass, I calculate the error delta in the output layer as:

$delta^L = (a^L-y) odot sigma'(z^L) (1)$

Being $a^L$ the predicted value and $z^L$ the dot product of the weights, plus the biases.

I calculate the error deltas for the next layers with:

$delta^l = ((w^{l+1})^T delta^{l+1}) odot sigma'(z^l) (2)$

And derivative of the error w.r.t. the weights being

$frac{partial C}{partial w^l_{jk}} = a^{l-1}_k delta^l_j (3)$

I’m able to update the weights (and biases) of $FC$ with no problem. At this point, error delta $delta$ is 10×1.

For calculating the error delta for $MP$ , I find the dot product of $FC$ and the error delta itself, as defined in equation 2. That gives me an error delta of 432×1. Because there are no parameters in this layer, and the flattening and vectorization, I just need to follow the reverse process and reshape it to 3x12x12, being that the error in $MP$.

To find the error delta for $C$, I upsample the error delta following the reverse process of the max pooling ending with a 3x24x24 delta. Finding the hadamard product of each of those matrixes with each of the $σ′$ of the feature maps gives me the error delta for $C$.

But now, how am I supposed to update the kernels, if they’re 5×5, and I is 28×28? $I$ have the error delta for the layer, but I don’t know how to update the weights with it. Also for the bias, as it’s a single value for the whole feature set.

Get this bounty!!!

#StackBounty: #mathematics #tiles Octagon border algorithm

Bounty: 50

I work on an open source game since 2 years and I’m very bad at math (it is not every time easy haha). My game permit to move a character on octagon. When character reach border coordinate (colored in yellow), I permit him to travel on a “new octagon”:

enter image description here

So, the algorithm goal is to know if an x, y tile coordinate is on a “yellow” tile, which direction is (North or North-Est or Est …) depending on map width and height.

I wrote this algorithm many times and in two different language, example with Rust:

pub fn get_corner(width: i16, height: i16, new_row_i: i16, new_col_i: i16) -> Option<CornerEnum> {
    let left_col_i_end = width / 3;
    let right_col_i_start = (width / 3) * 2;
    let top_row_i_end = height / 3;
    let bottom_row_i_start = (height / 3) * 2;
    let mut more = if new_row_i >= 0 { new_row_i } else { 0 };
    let mut right_col_i = 0;
    let mut left_col_i = 0;

    if new_row_i < top_row_i_end {
        right_col_i = right_col_i_start + more;
        left_col_i = left_col_i_end - more;
    } else {
        if new_row_i >= bottom_row_i_start {
            more = (height / 3) - (new_row_i - bottom_row_i_start + 1);
            more = if more >= 0 { more } else { 0 };
            right_col_i = right_col_i_start + more;
            left_col_i = left_col_i_end - more;
        } else {
            left_col_i = left_col_i_end;
            right_col_i = right_col_i_start;

    if new_col_i < left_col_i && new_row_i < top_row_i_end {
        return Some(CornerEnum::TopLeft);
    if new_row_i < 0 && left_col_i <= new_col_i {
        return Some(CornerEnum::Top);
    if new_col_i >= right_col_i && new_row_i < top_row_i_end {
        return Some(CornerEnum::TopRight);
    if new_col_i > width - 1 && top_row_i_end <= new_row_i {
        return Some(CornerEnum::Right);
    if new_col_i >= right_col_i && new_row_i >= bottom_row_i_start {
        return Some(CornerEnum::BottomRight);
    if new_row_i > height - 1 && left_col_i_end <= new_col_i {
        return Some(CornerEnum::Bottom);
    if new_col_i < left_col_i && new_row_i >= bottom_row_i_start {
        return Some(CornerEnum::BottomLeft);
    if new_col_i < 0 && top_row_i_end <= new_row_i {
        return Some(CornerEnum::Left);


But it is not working well … I curse my math. I’m sure it’s not that complicated but i fail at each time in two years. So, i’m here to ask help, or for solution. That would be greatly appreciated!

Get this bounty!!!

#StackBounty: #unity #2d #mathematics #movement #performance Improve velocity obstacle calculation algorithm/performance

Bounty: 50

My goal

My goal is to calculate the (velocity) obstacle that is imposed by unit B onto unit A.
So I want to calculate the velocities from the center of unit A (circle) that will lead to a collision with unit B assuming unit B is not moving. The illegal velocities would be all velocities that are between the two tangents from the center of unit A to unit B (circle).
Thus I actually only need the angle of the two tangents relative to some other velocity.

Here is a simple drawing to illustrate my approach:
enter image description here

The “other vector” (=green) is the “optimal velocity” for unit A, that is calculated through higher level pathfinding. My goal is then to choose the velocity that is closest to the optimal velocity, but not within the blocked/illegal velocities.

My approach

I already have a working demo, but it is too slow.

Note: In my implementation I extruded the radius of unit B by the radius of unit A.

My (primitive) approach to calculating the angles is:
1. Calculate the line from the center of unit A to Unit B
2. Calculate the first tangent point using trigonometry
3. Calculate the closest point from the first tangent point to the line from (1) (i call it the mirrorPoint)
4. Mirror the first tangent point along the mirrorPoint
5. Calculate angle1 and angle2 using Vector.SignedAngle

After wards i still have to do some work to find the closest legal vector to the optimal velocity, but that is not too time consuming.

Here is an image taken from the editor:
enter image description here

The purple line is (1)
The green lines are the lines from the center of Unit B to the tangent points.
The black line (in unit B) are the lines from the mirrorPoint to the tangent points
The blue and the red line are the lines from the center of unit A to the tangent points
The orange line is the optimal velocity
The black line (starting at the center of Unit A) is the legal velocity that is closest to optimal velocity

Here is the code that i use to calculate the angles:

public Vector2 GetDesiredSteering(VelocityObstacleSteeringInput input)
        TestUnit unit = input.Unit;
        List<TestUnit> neighbors = GetNeighbors(unit);
        foreach (TestUnit neighbor in neighbors)
            Vector2 tangent1;
            Vector2 tangent2;
            CalculateTangent(unit.Pos, unit.Scale.x, neighbor.Pos, neighbor.Scale.x, out tangent1, out tangent2);

            Vector2 relativeTangent1 = tangent1 - unit.Pos;
            Vector2 relativeTangent2 = tangent2 - unit.Pos;
            float angle1 = Vector2.SignedAngle(relativeTangent1, input.OptimalVelocity);
            float angle2 = Vector2.SignedAngle(relativeTangent2, input.OptimalVelocity);
            Debug.Log("angle1: " + angle1 + " angle2: " + angle2);

        //return dummy for this stripped down version
        return Vector2.up;

 bool CalculateTangent(Vector2 unitPos, float unitRadius, Vector2 otherPos, float otherRadius, out Vector2 tangentPoint1, out Vector2 tangentPoint2)
        Vector2 hypo = unitPos - otherPos;
        float hypoLen = hypo.magnitude;
        float combinedRadius = (unitRadius + otherRadius) / 2.0f;
        float ratio = combinedRadius / hypoLen;
        float alpha = Mathf.Acos(ratio);
        Vector2 rotated = GeometryHelper.RotateRad(hypo, -alpha);
        Vector2 centerToTangent = rotated * combinedRadius;
        tangentPoint1 = otherPos + centerToTangent;
        tangentPoint2 = GeometryHelper.CenterToOtherTangentPoint(otherPos, hypo, tangentPoint1, out Vector2 curMirrorPoint);
        return true;

 public static Vector2 RotateRad(Vector2 aPoint, float rad)
        float s = Mathf.Sin(rad);
        float c = Mathf.Cos(rad);
        return new Vector2(
            aPoint.x * c - aPoint.y * s,
            aPoint.y * c + aPoint.x * s);

//linePointhas to be center of unit
 public static Vector2 CenterToOtherTangentPoint(Vector2 linePoint, Vector2 lineDirection, Vector2 tangentPoint1, out Vector2 returnMirrorPoint)
        Vector2 mirrorPoint = FindNearestPointOnLine(linePoint, lineDirection, tangentPoint1);
        returnMirrorPoint = mirrorPoint;
        Vector2 tangentPointToMirrorPoint = mirrorPoint - tangentPoint1;
        return mirrorPoint + tangentPointToMirrorPoint;

 public static Vector2 FindNearestPointOnLine(Vector2 origin, Vector2 direction, Vector2 point)
        Vector2 lhs = point - origin;

        //t = how far to move from the origin along the direction to reach the closest point
        float t = Vector2.Dot(lhs, direction);
        return origin + direction * t;

My Problem

I need to be able to scale this approach to at the very minmum 100 units (but I would very much prefer 200 units), which currently is not possible, because the algorithm/implementation is too slow.

So I am wondering if somebody can think of an improvement to this approach. Either mathematically(i.e. some better method/shortcut to calculate the angles) or other tips how to improve my code.

Get this bounty!!!

#StackBounty: #algorithm #lighting #mathematics #importance-sampling Pre-filtered environment map, deriving the equation

Bounty: 50

I’m reading through this article, and more specifically I’m trying to derive the equation that would explain the implementation the following shader (still in the same article):

#version 330 core
out vec4 FragColor;
in vec3 localPos;

uniform samplerCube environmentMap;
uniform float roughness;

const float PI = 3.14159265359;

float RadicalInverse_VdC(uint bits);
vec2 Hammersley(uint i, uint N);
vec3 ImportanceSampleGGX(vec2 Xi, vec3 N, float roughness);

void main()
    vec3 N = normalize(localPos);    
    vec3 R = N;
    vec3 V = R;

    const uint SAMPLE_COUNT = 1024u;
    float totalWeight = 0.0;   
    vec3 prefilteredColor = vec3(0.0);     
    for(uint i = 0u; i < SAMPLE_COUNT; ++i)
        vec2 Xi = Hammersley(i, SAMPLE_COUNT);
        vec3 H  = ImportanceSampleGGX(Xi, N, roughness);
        vec3 L  = normalize(2.0 * dot(V, H) * H - V);

        float NdotL = max(dot(N, L), 0.0);
        if(NdotL > 0.0)
            prefilteredColor += texture(environmentMap, L).rgb * NdotL;
            totalWeight      += NdotL;
    prefilteredColor = prefilteredColor / totalWeight;

    FragColor = vec4(prefilteredColor, 1.0);

Which is my understanding suppose to implement the computation of the integral

I = int_{Omega} L(omega_i)domega_i

However I’m not entirely sure since NdotL is also used in the computation so I might be wrong.

It seems to me the idea is given $N$ to sample a random halfway vector $H$, and based on this we can construct, by reflection, the light direction vector $L$, which would be random as well by construction. The article also mentions that $H$ is sampled from a normal distribution.

But anyway suppose I have samples $H^{1} ,ldots, H^{N}$ and therefore $L^{1},ldots, L^{N}$ samples, the latter I suppose would define $omega_{i}^{1} ldots, omega_{i}^{N}$ (solid angles).

By importance sampling

I approx frac{1}{N} sum_{j=1}^{N} frac{L(omega_i^{j})}{pdf(omega_i^{j})} ;;;; (1)

If what I wrote is correct than I don’t know where the dot product and the normalization factor

W = sum_{j=1}^{N} Ncdot L^{j}

comes from, namely how do I go from $(1)$ to

I approx
frac{sum_{j=1}^{N}L(omega_i^{j}) N cdot L^{j}}
{sum_{j=1}^{N} N cdot L^{j}}

Get this bounty!!!

#StackBounty: #unity #collision-detection #physics #mathematics #grid Redundant checks from spatial partitioning

Bounty: 100

A simple 2D grid partition system for collision detection divides space into equal-sized tiles. Colliders then map themselves to tiles with their bounding box in O(N) time. Collision detection only has to be performed between objects that occupy the same tile(s).

Collision detection is expensive so the more culled, the better. With a grid partitioning system, how are collisions culled when colliders share multiple tiles?

For example, Circle_A and Circle_B both sit on the edge between Tile_1 and Tile_2. Iterating both tiles would make collision detection between Circle_A & Circle_B run twice.
Sharing Multiple Tiles

A global pair hashtable seems performance expensive since it has to be cleared every frame. Tracking pairs inside colliders takes a lot of memory. How do you mitigate redundant pairs when iterating through partition grid tiles?

Get this bounty!!!

#HackerRank: Computing the Correlation


You are given the scores of N students in three different subjects – MathematicsPhysics and Chemistry; all of which have been graded on a scale of 0 to 100. Your task is to compute the Pearson product-moment correlation coefficient between the scores of different pairs of subjects (Mathematics and Physics, Physics and Chemistry, Mathematics and Chemistry) based on this data. This data is based on the records of the CBSE K-12 Examination – a national school leaving examination in India, for the year 2013.

Pearson product-moment correlation coefficient

This is a measure of linear correlation described well on this Wikipedia page. The formula, in brief, is given by:

where x and y denote the two vectors between which the correlation is to be measured.

Input Format

The first row contains an integer N.
This is followed by N rows containing three tab-space (‘\t’) separated integers, M P C corresponding to a candidate’s scores in Mathematics, Physics and Chemistry respectively.
Each row corresponds to the scores attained by a unique candidate in these three subjects.

Input Constraints

1 <= N <= 5 x 105
0 <= M, P, C <= 100

Output Format

The output should contain three lines, with correlation coefficients computed
and rounded off correct to exactly 2 decimal places.
The first line should contain the correlation coefficient between Mathematics and Physics scores.
The second line should contain the correlation coefficient between Physics and Chemistry scores.
The third line should contain the correlation coefficient between Chemistry and Mathematics scores.

So, your output should look like this (these values are only for explanatory purposes):


Test Cases

There is one sample test case with scores obtained in Mathematics, Physics and Chemistry by 20 students. The hidden test case contains the scores obtained by all the candidates who appeared for the examination and took all three tests (Mathematics, Physics and Chemistry).
Think: How can you efficiently compute the correlation coefficients within the given time constraints, while handling the scores of nearly 400k students?

Sample Input

73  72  76
48  67  76
95  92  95
95  95  96
33  59  79
47  58  74
98  95  97
91  94  97
95  84  90
93  83  90
70  70  78
85  79  91
33  67  76
47  73  90
95  87  95
84  86  95
43  63  75
95  92  100
54  80  87
72  76  90

Sample Output


There is no special library support available for this challenge.



What is the difference between linear regression on y with x and x with y?

The Pearson correlation coefficient of x and y is the same, whether you compute pearson(x, y) or pearson(y, x). This suggests that doing a linear regression of y given x or x given y should be the same, but that’s the case.

The best way to think about this is to imagine a scatter plot of points with y on the vertical axis and x represented by the horizontal axis. Given this framework, you see a cloud of points, which may be vaguely circular, or may be elongated into an ellipse. What you are trying to do in regression is find what might be called the ‘line of best fit’. However, while this seems straightforward, we need to figure out what we mean by ‘best’, and that means we must define what it would be for a line to be good, or for one line to be better than another, etc. Specifically, we must stipulate a loss function. A loss function gives us a way to say how ‘bad’ something is, and thus, when we minimize that, we make our line as ‘good’ as possible, or find the ‘best’ line.

Traditionally, when we conduct a regression analysis, we find estimates of the slope and intercept so as to minimize the sum of squared errors. These are defined as follows:

In terms of our scatter plot, this means we are minimizing the sum of the vertical distances between the observed data points and the line.

enter image description here

On the other hand, it is perfectly reasonable to regress x onto y, but in that case, we would put x on the vertical axis, and so on. If we kept our plot as is (with x on the horizontal axis), regressing x onto y (again, using a slightly adapted version of the above equation with x and y switched) means that we would be minimizing the sum of the horizontal distances between the observed data points and the line. This sounds very similar, but is not quite the same thing. (The way to recognize this is to do it both ways, and then algebraically convert one set of parameter estimates into the terms of the other. Comparing the first model with the rearranged version of the second model, it becomes easy to see that they are not the same.)

enter image description here

Note that neither way would produce the same line we would intuitively draw if someone handed us a piece of graph paper with points plotted on it. In that case, we would draw a line straight through the center, but minimizing the vertical distance yields a line that is slightly flatter (i.e., with a shallower slope), whereas minimizing the horizontal distance yields a line that is slightly steeper.

A correlation is symmetrical x is as correlated with y as y is with x. The Pearson product-moment correlation can be understood within a regression context, however. The correlation coefficient, r, is the slope of the regression line when both variables have been standardized first. That is, you first subtracted off the mean from each observation, and then divided the differences by the standard deviation. The cloud of data points will now be centered on the origin, and the slope would be the same whether you regressed y onto x, or x onto y.

enter image description here

Now, why does this matter? Using our traditional loss function, we are saying that all of the error is in only one of the variables (viz., y). That is, we are saying that x is measured without error and constitutes the set of values we care about, but that y has sampling error. This is very different from saying the converse. This was important in an interesting historical episode: In the late 70’s and early 80’s in the US, the case was made that there was discrimination against women in the workplace, and this was backed up with regression analyses showing that women with equal backgrounds (e.g., qualifications, experience, etc.) were paid, on average, less than men. Critics (or just people who were extra thorough) reasoned that if this was true, women who were paid equally with men would have to be more highly qualified, but when this was checked, it was found that although the results were ‘significant’ when assessed the one way, they were not ‘significant’ when checked the other way, which threw everyone involved into a tizzy. See here for a famous paper that tried to clear the issue up.

Here’s another way to think about this that approaches the topic through the formulas instead of visually:

The formula for the slope of a simple regression line is a consequence of the loss function that has been adopted. If you are using the standard Ordinary Least Squares loss function (noted above), you can derive the formula for the slope that you see in every intro textbook. This formula can be presented in various forms; one of which I call the ‘intuitive’ formula for the slope. Consider this form for both the situation where you are regressing y on x, and where you are regressing x on y:

Now, I hope it’s obvious that these would not be the same unless Var(xequals Var(y). If the variances are equal (e.g., because you standardized the variables first), then so are the standard deviations, and thus the variances would both also equal SD(x)SD(y). In this case, β^1 would equal Pearson’s r, which is the same either way by virtue of the principle of commutativity: