#StackBounty: #machine-learning #neural-network #convnet #rnn Recurrent (CNN) model on EEG data

Bounty: 50

I’m wondering how to interpret a recurrent architecture in an EEG context. Specifically, I’m thinking of a Recurrent CNN (as opposed to architectures like the LSTM), but perhaps the question applies to other types of recurrent networks as well.

When I read about R-CNNs, they’re usually explained in image-classification contexts. They’re typically described as “learning over time” or “including the effect of time step t−1 on the current input”.

This interpretation/explanation gets really confusing when working with EEG data. An example of an R-CNN being used on EEG data can be found here.

Imagine I have training examples, each consisting of a 1×512 array. This array captures a voltage reading from 1 electrode at 512 consecutive time points. If I use this as input to a Recurrent CNN (using 1D convolutions), the recurrent part of the model isn’t actually capturing “time”, right (as would be implied by the descriptions/explanations discussed earlier)? Because in this context, time is already captured by the second dimension of the array.

So with a setup like this, what does the recurrent part of the network actually allow us to model that a regular CNN can’t (if not time)?

It seems to me that “recurrent” just means doing a convolution, adding the result to the original input, and convolving again. This gets repeated for x recurrent steps. What advantage does this process actually give?
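
To make that concrete, here is a minimal sketch of my current understanding of one recurrent-convolutional layer (assuming PyTorch; the channel counts, kernel size, and unfold depth `T` are made up for illustration):

```python
import torch
import torch.nn as nn

class RecurrentConv1d(nn.Module):
    """One recurrent-convolutional layer, unfolded for T steps:
    h = relu(ff(x)); then h = relu(ff(x) + rec(h)), repeated T times."""
    def __init__(self, in_ch, out_ch, T=3, k=9):
        super().__init__()
        self.ff = nn.Conv1d(in_ch, out_ch, k, padding=k // 2)    # feedforward conv
        self.rec = nn.Conv1d(out_ch, out_ch, k, padding=k // 2)  # recurrent conv, shared across steps
        self.T = T

    def forward(self, x):
        h = torch.relu(self.ff(x))
        for _ in range(self.T):  # each unfolded step widens the effective receptive field
            h = torch.relu(self.ff(x) + self.rec(h))
        return h

x = torch.randn(8, 1, 512)              # batch of 1-electrode, 512-sample recordings
print(RecurrentConv1d(1, 16)(x).shape)  # torch.Size([8, 16, 512])
```

If that sketch is right, the recurrence never adds a time axis: each unfolded step just lets a unit see a wider stretch of the same 512 samples, which is exactly why the “learning over time” explanation confuses me here.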



#StackBounty: #machine-learning #distributions #dataset #normalization #standardization Data Distribution and Feature Scaling Techniques

Bounty: 50

New to AI/ML. My understanding of feature scaling is that it’s a set of techniques used to counteract the effects of different features having different scales/ranges (which causes models to incorrectly weight some features more or less than others).

The two most common techniques I keep reading about are normalization (rescaling your feature values to lie between 0 and 1) and standardization (rescaling your feature values to have a mean of 0 and a standard deviation of 1).
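
For reference, here is what I mean by those two techniques (a minimal sketch using scikit-learn; the toy feature values are made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [4.0], [8.0]])   # toy single-feature column

X_norm = MinMaxScaler().fit_transform(X)     # normalization: (x - min) / (max - min) -> [0, 1]
X_std = StandardScaler().fit_transform(X)    # standardization: (x - mean) / std -> mean 0, std 1

print(X_norm.ravel())  # [0.    0.143 0.429 1.   ] (approximately)
print(X_std.ravel())   # values with mean ~0 and standard deviation ~1
```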

From what I can gather, normalization seems to work better when your data is non-Gaussian (not “bell-curve” shaped), whereas standardization is better when it is Gaussian. But nowhere can I find a decent explanation of why this is the case!

Why does your data’s distribution affect the efficacy of your feature-scaling technique? Why is normalization good for non-Gaussian data, whereas standardization suits Gaussian data? Are there edge cases where you’d use standardization on non-Gaussian data? Any other major techniques besides these two?

For instance, I found this excellent paper on characterizing datasets by various distributions. So I’m wondering if there are methods for feature scaling when the data is, say, geometrically distributed, or when it’s exponentially distributed, etc. And if so, what are they?!



#StackBounty: #machine-learning #time-series #predictive-models #prediction-interval Predictive maintenance model to identify indication…

Bounty: 50

Situation

I’m working on a problem where I’m using sensor data to predict machine failure before the failure happens, and I need some advice on which methods to explore.

Specifically, I want to identify indications of impending failure prior to the failure actually happening. Ideally, this would be with enough lead time that we could fix whatever is going wrong before it causes a failure.

Problem

The conceptual roadblock I’ve hit is this: I know I could fit various classification models (logistic regression, decision tree, nearest neighbor, etc.) to the data to estimate the probability of failure given the sensor readings at that moment. However, I can’t figure out how to identify the indication of an upcoming failure with enough time to actually do something about it.

Possible Approaches

I am familiar with Survival Analysis, but given that I don’t have data from multiple machines, and that the machine is not back to 100% after a repair, I don’t feel that it is necessarily a good fit.

I’ve also thought about taking the time at which a failure happens, shifting it back 1 hour, and seeing how accurately I can predict that point. If I can, I’d move the target back another hour and see how much lead time I can confidently predict. But I’m not sure whether it’s appropriate to do this.
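
In code, that idea would look something like the rough pandas sketch below (“sensors.csv” is a hypothetical export of the data shown at the bottom of this post; 30 rows corresponds to 1 hour at one reading every 2 minutes):

```python
import pandas as pd

# Hypothetical export with the columns shown in the example table below
df = pd.read_csv("sensors.csv", parse_dates=["Datetime"])

LEAD_ROWS = 30  # 1 hour of lead time at one reading every 2 minutes

failure_now = (df["Reason"] == "failure").astype(int)
# Label a row 1 if a failure occurs LEAD_ROWS rows later; the final rows
# have no lookahead, so they default to 0
df["failure_in_1h"] = failure_now.shift(-LEAD_ROWS).fillna(0).astype(int)

# Fit any classifier on the sensor columns against df["failure_in_1h"],
# then keep increasing LEAD_ROWS for as long as the model stays accurate.
```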

Available Data

The data that I have is recorded from one machine over a 1-year period. There are approximately 60 sensors that are recorded every two minutes. These sensors measure variables such as the temperatures of the different components that make up the machine (including thermostat setting vs. actual temperature), the speed the machine is running at, steam pressures throughout the machine, fan speeds, whether or not the machine is running, etc.

In addition to the sensor readings, I have enriched the data set to include the reason the machine is not running (e.g., shift change, preventative maintenance, failure). I’ve included a condensed example of what the data looks like at the bottom of this post, altered to capture some of the variety in the whole dataset. In reality, when the machine stops running, it’s down for anywhere from 2 minutes to 2 days, depending on the reason. Also, the variables don’t necessarily change quite as rapidly as in the example below, but I wanted to provide some variety.

+-----------------+----------+-------------+------------+------------+-------+-------+-----+--------------------------+------------+
|    Datetime     | CircFan  | CircFanAct  | EntrySpeed | ExhaustFan | Speed | Temp1 | Run |          Reason          | TimeBtwRun |
+-----------------+----------+-------------+------------+------------+-------+-------+-----+--------------------------+------------+
| 2009-10-19 0:00 |      100 |         600 |        461 |         40 |    45 |  1126 |   1 |                          | NA         |
| 2009-10-19 0:02 |      100 |         600 |          0 |         39 |    45 |  1120 |   0 | shift change             | 0:00       |
| 2009-10-19 0:04 |      100 |         600 |          0 |         39 |    45 |  1118 |   0 | shift change             | 0:02       |
| 2009-10-19 0:06 |       95 |         600 |        461 |         39 |    45 |  1119 |   1 |                          | 0:00       |
| 2009-10-19 0:08 |       95 |         599 |        461 |         40 |    45 |  1120 |   1 |                          | 0:02       |
| 2009-10-19 0:10 |       95 |         598 |        461 |         40 |    45 |  1120 |   1 |                          | 0:04       |
| 2009-10-19 0:12 |       95 |         597 |        461 |         40 |    45 |  1130 |   1 |                          | 0:06       |
| 2009-10-19 0:14 |      100 |         597 |          0 |         40 |    45 |   699 |   0 | failure                  | 0:00       |
| 2009-10-19 0:16 |      100 |         597 |          0 |         40 |    45 |   659 |   0 | failure                  | 0:02       |
| 2009-10-19 0:18 |      100 |         597 |          0 |         40 |    45 |   640 |   0 | failure                  | 0:04       |
| 2009-10-19 0:20 |      100 |         600 |        461 |         40 |    45 |  1145 |   1 |                          | 0:00       |
| 2009-10-19 0:22 |      100 |         600 |        461 |         40 |    45 |  1144 |   1 |                          | 0:02       |
| 2009-10-19 0:24 |       80 |         600 |        461 |         40 |    45 |  1138 |   1 |                          | 0:04       |
| 2009-10-19 0:26 |       80 |         600 |        461 |         41 |    45 |  1133 |   1 |                          | 0:06       |
| 2009-10-19 0:28 |       80 |         600 |        461 |         41 |    45 |  1134 |   1 |                          | 0:08       |
| 2009-10-19 0:30 |      100 |         600 |        461 |         41 |    45 |  1134 |   1 |                          | 0:10       |
| 2009-10-19 0:32 |      100 |         600 |        461 |         41 |    45 |  1133 |   1 |                          | 0:12       |
| 2009-10-19 0:34 |      100 |         600 |        461 |         40 |    45 |  1140 |   1 |                          | 0:14       |
| 2009-10-19 0:36 |      100 |         600 |        100 |         40 |    45 |   788 |   0 | preventative maintenance | 0:00       |
| 2009-10-19 0:38 |      100 |         600 |        100 |         40 |    45 |   769 |   0 | preventative maintenance | 0:02       |
+-----------------+----------+-------------+------------+------------+-------+-------+-----+--------------------------+------------+



#StackBounty: What are Regularities and Regularization?

Bounty: 50

I am hearing these words more and more as I study machine learning. In fact, people have won the Fields Medal working on the regularity of equations. So I guess this is a term that has carried over from statistical physics/mathematics into machine learning. Naturally, a number of people I asked just couldn’t explain it intuitively.

I know that methods such as dropout help with regularization (they say it reduces overfitting, but I really don’t get what that means: if it only reduces overfitting, why not just call these anti-overfitting methods? There must be something more, I think; hence this question).
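
For concreteness, these are the two mechanisms I keep reading about, as I currently understand them (a minimal PyTorch sketch; the architecture and hyperparameters are arbitrary):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout: randomly zeroes activations during training
    nn.Linear(50, 1),
)

# weight_decay adds an L2 penalty on the weights to the loss
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```

Mechanically these look completely different (one perturbs activations, the other penalizes large weights), yet both are called “regularization”, which is part of what I’m trying to understand.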

I would be really grateful (I guess other ML novices would be too!) if you could explain:

  1. How do you define regularity? What is regularity?

  2. Is regularization a way to ensure regularity, i.e., a way of capturing regularities?

  3. Why do ensembling methods like dropout, as well as normalization methods, all claim to be doing regularization?

  4. Why do these (regularity/regularization) come up in machine learning?

Thanks a lot for your help.

