#StackBounty: #machine-learning #time-series #predictive-models #prediction-interval Predictive maintenance model to identify indication…

Bounty: 50

Situation

I’m working on a problem where I’m using sensor data to predict machine failure before the failure happens and I need some advice on which methods to explore.

Specifically, I want to identify indications of impending failure before the failure actually happens. Ideally this would be with enough lead time that we could fix whatever is going wrong before it causes a failure.

Problem

My conceptual roadblock is this: I know I could fit various classification models (logistic regression, decision tree, nearest neighbor, etc.) to the data to estimate the probability of failure given the sensor readings at that same moment. However, I can't figure out how to detect an upcoming failure early enough to actually do something about it.

Possible Approaches

I am familiar with survival analysis, but I don't think it is necessarily a good fit here, for two reasons: I don't have data from multiple machines, and a repair does not return the machine to 100% condition.

I've also thought about taking the time at which a failure happens, shifting it back 1 hour, and seeing how accurately I can predict that point. If I can predict it well, I would move the target back another hour, and repeat to see how much lead time I can confidently achieve. But I'm not sure whether this approach is appropriate.
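To make the shifted-target idea concrete, here is a minimal sketch in Python with pandas. It assumes the column names `Datetime`, `Run`, and `Reason` from the example table at the bottom of this post. The function name `add_lead_time_label` and the output columns `NextFailure` and `FailSoon` are hypothetical names made up for illustration. Instead of shifting the failure time itself, it flags every running row that is followed by a failure onset within a chosen horizon, which amounts to the same target. The toy data covers only a few minutes, so the horizons are in minutes rather than hours.

```python
import pandas as pd

def add_lead_time_label(df, horizon):
    """Flag running rows that precede a failure onset by at most `horizon`."""
    df = df.sort_values("Datetime").reset_index(drop=True)
    is_failure = df["Reason"].eq("failure")
    # A failure onset is the first row of each consecutive run of failure rows.
    onset = is_failure & ~is_failure.shift(1, fill_value=False)
    onsets = df.loc[onset, ["Datetime"]].rename(columns={"Datetime": "NextFailure"})
    # For every row, look forward in time to the nearest failure onset.
    out = pd.merge_asof(df, onsets, left_on="Datetime",
                        right_on="NextFailure", direction="forward")
    time_to_failure = out["NextFailure"] - out["Datetime"]
    # Label only rows where the machine is running and a failure is close ahead.
    out["FailSoon"] = (
        out["Run"].eq(1)
        & (time_to_failure > pd.Timedelta(0))
        & (time_to_failure <= horizon)
    ).astype(int)
    return out

# Toy data mirroring the first rows of the example table (not the real dataset).
sample = pd.DataFrame({
    "Datetime": pd.date_range("2009-10-19 00:00", periods=11, freq="2min"),
    "Run": [1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1],
    "Reason": ["", "shift change", "shift change", "", "", "", "",
               "failure", "failure", "failure", ""],
})

# Widen the horizon step by step, as in the "move the target back" idea.
for minutes in (4, 8):
    labeled = add_lead_time_label(sample, pd.Timedelta(minutes=minutes))
    print(minutes, "min horizon:", int(labeled["FailSoon"].sum()), "positive rows")
```

A classifier could then be fit on `FailSoon` for each horizon, and the scores compared across horizons to see where predictive power drops off. Whether that comparison is statistically sound with a single machine is exactly the open question here.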

Available Data

The data I have was recorded from one machine over a 1-year period. Approximately 60 sensors are recorded every two minutes. These sensors measure variables such as the temperatures of the different components that make up the machine (including thermostat setting vs. actual temperature), the speed at which the machine is running, steam pressures throughout the machine, fan speeds, whether or not the machine is running, etc.

In addition to the sensor readings, I have enriched the data set to include the reason that the machine is not running (e.g., shift change, preventative maintenance, failure). I've included a condensed example of what the data looks like at the bottom of this post. I've altered the example to show some of the variety present in the whole dataset. In reality, when the machine stops running, it is down for anywhere from 2 minutes to 2 days, depending on the reason. Also, the variables don't necessarily change quite as rapidly as in the example below, but I wanted to provide some variety.

+-----------------+----------+-------------+------------+------------+-------+-------+-----+--------------------------+------------+
|    Datetime     | CircFan  | CircFanAct  | EntrySpeed | ExhaustFan | Speed | Temp1 | Run |          Reason          | TimeBtwRun |
+-----------------+----------+-------------+------------+------------+-------+-------+-----+--------------------------+------------+
| 2009-10-19 0:00 |      100 |         600 |        461 |         40 |    45 |  1126 |   1 |                          | NA         |
| 2009-10-19 0:02 |      100 |         600 |          0 |         39 |    45 |  1120 |   0 | shift change             | 0:00       |
| 2009-10-19 0:04 |      100 |         600 |          0 |         39 |    45 |  1118 |   0 | shift change             | 0:02       |
| 2009-10-19 0:06 |       95 |         600 |        461 |         39 |    45 |  1119 |   1 |                          | 0:00       |
| 2009-10-19 0:08 |       95 |         599 |        461 |         40 |    45 |  1120 |   1 |                          | 0:02       |
| 2009-10-19 0:10 |       95 |         598 |        461 |         40 |    45 |  1120 |   1 |                          | 0:04       |
| 2009-10-19 0:12 |       95 |         597 |        461 |         40 |    45 |  1130 |   1 |                          | 0:06       |
| 2009-10-19 0:14 |      100 |         597 |          0 |         40 |    45 |   699 |   0 | failure                  | 0:00       |
| 2009-10-19 0:16 |      100 |         597 |          0 |         40 |    45 |   659 |   0 | failure                  | 0:02       |
| 2009-10-19 0:18 |      100 |         597 |          0 |         40 |    45 |   640 |   0 | failure                  | 0:04       |
| 2009-10-19 0:20 |      100 |         600 |        461 |         40 |    45 |  1145 |   1 |                          | 0:00       |
| 2009-10-19 0:22 |      100 |         600 |        461 |         40 |    45 |  1144 |   1 |                          | 0:02       |
| 2009-10-19 0:24 |       80 |         600 |        461 |         40 |    45 |  1138 |   1 |                          | 0:04       |
| 2009-10-19 0:26 |       80 |         600 |        461 |         41 |    45 |  1133 |   1 |                          | 0:06       |
| 2009-10-19 0:28 |       80 |         600 |        461 |         41 |    45 |  1134 |   1 |                          | 0:08       |
| 2009-10-19 0:30 |      100 |         600 |        461 |         41 |    45 |  1134 |   1 |                          | 0:10       |
| 2009-10-19 0:31 |      100 |         600 |        461 |         41 |    45 |  1133 |   1 |                          | 0:11       |
| 2009-10-19 0:34 |      100 |         600 |        461 |         40 |    45 |  1140 |   1 |                          | 0:13       |
| 2009-10-19 0:36 |      100 |         600 |        100 |         40 |    45 |   788 |   0 | preventative maintenance | 0:00       |
| 2009-10-19 0:38 |      100 |         600 |        100 |         40 |    45 |   769 |   0 | preventative maintenance | 0:02       |
+-----------------+----------+-------------+------------+------------+-------+-------+-----+--------------------------+------------+
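For anyone who wants to work with data shaped like the table above, here is a minimal pandas sketch that collapses consecutive non-running rows into one record per stop, with its reason, start, and end. It assumes the `Datetime`, `Run`, and `Reason` columns shown above. The function name `summarize_stops` and the helper column `Episode` are hypothetical, and the toy rows only mimic the first part of the example.

```python
import pandas as pd

def summarize_stops(df):
    """Return one row per contiguous stop (Run == 0) with its reason and span."""
    df = df.sort_values("Datetime").reset_index(drop=True)
    # Start a new episode whenever Run or Reason differs from the previous row.
    changed = (df["Run"] != df["Run"].shift()) | (df["Reason"] != df["Reason"].shift())
    df["Episode"] = changed.cumsum()
    stops = (
        df[df["Run"] == 0]
        .groupby("Episode")
        .agg(
            Reason=("Reason", "first"),
            Start=("Datetime", "min"),
            End=("Datetime", "max"),
            Rows=("Datetime", "size"),
        )
        .reset_index(drop=True)
    )
    return stops

# Toy data mirroring the first rows of the example table (not the real dataset).
sample = pd.DataFrame({
    "Datetime": pd.date_range("2009-10-19 00:00", periods=11, freq="2min"),
    "Run": [1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1],
    "Reason": ["", "shift change", "shift change", "", "", "", "",
               "failure", "failure", "failure", ""],
})

stops = summarize_stops(sample)
print(stops)
```

Filtering `stops` to `Reason == "failure"` gives the failure onset times, which are the events any lead-time target would be built around.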

