#StackBounty: #machine-learning #neural-networks #reinforcement-learning Can machine learning be used to determine how/when to scan for…

Bounty: 100


I want to know whether this problem can/should be solved using machine learning. If it can and should, what resources or next steps should I take? Which approach seems to fit?


There are 500,000 buildings, and at any given moment there could be anywhere from 0 to 10 people in a building. There is a population of 10,000,000 people. The people can come and go from the buildings as they please, but certain people will typically be at certain buildings at certain times (certain hour ranges, certain days of the week). A given person can be in 0 or 1 building at any given time, where ~95% of the time they will not be in a building.

There exists a mechanism which can identify all the people in a building and log the occupants at that moment. The log history can be examined: given two snapshots in time, it can be determined which people have left, which have entered (and likewise which have remained). The change in the number of people (left + entered) since the previous snapshot shall be called the churn. The act of searching a building costs time.


The goal: maximize the rate of identifying the building locations of as much of the population as possible.


Once I have a good model with this limited information, I would like to include more parameters/variables, such as a building's public rating, what language is spoken at the building, etc.


I believe reinforcement learning may be applicable here, because the churn is the reward and the search cost is the punishment. Imagine a building that almost never has people in it: scanning it and finding no difference (0 people before, 0 people after) is costly. Scanning a popular building (always 10 people) whose occupants never change is almost as bad as scanning a building that typically never has people, because the amount of population covered is either 0 or 10. However, scanning a popular building with high churn will yield more information about the whereabouts of more of the population through time. Just as important is the frequency of scanning the same building: if you scan a building over and over within a minute, not much churn will have occurred.

If I were to solve this programmatically, perhaps I would assign a scanning cooldown value to each building. Each building's cooldown would decrease through time, and once it hit 0 or below, the building would be suitable for a re-scan. Re-scanning would reset its cooldown to its "delta cooldown". I would need to know some good min/max/average cooldown deltas, such that buildings with high churn approach the minimum cooldown delta (so they are scanned more frequently) and buildings with little churn approach the maximum cooldown (scanned less often, but probably not never again). I don't feel this is the best approach, and it's not exactly easy. I could probably think about this longer and come up with something, but in reality I want an excuse to learn more about machine learning.
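The cooldown idea above could be sketched roughly as follows. Everything here is a hypothetical illustration: the update factors (0.8, 1.5) and the min/max bounds are made-up assumptions, not tuned values.

```python
import heapq

MIN_COOLDOWN = 5.0       # floor, so high-churn buildings are scanned often
MAX_COOLDOWN = 10_000.0  # ceiling, so low-churn buildings are never abandoned

class Building:
    def __init__(self, name, cooldown=60.0):
        self.name = name
        self.cooldown = cooldown  # the building's current "delta cooldown"
        self.occupants = set()

def scan(building, observed_occupants, now, schedule):
    """Log a scan, compute churn (left + entered), adapt the cooldown,
    and schedule the next scan on a priority queue keyed by due time."""
    churn = len(building.occupants ^ observed_occupants)  # symmetric difference
    building.occupants = set(observed_occupants)
    if churn > 0:
        building.cooldown = max(MIN_COOLDOWN, building.cooldown * 0.8)
    else:
        building.cooldown = min(MAX_COOLDOWN, building.cooldown * 1.5)
    heapq.heappush(schedule, (now + building.cooldown, building.name))
    return churn
```

The main loop would then repeatedly pop the earliest due building from `schedule` and scan it.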


I think this approach would focus on the buildings with the highest churn and not do much new "discovery". This is especially true of my programmatic approach. Consider a building that is scanned 10 times initially and has no churn each time. Over time it may become popular (or have some sort of seasonality to it), yet it never gets picked up because the algorithm is too busy focusing on previously scanned high-churn buildings. This concern might be alleviated just by having a random portion which adds buildings back in some percentage of the time, or alternatively by giving the machine learning model a short memory span.
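The "random portion" idea is essentially epsilon-greedy exploration from the multi-armed-bandit literature. A minimal sketch (the scores dict and the epsilon value are illustrative):

```python
import random

def pick_building(churn_scores, epsilon=0.1, rng=random):
    """With probability epsilon, scan a uniformly random building
    (exploration, so quiet buildings still get revisited); otherwise
    scan the building with the highest churn score (exploitation)."""
    buildings = list(churn_scores)
    if rng.random() < epsilon:
        return rng.choice(buildings)
    return max(buildings, key=churn_scores.get)
```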


I just spent all this time typing this up and realized this is basically a web-crawling problem: figuring out which sites to crawl and how often. Hopefully that will point me toward numerous approaches that are simple enough (I just need something basic that will probably be better than a programmatic approach).

Get this bounty!!!

#StackBounty: #machine-learning #time-series #neural-networks #data-leakage data leakage when scaling time series

Bounty: 150

Suppose I want to forecast future values of $y$ from past values of features $x$.
In this example I am using:

  • the training set goes from $t_0$ to $t_{15}$
  • values from $x_{t_0}$ to $x_{t_{10}}$ to forecast $y_{t_{11}}$
  • values from $x_{t_1}$ to $x_{t_{11}}$ to forecast $y_{t_{12}}$
  • and so on until I use $x_{t_5}$ to $x_{t_{15}}$ to forecast $y_{t_{16}}$

I scale my feature $x$ using only data in the training set (up to $t_{15}$).

Nevertheless, when I try to predict $y_{t_{17}}$, you can see from the picture below that I use some data points that have also been used for scaling.
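For concreteness, the setup can be sketched numerically (toy values; the split at $t_{15}$ and the 11-step window are taken from the example above):

```python
import numpy as np

x = np.arange(20, dtype=float)  # toy feature series for t_0 .. t_19
train = x[:16]                  # training set: t_0 .. t_15

# Fit the scaling parameters on the training window only ...
mu, sigma = train.mean(), train.std()

# ... then apply them to the window x_{t_6} .. x_{t_16} used to predict
# y_{t_17}. Part of that window (t_6 .. t_15) also contributed to mu and
# sigma, which is exactly the overlap the question is about.
window_for_y17 = (x[6:17] - mu) / sigma
```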

Is this leakage?

[image: the scaling window (up to $t_{15}$) overlapping the inputs used to predict $y_{t_{17}}$]


#StackBounty: #python #machine-learning #neural-networks #anomaly-detection Machine Learning to detect or classify control chart of sen…

Bounty: 50

I am working on building a machine learning model to detect drift in a trend, whether upward or downward (see the attached figure). The idea is to send alarms when an uptick or downtick happens and the data approaches a control limit, at which point the equipment hits a fatal abort.

I have been looking at different methods but have not been successful. For example, I tried LSTM anomaly detection; it works if I already know when the measurements start drifting, because then I can take the earlier data for training and the later data for testing. In my current situation, I wouldn't know when the drift happens in order to split the data for training and testing. Moreover, the model should look back at the last 2 or 3 days of data and see whether the trend is drifting upward or downward.

Below is a picture with hypothetical data to convey the idea. I would greatly appreciate it if you could point me in the right direction, or please share your wisdom on how to go about it.
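One simple baseline for "look back over the last 2 or 3 days and test for drift" is the least-squares slope over a rolling window. This is a sketch, not a recommendation; the window size and slope threshold are arbitrary assumptions you would tune against your control limits:

```python
import numpy as np

def drift_direction(values, window=72, threshold=0.05):
    """Return +1, -1, or 0 for upward drift, downward drift, or no drift,
    based on the least-squares slope of the most recent `window` readings
    (e.g. 72 hourly samples is roughly 3 days)."""
    recent = np.asarray(values[-window:], dtype=float)
    t = np.arange(len(recent))
    slope = np.polyfit(t, recent, 1)[0]
    if slope > threshold:
        return 1
    if slope < -threshold:
        return -1
    return 0
```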

[image: hypothetical control-chart data with a drifting trend approaching the control limits]

Thank you


#StackBounty: #neural-networks #natural-language #word-embeddings Downweight or partially mask certain inputs to Neural Network

Bounty: 50

I have an NLP classification task for sentences, in which the goal is to predict a sentence label that depends on the primary verb used in the sentence. This task can be solved by just memorizing the verb-label association, but I would like to regularize or "encourage" the model to also use information from the surrounding context, so that it can generalize to unseen verbs. Fully masking the embedding corresponding to the verb results in an underspecified task, since information from the verb is needed to fully determine the label.

In short, I’d like a way to partially mask or downweight a specific input embedding to a neural network classifier, to encourage the network to use information from the surrounding context in addition to that input. I’ve thought about rescaling the verb embedding by a constant $c < 1$, but then $c$ becomes a hyperparameter that I would have to set somewhat arbitrarily.
Any suggestions or pointers to references would be greatly appreciated. Thanks!
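One stochastic alternative to a fixed constant $c$ is token-level dropout restricted to the verb position: zero the verb embedding with probability $p$ during training, so in expectation the verb contributes a fraction $1 - p$ of its signal while the surrounding context must carry the rest. A sketch (the function and its interface are hypothetical, not from any library):

```python
import numpy as np

def mask_verb_embedding(embeddings, verb_index, p=0.5, rng=None):
    """Zero out the verb's embedding row with probability p during training.
    p plays the role of the downweighting constant c, but stochastically:
    the model cannot rely on the verb alone, yet still sees it often
    enough for the task to stay well specified."""
    rng = rng or np.random.default_rng()
    out = embeddings.copy()
    if rng.random() < p:
        out[verb_index] = 0.0
    return out
```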


#StackBounty: #neural-networks #natural-language Why BERT keep some masked tokens unchanged?

Bounty: 50

As I understand it, of all the tokens selected for masking in BERT:

  1. Some are replaced with [MASK]; this is required by the MLM objective.
  2. Some are replaced with another (random) token; this forces the model to generate proper contextual embeddings for all tokens in the sequence, not only the [MASK]ed ones, which is consistent with the goal of fine-tuning.

But I don't understand why BERT keeps some of the selected tokens unchanged. Could anyone please help me understand this?
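For reference, the rule from the BERT paper is: of the 15% of positions chosen for prediction, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. A sketch of that corruption step for one chosen position:

```python
import random

def corrupt_token(token, vocab, rng=random):
    """BERT's corruption rule for a position already chosen for prediction:
    80% -> [MASK], 10% -> a random vocabulary token, 10% -> unchanged.
    The model must predict the original token in all three cases."""
    r = rng.random()
    if r < 0.8:
        return "[MASK]"
    if r < 0.9:
        return rng.choice(vocab)
    return token
```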


#StackBounty: #neural-networks #conv-neural-network #image-processing #research-design Real noise modeling/ noise map generation (image…

Bounty: 50

I am working on a project with really noisy images. I have trained a detector that can detect the characters, but it fails in some cases (where the noise is high).

So far I have gone through many denoising, deblurring, and super-resolution papers. The problem with the denoising papers is that almost all of them first add a specified Gaussian noise and then train the model on that. I have tried this, but it doesn't work very well in my domain because the source of the noise in my images is different.

Let's say I have a few thousand images (real data with noise). Is there any deep learning or image processing approach that would help me get a noise map, which I could then use to augment my clean images so that I can train denoising models?
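A crude starting point (my own assumption, not a method from any of the papers mentioned) is to estimate a residual "noise map" by subtracting a smoothed version of each real noisy image, then paste those residuals onto clean images as augmentation:

```python
import numpy as np

def noise_residual(noisy, k=3):
    """Crude noise-map estimate: subtract a k x k mean-filtered version
    of the image from the image itself, leaving mostly high-frequency
    noise. Edge padding keeps the output the same shape as the input."""
    h, w = noisy.shape
    pad = k // 2
    padded = np.pad(noisy, pad, mode="edge")
    smooth = np.zeros_like(noisy, dtype=float)
    for dy in range(k):
        for dx in range(k):
            smooth += padded[dy:dy + h, dx:dx + w]
    smooth /= k * k
    return noisy - smooth

def augment(clean, residual_bank, rng=None):
    """Add a randomly chosen real-noise residual to a clean image."""
    rng = rng or np.random.default_rng()
    res = residual_bank[rng.integers(len(residual_bank))]
    return clean + res
```

Whether the residuals actually capture the real noise (rather than image structure) would need checking on your data; a learned approach such as a noise-modeling GAN is the heavier alternative.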


#StackBounty: #neural-networks #conv-neural-network #convolution Temporal Convolutional Networks (TCNs): Possibility to provide general…

Bounty: 50

In my task it is important to provide general information for each sample.
A sample consists of a time sequence, and there is a channel with n values for each time t of the sequence. This results in a shape of (num_samples, num_t_in_timesequence, channels).

For each sample I would like to supply general information that remains the same for all t time steps of the sample.
With LSTMs, for example, this is possible by initializing the hidden or cell state not with zeros, but with the start information.

Is there a similar possibility with TCNs? And if so, where would I have to do that?
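One common alternative for TCNs, since there is no hidden state to initialize, is to broadcast the static information along the time axis and concatenate it as extra input channels, so every time step sees the same conditioning values. A sketch:

```python
import numpy as np

def add_static_channels(x, static):
    """Broadcast per-sample static information along the time axis and
    concatenate it to the existing channels.

    x:      (num_samples, num_t, channels)
    static: (num_samples, num_static)
    returns (num_samples, num_t, channels + num_static)
    """
    num_samples, num_t, _ = x.shape
    tiled = np.repeat(static[:, None, :], num_t, axis=1)
    return np.concatenate([x, tiled], axis=-1)
```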


#StackBounty: #machine-learning #neural-networks #supervised-learning #training-error Is there a clear relationship between number of t…

Bounty: 50

It seems that without knowing the model complexity, it is difficult to state for certain what the relationship is between the number of training examples and over/underfitting.

As a concrete example, suppose I have some unspecified class of model and 1,000 data points at my disposal. Suppose we partition the data into N training examples (holding out the rest) and train a classifier on these N points.

Now suppose N is small (e.g., 200): will I have overfitting or underfitting? Similarly, suppose N is large (e.g., 800): what is the answer then?

It seems logically plausible that both might occur.

Can someone chime in and come up with some example where one or the other might occur (or both)?
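A toy experiment along these lines (the data-generating process, polynomial degree, and N values are all arbitrary choices of mine) fixes one model class and varies N:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.1, n)  # noisy nonlinear target
    return x, y

x_test, y_test = make_data(500)

results = {}
for n in (20, 200, 800):
    x_tr, y_tr = make_data(n)
    coeffs = np.polyfit(x_tr, y_tr, deg=9)  # fixed, fairly flexible model class
    train_err = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[n] = (train_err, test_err)

# With small N the flexible degree-9 fit tends to chase the noise, so the
# train/test error gap is large (overfitting); as N grows the gap shrinks.
```

Swapping in a degree-1 fit would show the opposite failure (underfitting) at every N, which is the sense in which both can occur depending on the model class.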


#StackBounty: #machine-learning #neural-networks How to embed the preceding knowledge in the training?

Bounty: 50

I am trying to train a neural network that takes X0 and X1 as input and predicts A as output. Here is part of the training set.

            X0          X1     A     B     C
1   850.111306  112.756018  16.0  10.3  0.00
2   858.766536  105.599268  11.6  15.1 -0.08
3   811.235566  168.255090  12.2  11.3  0.31

B is the target value in the dataset for training.

C is the prior knowledge, some kind of measurement of B: the closer its absolute value is to 0, the better. For instance, the C of the first record, $C_1 = 0.00$, is better than $C_2 = -0.08$, and $C_2$ is better than $C_3 = 0.31$.

The negative and positive signs of C denote direction: negative means less than, positive means greater than. For instance, ideally the model should predict the first row as 10.3; the 2nd row as some value less than 15.1 (how much smaller is another topic); and the 3rd row as some value greater than 11.3.

The question is:

How can I incorporate the prior knowledge C into the training?
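One possible way (an illustration, not an established recipe) is to treat C's sign as a directional constraint and add a hinge penalty to the usual regression loss:

```python
import numpy as np

def prior_loss(pred, b, c, weight=1.0):
    """Squared error to the target B, plus a hinge penalty whenever the
    prediction violates the direction encoded by C's sign:
    C < 0  -> prediction should be below B
    C > 0  -> prediction should be above B
    C == 0 -> no directional constraint
    `weight` trades off the two terms and is an arbitrary choice here."""
    mse = (pred - b) ** 2
    violation = np.where(
        c < 0, np.maximum(0.0, pred - b),                 # should be below B
        np.where(c > 0, np.maximum(0.0, b - pred), 0.0),  # should be above B
    )
    return mse + weight * violation
```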
