## #StackBounty: #bayesian #algorithms #prior How to extend Solomonoff's universal prior to stochastic models?

### Bounty: 150

Solomonoff’s universal prior for models is based on the algorithmic complexity of a computer program \$p\$ which executes that model. Where \$l\$ is the length of the computer program, the prior is proportional to \$2^{-l}\$.

This works fine for deterministic models. We have an observation and we want to understand what model best explains it. If the program \$p\$ returns an output equal to the observation then this program is valid; otherwise it is invalid. We can apply Bayes' rule to get a posterior probability by considering all programs which return the desired output.
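In symbols, the deterministic case assigns an observation \$x\$ the universal probability \$P(x)=\sum_{p:\,U(p)=x}2^{-l(p)}\$, where \$U\$ is the reference universal machine; Bayes' rule then weights each valid program by its \$2^{-l(p)}\$ prior.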

I’m confused about how this is extended to the case where there is randomness in the observations we collect.

Take a simple example: Suppose we are considering a linear regression model \$y=ax+b\$. The computer program which executes this model has a length \$l\$ which is a function of \$a\$ and \$b\$ (bigger parameters require more bits to encode). The prior is \$\pi(a,b)\$.

But this program doesn’t include a random element. The regression model is \$y=ax+b+\epsilon\$, where \$\epsilon\$ is random noise with distribution \$N(0,\sigma^2)\$.

How should we consider a prior for the noise? A computer program can’t produce random noise, so this can’t be part of the program length. Should the prior for \$\sigma^2\$ be considered separately from the prior for the model? Or should the number of bits required to encode the numeric value of \$\sigma^2\$ be included in the length of the program?

In addition to the variance of the noise, the underlying normal distribution would seem to have some degree of complexity associated with it. Should the bits required to describe a normal distribution be included in the model prior?
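One common way to handle this, in the related MDL/two-part-code view, is to charge the model for the bits needed to describe its parameters (including \$\sigma^2\$) and charge the data \$-\log_2 P(\text{data} \mid \text{model})\$ bits, so the noise enters the code length through the likelihood rather than through the program output. A minimal sketch for the regression example above (the `param_bits` quantization scheme is an assumption standing in for the program length of a literal constant, not a canonical encoding):

```python
import math

def param_bits(x, precision=1e-3):
    """Bits to encode a real parameter quantized to a fixed precision
    (a crude stand-in for the program length of the literal constant)."""
    q = max(1, round(abs(x) / precision))
    return 1 + math.log2(q + 1)  # sign bit + magnitude

def two_part_code_length(xs, ys, a, b, sigma):
    """Model bits + data bits: -log2 P(data | model) plays the role of
    the 'random' part that a deterministic program cannot emit."""
    model_bits = param_bits(a) + param_bits(b) + param_bits(sigma)
    data_bits = 0.0
    for x, y in zip(xs, ys):
        resid = y - (a * x + b)
        density = math.exp(-resid**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
        data_bits += -math.log2(density)  # ideal code length, up to discretization
    return model_bits + data_bits
```

Under this accounting, a model with a well-fitting \$(a,b)\$ and an honest \$\sigma\$ gets a shorter total description than one that misfits the data, which is the stochastic analogue of "the program reproduces the observation".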

Get this bounty!!!

## #StackBounty: How to apply ant colony optimization to the TSP but repeating nodes and edges

### Bounty: 50

I’m learning the Ant Colony Optimization algorithm and I would like to apply it to a variation of the TSP (find the path that starts from a node, crosses all nodes and finishes at the initial node) where you can cross a node or edge more than once. In fact, my problem has no solution as a standard TSP because some nodes have only one edge.

Basically, I see two problems:

1. In most examples I’ve seen, nodes are represented by (X,Y) positions and the graph is assumed to be complete, so from every node you can go to every other node, which means an ant always has an edge to a node that hasn’t been visited yet. But in this problem, an ant may arrive at a node where all the adjacent nodes have already been visited, so how would it decide where to go? The worst case would be that there is only one node left to visit and it is the one farthest from the current node.

2. What repeating edges means for the pheromones. If an edge is crossed several times in the tour, it could end up with much more pheromone than others, depending on the implementation, and I’m not sure whether that could have a negative impact.

This is a very simple example, where red edges have a cost of 1 and blue edges a very large cost, for example 20. As you can see, the best path requires repeating edges.

Regards.

EDIT: I could use Floyd–Warshall to get the shortest paths between every pair of nodes, but the real graph has around 1000 nodes and 2500 edges (a sparse graph), and Floyd–Warshall is O(N^3), so I’m not sure it is a reasonable option (I’ve also looked at Johnson’s algorithm). In that case I would also have to account for the fact that the path from one node to another may cross other visited or unvisited nodes, which changes the logic of the ant colony optimization. It could also cause problems with data structures: if I store all the possible paths, one option would be a 1000×1000 matrix (which could be optimized to 1000×500), with each entry holding a list of up to 1000 nodes, and that way the memory gets too big (1000×1000×1000).
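For a sparse graph this size, running Dijkstra once from every node is a common alternative to Floyd–Warshall: with N ≈ 1000 and E ≈ 2500 it costs roughly O(N · E log N) in total, and storing only distances plus one predecessor per (source, node) pair lets you rebuild any shortest path on demand instead of materializing the 1000×1000×1000 structure. A minimal sketch (the `adj` adjacency-list format is an assumption, not from the question):

```python
import heapq

def dijkstra(adj, src):
    """Shortest distances and predecessors from src.
    adj: {node: [(neighbor, weight), ...]}  -- assumed adjacency-list format."""
    dist = {src: 0.0}
    prev = {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, already settled with a shorter path
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    return dist, prev

def reconstruct(prev, src, dst):
    """Rebuild one shortest path on demand instead of storing all paths."""
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

The ants would then walk the complete graph of shortest-path distances, and each chosen "edge" expands into its underlying node sequence only when the tour is scored or output.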

Get this bounty!!!