#StackBounty: #algorithms #data-structures #time-complexity Skip List estimate number of elements less than (or greater than) a value

Bounty: 50

I am looking for a thread-safe data structure that can find an element in O(log n), and for which, for any value x, the number of elements greater than (or less than) x can also be estimated in O(log n). Can a skip list, or some modification of it, be used? Are there alternative data structures (a parallel red-black tree, for example) that could be used?
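To make the query concrete: in a single-threaded setting this is just an order-statistic (rank) query, which can be supported by augmenting each node of a balanced search structure with a subtree size (in a skip list, a "width" counter on each forward pointer plays the same role). Below is a minimal single-threaded sketch with a randomized treap; it is not thread-safe by itself, which is exactly the part I am asking about:

    import random

    class _Node:
        __slots__ = ("key", "prio", "size", "left", "right")
        def __init__(self, key):
            self.key, self.prio, self.size = key, random.random(), 1
            self.left = self.right = None

    def _size(t):
        return t.size if t else 0

    def _upd(t):
        t.size = 1 + _size(t.left) + _size(t.right)
        return t

    def _split(t, key):
        """Split t into (keys < key, keys >= key)."""
        if t is None:
            return None, None
        if t.key < key:
            left, right = _split(t.right, key)
            t.right = left
            return _upd(t), right
        left, right = _split(t.left, key)
        t.left = right
        return left, _upd(t)

    def _merge(a, b):
        """Merge two treaps, assuming every key in a is <= every key in b."""
        if a is None:
            return b
        if b is None:
            return a
        if a.prio > b.prio:
            a.right = _merge(a.right, b)
            return _upd(a)
        b.left = _merge(a, b.left)
        return _upd(b)

    class RankedMultiset:
        """Expected O(log n) insert and rank query (number of stored keys < x)."""
        def __init__(self):
            self.root = None
        def __len__(self):
            return _size(self.root)
        def insert(self, key):
            left, right = _split(self.root, key)
            self.root = _merge(_merge(left, _Node(key)), right)
        def count_less(self, key):
            left, right = _split(self.root, key)
            n = _size(left)
            self.root = _merge(left, right)
            return n

Here rs.count_less(x) is the number of stored elements below x, and len(rs) - rs.count_less(x) gives the number of elements that are >= x.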


Get this bounty!!!

#StackBounty: #algorithms #search-algorithms #binary-search Feeding real-time data and binary search algorithm termination

Bounty: 50

This question was asked in our exam long ago and I don’t remember the exact words.

The scenario was: initially you are given a finite set of data to start with, and a key value (which you have to find in that set). Can the binary search algorithm/program terminate if we grow the data by feeding in new finite batches, in varying amounts, while the search is going on?

E.g. if I am searching for a person’s name in a directory, and there is an ongoing survey of new connections by the telecom department, which is being transmitted to the database from remote locations continuously (without interruption), and that newly found data is fed into the binary search in real time.

Can we say that binary search will terminate for the key value in this scenario before new data is fed? If yes, how do we prove it?

Sorry, if I am not clear enough. I will try to explain further if required.
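To make this concrete, here is a minimal sketch of what I have in mind (assuming the newly fed records are appended after the portion of the sorted array that existed when the search started, and the search bounds are pinned to that initial length):

    def binary_search_snapshot(a, key, initial_len):
        """Search for key only inside a[0:initial_len], i.e. the part of the
        sorted array that existed when the search started.  Records appended
        to a while the loop runs are ignored, so the interval [lo, hi)
        halves on every iteration and the loop terminates after
        O(log initial_len) steps."""
        lo, hi = 0, initial_len
        while lo < hi:
            mid = (lo + hi) // 2
            if a[mid] == key:
                return mid
            if a[mid] < key:
                lo = mid + 1
            else:
                hi = mid
        return -1  # key is not in the initial snapshot

The question is whether (or under what conditions) the same guarantee holds when the search is allowed to see the newly fed data.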


Get this bounty!!!

#StackBounty: #machine-learning #classification #scikit-learn #algorithms Algorithm to move events of MC from one class to other to mat…

Bounty: 50

I have MC and data, each having events in two classes, 0 and 1. I am trying to write an algorithm that matches the number of events in classes 0 and 1 of MC to data, i.e. I want to correct the MC events by moving them from one class to the other so that the ratio of events in the two classes is the same for data and MC. The way I proceeded is:

  1. Train a GradientBoostingClassifier from scikit-learn's ensemble module for data and MC individually (say data_clf and mc_clf):
     mc_clf.fit(X_mc, Y_mc)
     data_clf.fit(X_data, Y_data)
    

where Y_mc and Y_data are the corresponding class labels “mc_class” and “data_class”, taking the value 0 or 1 depending on which class each event belongs to.

  2. Now, with X_mc as my input variables, use predict_proba to get the class probabilities from both the MC and data classifiers using the MC inputs ONLY, i.e.
     y_mc = mc_clf.predict_proba(X_mc)
     y_data = data_clf.predict_proba(X_mc)
    
  3. After this, I try to move MC events from one class to the other by comparing their probabilities in data and MC:
     for i in range(0, len(mc)):
         if mc.loc[i, 'mc_class'] == 0:
             wgt = y_data[i][0] / y_mc[i][0]
             if wgt < 1:
                 mc.loc[i, 'mc_class_corrected'] = 1
             else:
                 mc.loc[i, 'mc_class_corrected'] = mc.loc[i, 'mc_class']

         if mc.loc[i, 'mc_class'] == 1:
             wgt = y_data[i][1] / y_mc[i][1]
             if wgt < 1:
                 mc.loc[i, 'mc_class_corrected'] = 0
             else:
                 mc.loc[i, 'mc_class_corrected'] = mc.loc[i, 'mc_class']
    

In the end, what happens is this: initially, suppose I had relatively more events in class 0 than in class 1 in MC compared to data, so I expect events to move from class 0 to class 1 in MC. However, I see that almost all (>95%) of my class 0 MC events move to class 1, whereas from comparing the numbers of events in data and MC I was expecting only about 30% of them to move.
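(For reference, the ~30% and >95% numbers I quote come from counts like the following, with the column names as in the snippets above; this is only a cross-check, not part of the correction itself.)

    # Fraction of class-0 MC events that the correction moves to class 1,
    # compared with the class-0 fractions in data and in uncorrected MC:
    moved = ((mc['mc_class'] == 0) & (mc['mc_class_corrected'] == 1)).sum()
    frac_moved  = moved / (mc['mc_class'] == 0).sum()   # comes out > 0.95
    frac_data_0 = (data['data_class'] == 0).mean()
    frac_mc_0   = (mc['mc_class'] == 0).mean()
    print(frac_moved, frac_data_0, frac_mc_0)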
Is there any mistake in this way of working?

Thanks a lot:)


Get this bounty!!!

#StackBounty: #algorithms #sorting #strings #binary Partitioning through block moves to the end

Bounty: 50

Suppose we have a binary string $s$. We wish to partition this string into a series of $0$s followed by $1$s (alternatively: we wish to sort it), using only one operation: moving three consecutive elements to the end.

E.g. if our string is $ABCDEFGH$ we can do our operation at index $1$ to get $DEFGHABC$. If we did it at index $4$ we’d get $ABCGHDEF$.

I’m interested in optimal solutions, that is, ones with the fewest moves. I have a working algorithm using IDA* with the heuristic “number of groups of ones with length $\leq 3$ before the final zero”, with the logic that each such group requires at least one move to fix. An example optimal solution (where ^ indicates where a block was chosen to move to the end):

1101101011101011010010010011                               
                       ^                                   
1101101011101011010010011100                               
                     ^                                     
1101101011101011010011100001                               
              ^                                            
1101101011101010011100001110                               
           ^                                               
1101101011110011100001110010                               
                        ^                                  
1101101011110011100001110001                               
     ^                                                     
1101111110011100001110001010                               
  ^                                                        
1111110011100001110001010011                               
                     ^                                     
1111110011100001110000011101                               
                       ^                                   
1111110011100001110000001111                               
               ^                                           
1111110011100000000001111111                               
        ^                                                  
1111110000000000001111111111                               
   ^                                                       
1110000000000001111111111111                               
^                                                          
0000000000001111111111111111 
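For reference, the move operation and the heuristic (one reading of my description above, with 0-based indices) can be sketched as:

    def move_block(s, i):
        """Move the three consecutive characters s[i:i+3] to the end of the string."""
        return s[:i] + s[i+3:] + s[i:i+3]

    def heuristic(s):
        """Lower-bound heuristic used in the IDA* search: the number of maximal
        runs of 1s of length <= 3 that occur before the last 0, the logic being
        that each such run needs at least one move to fix."""
        last_zero = s.rfind('0')
        if last_zero == -1:
            return 0  # no zeros at all: the string is already partitioned
        runs = [r for r in s[:last_zero].split('0') if r]
        return sum(1 for r in runs if len(r) <= 3)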

However, this algorithm is exponential and not feasible for larger strings. After studying quite a few optimal solutions, especially tough ones like the one above, I can’t immediately think of an optimal algorithm. Moves can be quite non-trivial.

Is there a feasible optimal algorithm? Or is this problem hard?


Get this bounty!!!

#StackBounty: #algorithms #algorithm-analysis #efficiency #linked-lists Linked List in Maximal Scoring Subsequences Algorithm

Bounty: 100

For a project, I’m implementing the All Maximal Scoring Subsequences algorithm. In the analysis portion of the paper, the authors describe an optimization that makes the algorithm run in linear time. Namely, when searching for the rightmost value of $j$ that satisfies $L_j < L_k$, we can search a linked list whose $L_j$ values are monotonically decreasing.

I’m slightly confused on how this works. If we store a pointer to the element $j$ we found during Step 3, then we must start iterating with some element in Step 1. What element would operate as the head of the linked list in this case? In other words, in Step 1, instead of searching through the whole list, what element do we start searching with?

I found these lecture notes easier to read than the actual paper. I also implemented an $O(n^2)$ version of the algorithm in Go here, which anyone may use as a reference. However, I would like some help cutting the complexity down to $O(n)$. Any help would be appreciated!


Get this bounty!!!

#StackBounty: #algorithms #graphs #trees Cover k edges to minimize the length of the longest uncovered path in a tree

Bounty: 100

You are given a tree $G\langle V, E\rangle$ and an integer $k$. The goal is to cover $k$ edges of the graph, to minimize the length of the longest uncovered path in the tree (and to return this path’s length).

Constraints: $O(n\log n)$ time and $O(n)$ space.

Now, the scheme of the algorithm is kind of obvious (from the given constraints): we need to prioritize the edges and pick the $k$ highest-priority ones (technically, we throw all the edges into a priority queue and pick the first $k$).

So, I think, the keys should tell us how central each edge is.

My first attempt was BFSing the graph from some node and BFSing again from the most distant node of the first scan.

Each node will get two values: distance from the first starting node and distance from the second starting node.
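Concretely, the two scans look like this (a sketch, assuming the tree is given as an adjacency list, e.g. a dict mapping each node to its list of neighbours):

    from collections import deque

    def bfs_dist(adj, src):
        """Distances (in edges) from src in the unweighted tree adj."""
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return dist

    def two_scans(adj, start):
        d_first = bfs_dist(adj, start)        # distance from the first starting node
        far = max(d_first, key=d_first.get)   # most distant node of the first scan
        d_second = bfs_dist(adj, far)         # distance from the second starting node
        return d_first, d_second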

Though, I’m not sure how well this method captures the notion of centrality.

I’d be glad to hear your thoughts about the current idea or your own ideas.


Get this bounty!!!

#StackBounty: #ripple #algorithms #trust #future-proof Is the whole idea of proof of work or proof of stake quite unnecessary and super…

Bounty: 50

I only have a vague idea of proof of stake, but I think I reasonably understand proof of work. In the case of bitcoin, the whole hashing business has to be combined with a mechanism that increases the difficulty according to a certain time calculation. Now this difficulty/time calculation is merely a matter of good faith, and the hope that a majority abides by it. If you look a bit deeper, the whole hash mechanism itself also relies on good faith; this is the so-called 51% risk/attack.

And hence you have currencies like ripple, which are just faith (maybe one degree higher compared to bitcoin and ethereum) as far as I can see, without any hash or other garnishing. This does seem to be equally functional. So did Satoshi over-design bitcoin? Does this suggest that a whole new generation of minimalistic cryptos will completely sweep away bitcoin, ethereum and the likes?


Get this bounty!!!

#StackBounty: #algorithms #strings #string-metrics #edit-distance Edit distance of list with unique elements

Bounty: 50

The Levenshtein-Distance (edit distance) between lists is a well-studied problem. But I can’t find much on possible improvements when it is known that no element occurs more than once in each list.

Let’s also assume that the elements are comparable/sortable (but the lists to compare are not sorted to begin with).

In particular, I am interested in whether the uniqueness of elements makes it possible to improve upon Ukkonen’s algorithm for edit distance, which has time complexity $O(\min(m,n)\,s)$ and space complexity $O(\min(s,m,n)\,s)$, where $s$ is the minimum cost of the editing steps.

More formally,

how efficiently can we compute the edit distance between two given strings $s, t \in \Sigma^*$ with the promise that they don’t have any repeated letters?

$\Sigma$ is a very large alphabet.


Get this bounty!!!

#StackBounty: #algorithms #expectation-maximization #t-distribution #latent-variable EM for a form of Student distribution

Bounty: 50

I consider $n$ replications from the following sampling density function for $y_i \in \Bbb{R}^p$:

$$
f_p(y_i \mid \mu, \Sigma, \nu) = \frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,\nu^{p/2}\,\pi^{p/2}\,\left|\Sigma\right|^{1/2}} \left[ 1 + \frac{1}{\nu}(y_i - \beta x_i)^T \Sigma^{-1} (y_i - \beta x_i) \right]^{-(\nu+p)/2}
$$

where $\beta$ is unknown and the matrix $\Sigma$ is parametrized as $\sigma^2 Q$.

I would like to perform an EM algorithm.

If I am not mistaken, we have for the log-likelihood
$$\log L(\Psi) = -\frac{1}{2}np\log(2\pi) - \frac{n}{2}\log|\Sigma| - \frac{1}{2\sigma^2}\sum_{j=1}^n z_j (y_j - \beta x_j)^T Q^{-1} (y_j - \beta x_j)$$

where I forgot everything about $\nu$ because it is supposed to be known.

Now we have

$$Z \mid Y = y \sim \Gamma(m_1, m_2)$$ with $m_1 = \frac{1}{2}(\nu + p)$ and $m_2 = \frac{1}{2}\bigl(\nu + (y - \mu)^T \Sigma^{-1} (y - \mu)\bigr)$ for a general $t_p(\mu, \Sigma, \nu)$ distribution.

If I do the chameleon (adapting this to my parametrization), we have

$$Z \mid Y = y \sim \Gamma(m_1, m_2)$$ with $m_1 = \frac{1}{2}(\nu + p)$ and $m_2 = \frac{1}{2\sigma^2}\bigl(\nu + (y - \mu_i)^T Q^{-1} (y - \mu_i)\bigr)$

In the general case we have
$$
E_{\Psi^{(k)}}(Z_j \mid y_j) = \frac{\nu + p}{\nu + Q(y_j, \mu^{(k)}, \Sigma^{(k)})}
$$
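If I substitute my parametrization $\Sigma = \sigma^2 Q$ (so that $\Sigma^{-1} = \frac{1}{\sigma^2} Q^{-1}$) and the mean $\beta x_j$ directly into this, I believe it would read

$$
E_{\Psi^{(k)}}(Z_j \mid y_j) = \frac{\nu + p}{\nu + \frac{1}{(\sigma^{(k)})^2}\,(y_j - \beta^{(k)} x_j)^T Q^{-1} (y_j - \beta^{(k)} x_j)}.
$$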

That is just doing the chameleon again, but I am pretty sure I am missing something; otherwise this would be trivial.

What am I missing?


Get this bounty!!!

#StackBounty: #algorithms #computability #logic #search-problem Is there an analysis of the creation of axioms for a mathematical struc…

Bounty: 100

Historically, what has happened is the following:

  1. There is a “mechanical” structure, most importantly, arithmetic, which operates according to a set of well-defined rules that a stupid computer can easily follow. e.g. a computer can easily calculate $5349 \cdot 5345 = …$

  2. There is a “mechanical” proof-system, most importantly, first order logic, which also operates according to well-defined rules that a stupid computer can follow. e.g. a computer can easily apply derivation rules to go from $\forall x\, \phi(x)$ to $\exists x\, \phi(x)$, etc.

  3. Then there is a non-mechanical “creative” human, who uses his badly-understood “natural insight” to formulate a set of “axioms about arithmetic”, which are words in (2)’s language “about” the operations of (1)’s arithmetic computations. He also formulates a set of “logical axioms”, the rules by which the mechanical proof system should operate. He chooses these rules because his intuitive insight says that they are “obviously correct”.

It seems to me that this third element in the chain (though it happens first chronologically) is always done in an ad-hoc way. But the fact that this has always been done this way so far doesn’t mean that it necessarily has to be this way.

My question is: Has there been an analysis of the creation of axioms as a computational problem? That is, of the problem of choosing axioms about some mathematical structure, for use in a logic-language?

This computational problem is essentially: given e.g. the rules of arithmetic, find a first-order statement that we can use as an axiom to then derive further properties.

ps. I know that this is a vague question. I am simply asking whether someone has analysed this problem in some way.


Get this bounty!!!