#StackBounty: #algorithms #parallel-computing #haskell GroupBy key a sequence of ordered key values

Bounty: 50

I have a sorted (by key) sequence of key value pairs:

< (1,"A"),(1,"C"),(1,"G"),(1,"B"),(2,"D"),(2,"F"),(3,"E") >

And I have to produce a sequence of tuples like this:

< (1,< "A","C","G","B" >),(2, <"D","F">),(3, <"E">) >

Where the second element of the tuple is a sequence with the values corresponding to the key.

I am working on haskell, under the hood I have a Seq that is a Vector, but a pseudocode solution is OK.

I have this algorithm that will run in parallel in $O(lg n)$, but cost of work is too high (more than $O(n)$):

comb :: Seq s => ((a, s b) -> (a, s b) -> Bool) -> s (a, s b) -> s (a, s b) -> s (a, s b)
comb f sl sr | lengthS sr == 0 = sl
             | lengthS sl == 0 = sr
             | f (nthS sl ((lengthS sl) - 1)) (nthS sr 0) =  let 
                                                                 (cl, vl) = (nthS sl ((lengthS sl) - 1))
                                                                 (cr, vr) = (nthS sr 0)
                                                                 union = singletonS (cr, appendS vl vr)
                                                             in appendS (appendS (takeS ((lengthS sl) - 1) sl) union) (dropS 1 sr)
             | otherwise = appendS sl sr

groupDyC :: Seq s => ((a, s b) -> (a, s b) -> Bool) -> s (a, b) -> s (a, s b)
groupDyC f seq = reduceS (comb (f)) emptyS (mapS (x -> singletonS x) (mapS ((c, v) -> (c, singletonS v)) seq))

Where lengthA, nthS are $O(1)$ and appendS is linear to the size of both sequences.

My problem is that the solution has to have work in $O(n)$ and parallel cost (S) $O(lg n)$. How could I create an algorithm that runs on that cost with these operations:

enter image description here

Thank you very much.

Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.