#StackBounty: #performance #haskell #recursion #optimization #benchmarking Why do these fixpoint cata / ana morphism definitions outper…

Bounty: 100

Consider these definitions from a previous question:

type Algebra f a = f a -> a

cata :: Functor f => Algebra f b -> Fix f -> b
cata alg = alg . fmap (cata alg) . unFix

fixcata :: Functor f => Algebra f b -> Fix f -> b
fixcata alg = fix $ \f -> alg . fmap f . unFix

type CoAlgebra f a = a -> f a

ana :: Functor f => CoAlgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg

fixana :: Functor f => CoAlgebra f a -> a -> Fix f
fixana coalg = fix $ \f -> Fix . fmap f . coalg

I ran some benchmarks and the results surprised me. criterion reports something like a tenfold speedup, specifically when -O2 is enabled. I wonder what causes such a massive improvement, and I begin to seriously doubt my benchmarking abilities.

This is the exact criterion code I use:

smallWord, largeWord :: Word
smallWord = 2^10
largeWord = 2^20

shortEnv, longEnv :: Fix Maybe
shortEnv = ana coAlg smallWord
longEnv = ana coAlg largeWord

benchCata = nf (cata alg)
benchFixcata = nf (fixcata alg)

benchAna = nf (ana coAlg)
benchFixana = nf (fixana coAlg)

main = defaultMain
    [ bgroup "cata"
        [ bgroup "short input"
            [ env (return shortEnv) $ \x -> bench "cata"    (benchCata x)
            , env (return shortEnv) $ \x -> bench "fixcata" (benchFixcata x)
            ]
        , bgroup "long input"
            [ env (return longEnv) $ \x -> bench "cata"    (benchCata x)
            , env (return longEnv) $ \x -> bench "fixcata" (benchFixcata x)
            ]
        ]
    , bgroup "ana"
        [ bgroup "small word"
            [ bench "ana" $ benchAna smallWord
            , bench "fixana" $ benchFixana smallWord
            ]
        , bgroup "large word"
            [ bench "ana" $ benchAna largeWord
            , bench "fixana" $ benchFixana largeWord
            ]
        ]
    ]

And some auxiliary code:

alg :: Algebra Maybe Word
alg Nothing = 0
alg (Just x) = succ x

coAlg :: CoAlgebra Maybe Word
coAlg 0 = Nothing
coAlg x = Just (pred x)

Compiled with -O0, the figures are pretty even. With -O2, the fix-based functions seem to outperform the plain ones:

benchmarking cata/short input/cata
time                 31.67 μs   (31.10 μs .. 32.26 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 31.20 μs   (31.05 μs .. 31.46 μs)
std dev              633.9 ns   (385.3 ns .. 1.029 μs)
variance introduced by outliers: 18% (moderately inflated)

benchmarking cata/short input/fixcata
time                 2.422 μs   (2.407 μs .. 2.440 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.399 μs   (2.388 μs .. 2.410 μs)
std dev              37.12 ns   (31.44 ns .. 47.06 ns)
variance introduced by outliers: 14% (moderately inflated)

I would appreciate it if someone could confirm this or spot a flaw.

(I compiled things with GHC 8.2.2 on this occasion.)


postscriptum

This post from back in 2012 elaborates on the performance of fix in quite fine detail. (Thanks to @chi for the link.)


Get this bounty!!!

#StackBounty: #machine-learning #predictive-models #performance #sensitivity #specificity Assessing correlated predictions

Bounty: 50

Let’s assume we have a prediction algorithm (if it helps, imagine it’s using some boosted tree method) that does daily predictions for whether some event will happen to a unit (e.g. a machine that might break down, a patient that might get a problematic medical event etc.) during the next week. We can only get training and test data for a low number of units (e.g. 400 + 100 or so) across a limited period of time (e.g. half a year).

How would one assess prediction performance (e.g. sensitivity/specificity/AuROC etc.) of some algorithm (e.g. some tree method) in this setting on test data? Presumably there is a potential issue in that the prediction intervals overlap and even non-overlapping intervals for the same unit are somewhat correlated (i.e. if I can predict well for a particular unit due to its specific characteristics, I may do well on all time intervals for that unit, but this does not mean the algorithm would generalize well).
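To make the concern concrete, one alternative I can imagine is splitting train/test by unit rather than by observation, e.g. with scikit-learn's GroupKFold. This is only a sketch under assumed data: the DataFrame layout, column names, and choice of classifier below are placeholders, not my actual pipeline.

# Sketch: unit-aware cross-validation, assuming a pandas DataFrame `df` with one row
# per (unit, day), a binary "label" column, a "unit_id" column, and feature columns.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

def unitwise_auc(df, feature_cols, n_splits=5):
    X = df[feature_cols].values
    y = df["label"].values
    groups = df["unit_id"].values          # keeps all rows of a unit on one side of each split
    aucs = []
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
        model = GradientBoostingClassifier().fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))
    return np.mean(aucs), np.std(aucs)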

Perhaps I have just not hit on the right search terms, but I have failed to find anything published on this topic (surely someone has written about this before?!). Any pointers to any literature?

My initial thought was that perhaps naively calculated point estimates of sensitivity/specificity (i.e. just treating these as independent observations and predictions) might be fine, but that the problem would lie more with the uncertainty around them. If so, could one just bootstrap (drawing whole units with replacement) and get decent assessments that way?
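To make the bootstrap idea concrete, here is a minimal sketch of drawing whole units with replacement and recomputing the metrics each time; the array names are assumed placeholders for whatever the real data looks like.

# Sketch: cluster bootstrap over units for sensitivity/specificity confidence intervals,
# assuming numpy arrays `y_true`, `y_pred` (0/1) and `unit_ids`, aligned per prediction.
import numpy as np

def unit_bootstrap_ci(y_true, y_pred, unit_ids, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    units = np.unique(unit_ids)
    sens, spec = [], []
    for _ in range(n_boot):
        sampled = rng.choice(units, size=len(units), replace=True)
        idx = np.concatenate([np.flatnonzero(unit_ids == u) for u in sampled])
        t, p = y_true[idx], y_pred[idx]
        tp = np.sum((t == 1) & (p == 1)); fn = np.sum((t == 1) & (p == 0))
        tn = np.sum((t == 0) & (p == 0)); fp = np.sum((t == 0) & (p == 1))
        if tp + fn: sens.append(tp / (tp + fn))
        if tn + fp: spec.append(tn / (tn + fp))
    return np.percentile(sens, [2.5, 97.5]), np.percentile(spec, [2.5, 97.5])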


Get this bounty!!!

#StackBounty: #python #performance #beginner #numpy #markov-chain Solve the phase state between two haplotype blocks using markov trans…

Bounty: 50

I have spent more than a year with Python, but I come from a biology background. I should say I have some understanding of for-loops and nested for-loops, which I used to work out a solution to my problem. But I suppose there is a better and more comprehensive way of solving it.

Below is the code I wrote to solve the haplotype phase state between two haplotype blocks.

Here is a short snippet of my data:

chr  pos       ms01e_PI  ms01e_PG_al  ms02g_PI  ms02g_PG_al  ms03g_PI  ms03g_PG_al  ms04h_PI  ms04h_PG_al  ms05h_PI  ms05h_PG_al  ms06h_PI  ms06h_PG_al
2    15881764  4         C|T          6         C|T          7         T|T          7         T|T          7         C|T          7         C|T
2    15881767  4         C|C          6         T|C          7         C|C          7         C|C          7         T|C          7         C|C
2    15881989  4         C|C          6         A|C          7         C|C          7         C|C          7         A|T          7         A|C
2    15882091  4         G|T          6         G|T          7         T|A          7         A|A          7         A|T          7         A|C
2    15882451  4         C|T          4         T|C          7         T|T          7         T|T          7         C|T          7         C|A
2    15882454  4         C|T          4         T|C          7         T|T          7         T|T          7         C|T          7         C|T
2    15882493  4         C|T          4         T|C          7         T|T          7         T|T          7         C|T          7         C|T
2    15882505  4         A|T          4         T|A          7         T|T          7         T|T          7         A|C          7         A|T

Problem:

In the given data I want to solve the phase state of the haplotype for sample ms02g. The suffix PI represents the phased block (i.e. all the data within that block are phased).

For sample ms02g, C-T-A-G is phased (PI level 6), and so is T-C-C-T. Similarly, the data within PI level 4 are also phased. But since the haplotype is broken into two blocks, we don't know which phase from level 6 goes with which phase of level 4. Phase to be solved - sample ms02g:

ms02g_PI  ms02g_PG_al
6         C|T
6         T|C
6         A|C
6         G|T
×——————————×—————> Break Point
4         T|C
4         T|C
4         T|C
4         T|A

Since all the other samples are completely phased, I can use Markov-chain transition probabilities to solve the phase state. To the human eye/mind it is clear that the left part of ms02g (level 6, i.e. C-T-A-G) is more likely to go with the right part of level 4 (C-C-C-A).

Calculation:

Step 01:

I prepare required haplotype configuration:

  • The top haplotype-PI is block01 and bottom is block-02.
  • The left phased haplotype within each block is Hap-A and the right is Hap-B.

So, there are 2 haplotype configurations:

  • Block01-HapA with Block02-HapA, so B01-HapB with B02-HapB
  • Block01-HapA with Block02-HapB, so B01-HapB with B02-HapA

I count the number of transitions from each nucleotide of PI-6 to each nucleotide of PI-4 for each haplotype configuration.
Possible transitions from PI-6 to PI-4:

  Possible
  transitions    ms02g_PG_al
    │       ┌┬┬┬ C|T
    │       ││││ T|C
    │       ││││ A|C
    │       ││││ G|T
    └─────> ││││  ×—————> Break Point
            │││└ T|C
            ││└─ T|C
            │└── T|C
            └─── T|A
  • I multiply the transition probabilities from the first nucleotide of PI-6 to all nucleotides of PI-4, and then do the same for the second nucleotide of PI-6, and so on.

  • then I sum the transition probabilities from one level to another.


  Possible
  transitions    ms02g_PG_al
    │       ┌┬┬┬ C|T 
    │       ││││ T|C | Block 1
    │       ││││ A|C |
    │       ││││ G|T /
    └─────> ││││  ×—————> Break Point
            │││└ T|C 
            ││└─ T|C |
            │└── T|C | Block 2
            └─── T|A /
                 ↓ ↓
             Hap A Hap B

Calculating likelihood:

  Block-1-Hap-A with Block-2-Hap-B    vs.    Block-1-Hap-A with Block-2-Hap-A

  C|C × C|C × C|C × A|C = 0.58612            T|C × T|C × T|C × T|C = 0.000244
+ C|T × C|T × C|T × A|T = 0.3164           + T|T × T|T × T|T × T|T = 0.00391
+ C|A × C|A × C|A × A|A = 0.482            + T|A × T|A × T|A × T|A = 0.0007716
+ C|G × C|G × C|G × A|G = 0.3164           + T|G × T|G × T|G × T|G = 0.00391
                          ———————                                    —————————
                    Avg = 0.42523                              Avg = 0.002207

        Likelihood Ratio = 0.42523 / 0.002207 = 192.67
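The same arithmetic can be written out in a few lines of Python; in this sketch the transition probabilities are assumed to be already computed in a dict with keys like 'C_T' (as in the transition_prob_dict used in the full code further below).

# Sketch of the likelihood computation above: for one haplotype configuration, multiply the
# transition probabilities from each block-1 nucleotide to every block-2 nucleotide, then
# average those products; the ratio between two configurations gives the likelihood ratio.
def config_likelihood(block1_hap, block2_hap, trans):
    products = []
    for n1 in block1_hap:                     # one product per block-1 site
        p = 1.0
        for n2 in block2_hap:
            p *= trans[n1 + '_' + n2]
        products.append(p)
    return sum(products) / len(products)      # average of the products

# e.g. likelihood ratio for "B1-HapA with B2-HapB" vs "B1-HapA with B2-HapA":
# lr = config_likelihood(list('CTAG'), list('CCCA'), trans) / \
#      config_likelihood(list('CTAG'), list('TTTT'), trans)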

I have the following working code in Python 3.
I think the transition probabilities could mostly be computed with numpy, or with a Markov-chain package, but I was not able to work that out the way I wanted. It also took me some time to arrive at what I wanted to do.

Any suggestions with explanations are welcome. Thanks.

#!/home/bin/python
from __future__ import division

import collections
from itertools import product
from itertools import islice
import itertools
import csv
from pprint import pprint


''' function to count the number of transitions '''
def sum_Probs(pX_Y, pX):
    try:
        return float(pX_Y / pX)
    except ZeroDivisionError:
        return 0


with open("HaploBlock_Updated.txt", 'w') as f:

    ''' Write header (part 01): This is just a front part of the header. The rear part of the header is
    update dynamically depending upon the number of samples '''
    f.write('\t'.join(['chr', 'pos', 'ms02g_PI', 'ms02g_PG_al', '\n']))

    with open('HaploBlock_for_test-forHMMtoy02.txt') as Phased_Data:

        ''' Note: Create a dictionary to store required data. The Dict should contain information about
         two adjacent haplotype blocks that needs extending. In this example I want to extend the
         haplotypes for sample ms02g which has two blocks 6 and 4. So, I read the PI and PG value
         for this sample. Also, data should store with some unique keys. Some keys are: chr, pos,
         sample (PI, PG within sample), possible haplotypes ... etc .. '''


        Phased_Dict = csv.DictReader(Phased_Data, delimiter='\t')
        grouped = itertools.groupby(Phased_Dict, key=lambda x: x['ms02g_PI'])


        ''' Function to read the data as blocks (based on PI values of ms02g) at a time'''
        def accumulate(data):
            acc = collections.OrderedDict()
            for d in data:
                for k, v in d.items():
                    acc.setdefault(k, []).append(v)
            return acc


        ''' Store data as keys,values '''
        grouped_data = collections.OrderedDict()
        for k, g in grouped:
            grouped_data[k] = accumulate(g)


        ''' Clear memory '''
        del Phased_Dict
        del grouped

        ''' Iterate over two Hap Blocks at once. This is done to obtain all possible Haplotype
        configurations between two blocks. The (keys,values) for first block is represented as
        k1,v2 and for the later block as k2,v2. '''

        # Empty list for storing haplotypes from each block
        hap_Block1_A = [];
        hap_Block1_B = []
        hap_Block2_A = [];
        hap_Block2_B = []

        # Create empty list for possible haplotype configurations from above block
        hapB1A_hapB2A = []
        hapB1B_hapB2B = []

        ''' list of all available samples (index, value) '''
        sample_list = [('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'),
                       ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al'),
                       ('ms05h_PI', 'ms05h_PG_al'), ('ms06h_PI', 'ms06h_PG_al')]


        ''' read data from two consecutive blocks at a time '''
        for (k1, v1), (k2, v2) in zip(grouped_data.items(), islice(grouped_data.items(), 1, None)):
            ''' skip if one of the keys has no values'''
            if k1 == '.' or k2 == '.':
                continue


            ''' iterate over the first Haplotype Block, i.e the k1 block.
            The nucleotides in the left of the phased SNPs are called Block01-haplotype-A,
            and similarly on the right as Block01-haplotype-B. '''
            hap_Block1_A = [x.split('|')[0] for x in v1['ms02g_PG_al']] # the left haplotype of Block01
            hap_Block1_B = [x.split('|')[1] for x in v1['ms02g_PG_al']]


            ''' iterate over the second Haplotype Block,
            i.e the k2 block '''
            hap_Block2_A = [x.split('|')[0] for x in v2['ms02g_PG_al']]
            hap_Block2_B = [x.split('|')[1] for x in v2['ms02g_PG_al']]



            ''' Now, we have to start to solve the possible haplotype configuration.
            Possible haplotype Configurations will be, Either :
            1) Block01-haplotype-A phased with Block02-haplotype-A,
                creating -> hapB1A-hapB2A, hapB1B-hapB2B
            Or,
            2) Block01-haplotype-A phased with Block02-haplotype-B
                creating -> hapB1A-hapB2B, hapB1B-hapB2A '''

            ''' First possible configuration '''
            hapB1A_hapB2A = [hap_Block1_A, hap_Block2_A]
            hapB1B_hapB2B = [hap_Block1_B, hap_Block2_B]

            ''' Second Possible Configuration '''
            hapB1A_hapB2B = [hap_Block1_A, hap_Block2_B]
            hapB1B_hapB2A = [hap_Block1_B, hap_Block2_A]


            print('\nStarting MarkovChains')
            ''' Now, start prepping the first-order Markov transition matrix
             for the observed haplotypes between two blocks.'''


            ''' To store the sum-values of the products of the transition probabilities.
            These sums are accumulated as each product of transitions is returned by the
            nested for-loop ("for m in range(...)"). '''

            Sum_Of_Product_of_Transition_probabilities_hapB1A_hapB2A = \
                sumOf_PT_hapB1A_B2A = 0
            Sum_Of_Product_of_Transition_probabilities_hapB1B_hapB2B = \
                sumOf_PT_hapB1B_B2B = 0

            Sum_Of_Product_of_Transition_probabilities_hapB1A_hapB2B = \
                sumOf_PT_hapB1A_B2B = 0
            Sum_Of_Product_of_Transition_probabilities_hapB1B_hapB2A = \
                sumOf_PT_hapB1B_B2A = 0


            for n in range(len(v1['ms02g_PI'])):  # n-ranges from block01

                ''' Set the probabilities of each nucleotides at Zero. They are updated for each level
                 of "n" after reading the number of each nucleotides at that position. '''
                pA = 0; pT = 0; pG = 0; pC = 0
                # nucleotide_prob = [0., 0., 0., 0.] # or store as numpy matrix

                # Also storing these values as Dict
                # probably can be improved
                nucleotide_prob_dict = {'A': 0, 'T': 0, 'G': 0, 'C': 0}
                print('prob as Dict: ', nucleotide_prob_dict)


                ''' for storing the product-values of the transition probabilities.
                These are updated for each level of "n" paired with each level of "m". '''
                product_of_transition_Probs_hapB1AB2A = POTP_hapB1AB2A = 1
                product_of_transition_Probs_hapB1BB2B = POTP_hapB1BB2B = 1

                product_of_transition_Probs_hapB1AB2B = POTP_hapB1AB2B = 1
                product_of_transition_Probs_hapB1BB2A = POTP_hapB1BB2A = 1



                ''' Now, we read each level of "m" to compute the transition from each level of "n"
                to each level of "m". '''
                for m in range(len(v2['ms02g_PI'])): # m-ranges from block02
                    # transition for each level of 0-to-n level of V1 to 0-to-m level of V2
                    ''' set transition probabilities at Zero for each possible transition '''
                    pA_A = 0; pA_T = 0; pA_G = 0; pA_C = 0
                    pT_A = 0; pT_T = 0; pT_G = 0; pT_C = 0
                    pG_A = 0; pG_T = 0; pG_G = 0; pG_C = 0
                    pC_A = 0; pC_T = 0; pC_G = 0; pC_C = 0

                    ''' Also, creating an empty dictionary to store transition probabilities
                    - probably upgrade using numpy '''
                    transition_prob_dict = {'A_A' : 0, 'A_T' : 0, 'A_G' : 0, 'A_C' : 0,
                    'T_A' : 0, 'T_T' : 0, 'T_G' : 0, 'T_C' : 0,
                    'G_A' : 0, 'G_T' : 0, 'G_G' : 0, 'G_C' : 0,
                    'C_A' : 0, 'C_T' : 0, 'C_G' : 0, 'C_C' : 0}


                    ''' Now, loop through each sample to compute initial probs and transition probs '''
                    for x, y in sample_list:
                        print('sample x and y:', x,y)


                        ''' Update nucleotide probability for this site
                        - only calculated from v1 and only once for each parse/iteration '''
                        if m == 0:
                            pA += (v1[y][n].count('A'))
                            pT += (v1[y][n].count('T'))
                            pG += (v1[y][n].count('G'))
                            pC += (v1[y][n].count('C'))

                            nucleotide_prob_dict['A'] = pA
                            nucleotide_prob_dict['T'] = pT
                            nucleotide_prob_dict['G'] = pG
                            nucleotide_prob_dict['C'] = pC

                        nucleotide_prob = [pA, pT, pG, pC]


                        ''' Now, update transition matrix '''
                        nucl_B1 = (v1[y][n]).split('|')  # nucleotides at Block01
                        nucl_B2 = (v2[y][m]).split('|')  # nucleotides at Block02


                        ''' create possible haplotype configurations between "n" and "m".
                         If the index (PI value) are same we create zip, if index (PI value) are
                         different we create product. '''

                        HapConfig = []  # create empty list

                        if v1[x][n] == v2[x][m]:
                            ziped_nuclB1B2 = [(x + '_' + y) for (x,y) in zip(nucl_B1, nucl_B2)]
                            HapConfig = ziped_nuclB1B2

                            ''' Move this counting function else where '''
                            pA_A += (HapConfig.count('A_A'))  # (v1[y][0].count('A'))/8
                            pA_T += (HapConfig.count('A_T'))
                            pA_G += (HapConfig.count('A_G'))
                            pA_C += (HapConfig.count('A_C'))

                            pT_A += (HapConfig.count('T_A'))  # (v1[y][0].count('A'))/8
                            pT_T += (HapConfig.count('T_T'))
                            pT_G += (HapConfig.count('T_G'))
                            pT_C += (HapConfig.count('T_C'))

                            pG_A += (HapConfig.count('G_A'))  # (v1[y][0].count('A'))/8
                            pG_T += (HapConfig.count('G_T'))
                            pG_G += (HapConfig.count('G_G'))
                            pG_C += (HapConfig.count('G_C'))

                            pC_A += (HapConfig.count('C_A'))
                            pC_T += (HapConfig.count('C_T'))
                            pC_G += (HapConfig.count('C_G'))
                            pC_C += (HapConfig.count('C_C'))


                        if v1[x][n] != v2[x][m]:
                            prod_nuclB1B2 = [(x + '_' + y) for (x,y) in product(nucl_B1, nucl_B2)]
                            HapConfig = prod_nuclB1B2
                            print('prod NuclB1B2: ', prod_nuclB1B2)

                            pA_A += (HapConfig.count('A_A'))/2
                            pA_T += (HapConfig.count('A_T'))/2
                            pA_G += (HapConfig.count('A_G'))/2
                            pA_C += (HapConfig.count('A_C'))/2

                            pT_A += (HapConfig.count('T_A'))/2
                            pT_T += (HapConfig.count('T_T'))/2
                            pT_G += (HapConfig.count('T_G'))/2
                            pT_C += (HapConfig.count('T_C'))/2

                            pG_A += (HapConfig.count('G_A'))/2
                            pG_T += (HapConfig.count('G_T'))/2
                            pG_G += (HapConfig.count('G_G'))/2
                            pG_C += (HapConfig.count('G_C'))/2

                            pC_A += (HapConfig.count('C_A'))/2
                            pC_T += (HapConfig.count('C_T'))/2
                            pC_G += (HapConfig.count('C_G'))/2
                            pC_C += (HapConfig.count('C_C'))/2


                    ''' Now, compute nucleotide and transition probabilities for each nucleotide
                    from each 0-n to 0-m at each sample. This updates the transition matrix in
                    each loop. **Note: At the end this transition probabilities should sum to 1 '''

                    ''' Storing nucleotide probabilities '''
                    nucleotide_prob = [pA, pT, pG, pC]

                    ''' Storing transition probability as dict'''
                    transition_prob_dict['A_A'] = sum_Probs(pA_A,pA)
                    transition_prob_dict['A_T'] = sum_Probs(pA_T,pA)
                    transition_prob_dict['A_G'] = sum_Probs(pA_G,pA)
                    transition_prob_dict['A_C'] = sum_Probs(pA_C,pA)

                    transition_prob_dict['T_A'] = sum_Probs(pT_A,pT)
                    transition_prob_dict['T_T'] = sum_Probs(pT_T,pT)
                    transition_prob_dict['T_G'] = sum_Probs(pT_G,pT)
                    transition_prob_dict['T_C'] = sum_Probs(pT_C,pT)

                    transition_prob_dict['G_A'] = sum_Probs(pG_A,pG)
                    transition_prob_dict['G_T'] = sum_Probs(pG_T,pG)
                    transition_prob_dict['G_G'] = sum_Probs(pG_G,pG)
                    transition_prob_dict['G_C'] = sum_Probs(pG_C,pG)

                    transition_prob_dict['C_A'] = sum_Probs(pC_A,pC)
                    transition_prob_dict['C_T'] = sum_Probs(pC_T,pC)
                    transition_prob_dict['C_G'] = sum_Probs(pC_G,pC)
                    transition_prob_dict['C_C'] = sum_Probs(pC_C,pC)


                    '''Now, we start solving the haplotype configurations after we have the
                      required data (nucleotide and transition probabilities).
                    Calculate joint probability for each haplotype configuration.
                    Step 01: We calculate the product of the transition from one level
                    of "n" to several levels of "m". '''

                    ''' finding possible configuration between "n" and "m". '''
                    hapB1A_hapB2A_transition = hapB1A_hapB2A[0][n] + '_' + hapB1A_hapB2A[1][m]
                    hapB1B_hapB2B_transition = hapB1B_hapB2B[0][n] + '_' + hapB1B_hapB2B[1][m]

                    hapB1A_hapB2B_transition = hapB1A_hapB2B[0][n] + '_' + hapB1A_hapB2B[1][m]
                    hapB1B_hapB2A_transition = hapB1B_hapB2A[0][n] + '_' + hapB1B_hapB2A[1][m]


                    ''' computing the products of transition probabilities on the for loop '''
                    POTP_hapB1AB2A *= transition_prob_dict[hapB1A_hapB2A_transition]
                    POTP_hapB1BB2B *= transition_prob_dict[hapB1B_hapB2B_transition]

                    POTP_hapB1AB2B *= transition_prob_dict[hapB1A_hapB2B_transition]
                    POTP_hapB1BB2A *= transition_prob_dict[hapB1B_hapB2A_transition]


                ''' Step 02: sum of the product of the transition probabilities '''
                sumOf_PT_hapB1A_B2A += POTP_hapB1AB2A
                sumOf_PT_hapB1B_B2B += POTP_hapB1BB2B

                sumOf_PT_hapB1A_B2B += POTP_hapB1AB2B
                sumOf_PT_hapB1B_B2A += POTP_hapB1BB2A


            ''' Step 03: Now, computing the likelihood of each haplotype configuration '''
            print('computing likelihood:')

            likelyhood_hapB1A_with_B2A_Vs_B2B = LH_hapB1AwB2AvsB2B = \
                sumOf_PT_hapB1A_B2A / sumOf_PT_hapB1A_B2B
            likelyhood_hapB1B_with_B2B_Vs_B2A = LH_hapB1BwB2BvsB2A = \
                sumOf_PT_hapB1B_B2B / sumOf_PT_hapB1B_B2A

            likelyhood_hapB1A_with_B2B_Vs_B2A = LH_hapB1AwB2BvsB2A = \
                sumOf_PT_hapB1A_B2B / sumOf_PT_hapB1A_B2A
            likelyhood_hapB1B_with_B2A_Vs_B2B = LH_hapB1BwB2AvsB2B = \
                sumOf_PT_hapB1B_B2A / sumOf_PT_hapB1B_B2B


            print('\nlikelihood values: B1A with B2A vs. B2B, B1B with B2B vs. B2A')
            print(LH_hapB1AwB2AvsB2B, LH_hapB1BwB2BvsB2A)

            print('\nlikelihood values: B1A with B2B vs. B2A, B1B with B2A vs. B2B')
            print(LH_hapB1AwB2BvsB2A, LH_hapB1BwB2AvsB2B)


            ''' Update the phase state of the block based on evidence '''
            if (LH_hapB1AwB2AvsB2B + LH_hapB1BwB2BvsB2A) > \
                4*(LH_hapB1AwB2BvsB2A + LH_hapB1BwB2AvsB2B):
                print('Block-1_A is phased with Block-2_A')

                for x in range(len(v1['ms02g_PI'])):
                    PI_value = v1['ms02g_PI'][x]
                    # Note: we transfer the PI value from the earlier block to the next block

                    f.write('\t'.join([v1['chr'][x], v1['pos'][x], v1['ms02g_PI'][x],
                                       hapB1A_hapB2A[0][x] + '|' + hapB1B_hapB2B[0][x], '\n']))

                for x in range(len(v2['ms02g_PI'])):
                    f.write('\t'.join([v2['chr'][x], v2['pos'][x], PI_value,
                                       hapB1A_hapB2A[1][x] + '|' + hapB1B_hapB2B[1][x], '\n']))

            elif (LH_hapB1AwB2AvsB2B + LH_hapB1BwB2BvsB2A) < \
                4 * (LH_hapB1AwB2BvsB2A + LH_hapB1BwB2AvsB2B):
                print('Block-1_A is phased with Block-2_B')

                for x in range(len(v1['ms02g_PI'])):
                    PI_value = v1['ms02g_PI'][x]
                    # Note: we transfer the PI value from the earlier block to the next block...
                        # but the phase positions are reversed

                    f.write('\t'.join([v1['chr'][x], v1['pos'][x], v1['ms02g_PI'][x],
                                       hapB1A_hapB2A[0][x] + '|' + hapB1B_hapB2B[0][x], '\n']))

                for x in range(len(v2['ms02g_PI'])):
                    f.write('\t'.join([v2['chr'][x], v2['pos'][x], PI_value,
                                       hapB1B_hapB2B[1][x] + '|' + hapB1A_hapB2A[1][x], '\n']))


            else:
                print('cannot solve the phase state with confidence - sorry ')
                for x in range(len(v1['ms02g_PI'])):
                    f.write('\t'.join([v1['chr'][x], v1['pos'][x], v1['ms02g_PI'][x],
                                   hapB1A_hapB2A[0][x] + '|' + hapB1B_hapB2B[0][x], '\n']))

                for x in range(len(v2['ms02g_PI'])):
                    f.write('\t'.join([v2['chr'][x], v2['pos'][x], v2['ms02g_PI'][x],
                                       hapB1A_hapB2A[1][x] + '|' + hapB1B_hapB2B[1][x], '\n']))
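To illustrate the numpy idea I mentioned above: the blocks of per-nucleotide counters in the code (pA_A, pA_T, ...) could in principle be collapsed into a single 4×4 count matrix. This is only a rough sketch under an assumed data layout, not a drop-in replacement, and it ignores the half-weight case used when the PI values differ.

# Sketch: count transitions between one block-1 site and one block-2 site across samples
# with a 4x4 numpy matrix; `site1` and `site2` are lists of phased genotypes such as
# ['C|T', 'T|C', ...], one entry per sample.
import numpy as np

BASES = {'A': 0, 'T': 1, 'G': 2, 'C': 3}

def transition_matrix(site1, site2):
    counts = np.zeros((4, 4))
    for g1, g2 in zip(site1, site2):
        # pair the left haplotype with the left, the right with the right
        for a, b in zip(g1.split('|'), g2.split('|')):
            counts[BASES[a], BASES[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid='ignore', divide='ignore'):
        probs = np.where(row_sums > 0, counts / row_sums, 0.0)
    return probs    # probs[i, j] ~ P(block-2 base j | block-1 base i)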


Get this bounty!!!

#StackBounty: #performance #mariadb #threads MariaDB threads optimization

Bounty: 100

NEW (pcie) server:
Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz, 1TB NVMe disks, 128 GB RAM, installed Debian 4.9.65-3+deb9u1, Ver 15.1 Distrib 10.1.26-MariaDB

moved binary db files from

OLD server:
Intel(R) Xeon(R) CPU E5-1630 v3 @ 3.70GHz, SSD disk, 64 GB RAM, FreeBSD 11.0-STABLE, 10.1.21-MariaDB

Only MySQL is running on these servers; I copied the my.ini file, so the config files are the same.

I ran the mysqlslap benchmark (the server was always restarted before each test):

root@db1:/tmp # mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=1 --iterations=1
Benchmark
        Average number of seconds to run all queries: 59.573 seconds
        Minimum number of seconds to run all queries: 59.573 seconds
        Maximum number of seconds to run all queries: 59.573 seconds
        Number of clients running queries: 1
        Average number of queries per client: 100000


root@pcie:~# mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=1 --iterations=1
Benchmark
        Average number of seconds to run all queries: 31.151 seconds
        Minimum number of seconds to run all queries: 31.151 seconds
        Maximum number of seconds to run all queries: 31.151 seconds
        Number of clients running queries: 1
        Average number of queries per client: 100000
====================================================================================================================================
root@db1:/tmp # mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=100 --iterations=1
Benchmark
        Average number of seconds to run all queries: 568.082 seconds
        Minimum number of seconds to run all queries: 568.082 seconds
        Maximum number of seconds to run all queries: 568.082 seconds
        Number of clients running queries: 100
        Average number of queries per client: 100000

root@pcie:/etc/security/limits.d# mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=100 --iterations=1
Benchmark
        Average number of seconds to run all queries: 2059.712 seconds
        Minimum number of seconds to run all queries: 2059.712 seconds
        Maximum number of seconds to run all queries: 2059.712 seconds
        Number of clients running queries: 100
        Average number of queries per client: 100000



====================================================================================================================================
root@db1:/tmp # mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=8 --iterations=1
Benchmark
        Average number of seconds to run all queries: 134.003 seconds
        Minimum number of seconds to run all queries: 134.003 seconds
        Maximum number of seconds to run all queries: 134.003 seconds
        Number of clients running queries: 8
        Average number of queries per client: 100000

root@pcie:/etc/security/limits.d# mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=8 --iterations=1
Benchmark
        Average number of seconds to run all queries: 133.410 seconds
        Minimum number of seconds to run all queries: 133.410 seconds
        Maximum number of seconds to run all queries: 133.410 seconds
        Number of clients running queries: 8
        Average number of queries per client: 100000

As you can see, the NEW (pcie) server performs very well with concurrency=1, performance is about the same with concurrency=8, and performance is very bad with concurrency=100.

Here are interesting results using internal benchmark:

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=8 --iterations=500 --verbose
        Average number of seconds to run all queries: 0.002 seconds
DB1:    Average number of seconds to run all queries: 0.002 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=16 --iterations=500
        Average number of seconds to run all queries: 0.007 seconds
DB1:    Average number of seconds to run all queries: 0.005 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=32 --iterations=500
        Average number of seconds to run all queries: 0.015 seconds
DB1:    Average number of seconds to run all queries: 0.011 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=64 --iterations=500
        Average number of seconds to run all queries: 0.033 seconds
DB1:    Average number of seconds to run all queries: 0.029 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=128 --iterations=500
        Average number of seconds to run all queries: 0.074 seconds
DB1:    Average number of seconds to run all queries: 0.097 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=256 --iterations=500
        Average number of seconds to run all queries: 0.197 seconds
DB1:    Average number of seconds to run all queries: 0.293 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=512 --iterations=500
        Average number of seconds to run all queries: 0.587 seconds
DB1:    Average number of seconds to run all queries: 1.009 seconds

Does anybody have an idea what to check or how to optimize the setup? I am using MyISAM tables only, and the MariaDB config is the same on both servers…

Update with more info: initially I installed FreeBSD on the NEW DB server and MariaDB performance was bad; I thought it was an OS-related problem, but the same symptoms appear on Linux. During the benchmark there is basically no I/O after the cache fills up, so this is not an I/O-related problem.

Thanks for any ideas.


Get this bounty!!!

#StackBounty: #java #performance #algorithm #guava Generating a string in a particular format by reading data from a map

Bounty: 50

I have a processToTaskIdHolder Map which contains processId as the key and taskId as the value. I iterate over this map and at the end build a String in a particular format.

For example:-

  • Let’s say I have 123 as the key and 009 as the value in the processToTaskIdHolder map.
  • Now I make an “activityKey” using 123 and then get data based on this key.
  • Then I iterate over all the categories for that activityKey and check whether each categoryId is already present in the processToTaskIdHolder key set. If it is, I extract the taskId for that categoryId from the map, extract the score for that categoryId, and store both in an Info object.
  • The same category can be present with different scores for different processIds.

Now I need to repeat the above steps for each activity in the activities list.

So my formatted string will look like this:

A,B,C:Score1,D:Score2
P,Q,R:Score1,S:Score2
  • Here A is the categoryId for processIds C and D; Score1 is the score of categoryId A for processId C, while Score2 is the score of categoryId A for processId D. We have different scores for the same category under two different processes: categoryId A was present for both processId C and processId D, so I need the score for both cases and build the string accordingly. B is the taskId for categoryId A, which is present in the map.
  • Likewise, P is the categoryId for processIds R and S; Score1 is the score of categoryId P for processId R, while Score2 is the score of categoryId P for processId S. CategoryId P was present for both processId R and processId S, so I need the score for both cases and build the string accordingly. Q is the taskId for categoryId P, which is present in the map.

I have the following code, which does the job, but I don't think it is the right or most efficient way to produce the formatted string above. I believe it can be done in a much better way.

  private static final List<String> activities = Arrays.asList("tree", "gold", "print", "catch");

  public static void reverseLookup(final String clientId, final Map<String, String> processToTaskIdHolder) {
    Multimap<String, Info> reverseLookup = LinkedListMultimap.create();
    for (String activity : activities) {
      for (Entry<String, String> entry : processToTaskIdHolder.entrySet()) {
        String activityKey = "abc_" + activity + "_" + clientId + "_" + entry.getKey();
        Optional<Datum> datum = getData(activityKey);
        if (!datum.isPresent()) {
          continue;
        }
        List<Categories> categories = datum.get().getCategories();
        for (Categories category : categories) {
          String categoryId = String.valueOf(category.getLeafCategId());
          if (processToTaskIdHolder.containsKey(categoryId)) {
            Info info = new Info(entry.getKey(), String.valueOf(category.getScore()));
            reverseLookup.put(categoryId + ":" + processToTaskIdHolder.get(categoryId), info);
          }
        }
      }
      String formattedString = generateString(reverseLookup);
      System.out.println(formattedString);
    }
  }

  private static String generateString(final Multimap<String, Info> reverseLookup) {
    StringBuilder sb = new StringBuilder();
    for (Entry<String, Collection<Info>> entry : reverseLookup.asMap().entrySet()) {
      sb.append(entry.getKey().split(":")[0]).append(",").append(entry.getKey().split(":")[1])
          .append(",");
      String sep = "";
      for (Info info : entry.getValue()) {
        sb.append(sep).append(info.getLeafCategoryId()).append(":").append(info.getScore());
        sep = ",";
      }
      sb.append(System.getProperty("line.separator"));
    }
    return sb.toString();
  }

In the reverseLookup map, I use keys in the format “a:b”; I'm not sure about building keys like that. Maybe it can be done in some other, better way?

Note: I am working with Java 7. Categories and Info are simple immutable classes with basic toString implementations.


Get this bounty!!!

#StackBounty: #performance #linux-networking #udp Disable NIC Receive Side Scaling hashing

Bounty: 100

On a benchmark lab system running Fedora Core 27 I’ve got Intel X710 10GE cards and 12-core Xeon processors, configured with 12 NIC queues and RX Flow Hashing based on both IP addresses and port numbers.

This results in uneven balancing between cores and inconsistent performance results. I have irqbalance disabled, with a 1:1 mapping from NIC queue to CPU core configured via /proc.

My application is UDP based so I’m not really bothered about the hashing. I can’t generate enough entropy across the inputs to the hash function to get an even distribution of outputs, so I’d quite like to just try plain round robin instead.

Is there a way to disable RSS for UDP and get round robin whilst still maintaining separate queues? All of the links I’ve found that talk about disabling RSS also seem to disable multiple queues.


Get this bounty!!!

#StackBounty: #machine-learning #precision-recall #performance #confusion-matrix #curves How can Precision-Recall (PR) curves be used t…

Bounty: 50

How can Precision-Recall (PR) curves be used to judge overall classifier performance when Precision and Recall are class based metrics?

In a binary classifier there are two classes, often labelled positive (+1) and negative (-1). Yet the classifier performance metrics used to plot PR curves, precision (PPV, on the y-axis) and recall (TPR, on the x-axis), can take different values for each of the two classes (if you swap the positive and negative classes). In almost all examples of PR curves there is only a single curve, when surely there should be at least two curves (one per class)?

More specifically:

  1. Does the PR curve really only represent the precision and recall of a single class, or has some operation been done (e.g. averaging) to combine the precision and recall of both classes?

  2. Does it make sense to judge a classifier’s performance based on only looking at the PR curve for the positive class?

  3. Why are the metrics TNR and NPV not somehow integrated into the curve or graph, for a better overview of classifier performance?
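To make question 1 above concrete: most libraries compute the curve for whichever class is designated as positive, so both per-class views can be plotted explicitly. A small sketch with scikit-learn; the labels and scores below are made-up toy values, not real data.

# Sketch: one PR curve per class, treating each class in turn as the "positive" class.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                   # toy labels
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7])  # toy P(class 1)

# curve with class 1 as the positive class
p1, r1, _ = precision_recall_curve(y_true, scores, pos_label=1)
# curve with class 0 as the positive class (scores flipped so higher still means "positive")
p0, r0, _ = precision_recall_curve(y_true, 1 - scores, pos_label=0)

plt.plot(r1, p1, label='class 1 as positive')
plt.plot(r0, p0, label='class 0 as positive')
plt.xlabel('recall'); plt.ylabel('precision'); plt.legend(); plt.show()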


Get this bounty!!!

#StackBounty: #python #performance #algorithm #multiprocessing Creating a graph representing all combinations of 4-bit binary strings

Bounty: 100

I have an algorithm that creates a graph that has all representations of 4-bit binary strings encoded in the form of the shortest graph paths, where an even number in the path means 0, while an odd number means 1:

from itertools import permutations, product
import networkx as nx
import matplotlib.pyplot as plt
import progressbar
import itertools

g = nx.Graph()

dodane_pary=[]   

def groups(sources, template):
    func = permutations
    keys = sources.keys()
    combos = [func(sources[k], template.count(k)) for k in keys]
    for t in product(*combos):
        d = {k: iter(n) for k, n in zip(keys, t)}
        yield [next(d[k]) for k in template]                                      

bar = progressbar.ProgressBar(maxval=len(list(itertools.product(tuple(range(2)), repeat=4)))).start()
count=1
dobre2=[]
# I create 4-bit binary strings
for y,i in enumerate(itertools.product(tuple(range(2)), repeat=4)): 
    # I do not include one of the pairs of binary strings that have a mirror image
    if tuple(reversed(i)) >= tuple(i):
       # I create representations of binary strings, where 0 is 'v0' and 1 is 'v1'. For example, the '001' combination is now 'v0v0v1'
       a = ['v{}'.format(x%2) for x in i] 

       if len(dodane_pary)!=count+1:
           # I add an even number and/or an odd number if it is not yet in the 'dobre2' list
           for p in range(2):
               if len([i for i in dobre2 if i%2 == p ])==0:
                   dobre2.insert(p,p)

           h=0          
           while len(dodane_pary)<count:   

            if h!=0:   
               # extends the list 'dobre2' by subsequent even and odd numbers if the step 'h = 0' did not give the desired effects
               for q in range(2):   
                   g.add_node([i for i in dobre2 if i%2 == q][-1] + 2)
                   dobre2.append([i for i in dobre2 if i%2 == q][-1] + 2)

            sources={}
            for x in range(2):
                sources["v{0}".format(x)] = [i for i in dobre2 if i%2 == x]
            # for each representation such as 'v0v0v1', I examine all combinations of numbers where 'v0' is an even number and 'v1' is an odd number, choosing values from the 'dobre2' list and checking the conditions below
            for aaa_binary in groups(sources, a):

                if len(dodane_pary)!=count:
                    # adding new nodes and edges if they did not exist
                    g.add_nodes_from (aaa_binary)
                    t1 = (aaa_binary[0],aaa_binary[1])
                    t2 = (aaa_binary[1],aaa_binary[2])
                    t3 = (aaa_binary[2],aaa_binary[3])

                    added_now = []                      
                    for edge in (t1,t2,t3):
                        if not g.has_edge(*edge):
                           g.add_edge(*edge)
                           added_now.append(edge)

                    dodane_pary.append(aaa_binary)  
                    # check whether the shortest-path condition still holds on the existing graph after the edges were added; if not, remove the newly added edges
                    for j in range(len(dodane_pary)):
                        if nx.shortest_path(g, aaa_binary[0], aaa_binary[3])!=aaa_binary or nx.shortest_path(g, dodane_pary[j][0], dodane_pary[j][3])!=dodane_pary[j]:
                           for edge in added_now:
                               g.remove_edge(*edge)
                           dodane_pary.remove(aaa_binary)
                           break
                if len(dodane_pary)==count: 
                   break 
            h=h+1

       count +=1
       bar.update(y)

g.remove_nodes_from(nx.isolates(g))

pos=nx.circular_layout(g)
plt.figure(3,figsize=(8,8))
nx.draw_networkx_edges(g,pos)
nx.draw(g,pos)   
nx.draw_networkx_labels(g,pos)

print (dodane_pary)

plt.show()

Output paths representing 4-bit binary strings from dodane_pary:

[[0, 2, 4, 6], [0, 2, 4, 1], [0, 2, 3, 8], [0, 2, 3, 5], [2, 3, 8, 7], [6, 1, 3, 8], [2, 3, 5, 9], [11, 0, 2, 3], [11, 4, 1, 5], [7, 1, 5, 9]]

So these are representations of 4-bit binary strings:

[0000, 0001, 0010, 0011, 0101, 0110, 0111, 1001, 1011, 1111] 

Of course, as you can see, there are no reflections of the mirrored strings, because there is no such need in an undirected graph.
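For reference, decoding a path into the binary string it represents is just a matter of taking each node's parity, for example:

# decode a graph path into the binary string it encodes (even node -> 0, odd node -> 1)
def path_to_bits(path):
    return ''.join(str(node % 2) for node in path)

print(path_to_bits([0, 2, 4, 1]))   # -> '0001'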

Graph:


The running time of the code is the biggest problem, because even in this quite simple example, at the end of the algorithm's run the dobre2 list has 12 values: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], from which all four-element sub-lists are tested. However, I would like to build, for example, a graph with all representations of 8-bit strings, and it is easy to imagine what size the dobre2 list will grow to at some stage.

Unfortunately, I do not see any way other than checking step by step, because I have not found any mathematical theory matching my problem. For example, the Hamilton cube is built a little differently.

Can multiprocessing be used in the code constructed in this way? I ask because I’ve tried everything but to no avail.

While trying to speed up the code, I also tried rejecting combinations that had already been written to an additional list, but unfortunately such a list quickly grows very large, and adding a condition that checks whether a given combination already exists in the list further extends the algorithm's running time.
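One small mitigation for that particular point, sketched below: storing the already-rejected combinations as tuples in a set makes the membership test a constant-time lookup instead of a linear scan over a list.

# Sketch: keep already-rejected candidate paths as tuples in a set, so the
# "have I seen this combination before?" check is an O(1) lookup.
rejected = set()

def already_rejected(candidate):
    return tuple(candidate) in rejected

def reject(candidate):
    rejected.add(tuple(candidate))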

Or maybe you can somehow optimize the code? I will be grateful for every clue.

Ps. I’ve tried everything to implement multiprocessing. Help me please 🙂


Get this bounty!!!

#StackBounty: #google-chrome #performance #memory-usage #gnome-keyring gnome-keyring-daemon memory problem

Bounty: 50

gnome-keyring-daemon uses enormous quantities of memory when used with Google Chrome (seen on 14.04, 16.04 and 17.x; it grows up to 2 GB), bogging down the whole system.

It seems that the problem has been around for quite a long time; a few examples:

https://bugs.launchpad.net/ubuntu/+source/gnome-keyring/+bug/1094496

https://bugzilla.redhat.com/show_bug.cgi?id=1485439

https://ubuntuforums.org/showthread.php?t=2266820

etc.

Is there any way to correct this behavior, or to replace gnome-keyring with something less insane?


Get this bounty!!!

#StackBounty: #performance #assembly #optimization #x86 #micro-optimization What methods can be used to efficiently extend instruction …

Bounty: 100

Imagine you want to align a series of x86 assembly instructions to certain boundaries. For example, you may want to align loops to a 16 or 32-byte boundary, or pack instructions so they are efficiently placed in the uop cache or whatever.

The simplest way to achieve this is single-byte NOP instructions, followed closely by multi-byte NOPs. Although the latter is generally more efficient, neither method is free: NOPs use front-end execution resources, and also count against your 4-wide¹ rename limit on modern x86.

Another option is to somehow lengthen some instructions to get the alignment you want. If this is done without introducing new stalls, it seems better than the NOP approach. How can instructions be efficiently made longer on recent x86 CPUs?

In the ideal world lengthening techniques would simultaneously be:

  • Applicable to most instructions
  • Capable of lengthening the instruction by a variable amount
  • Not stall or otherwise slow down the decoders
  • Be efficiently represented in the uop cache

It isn’t likely that there is a single method that satisfies all of the above points simultaneously, so good answers will probably address various tradeoffs.


¹ The limit is 5 or 6 on AMD Ryzen.


Get this bounty!!!