#StackBounty: #python #performance #neural-network Deep Neural Network in Python

Bounty: 100

I have written a neural network in Python, focusing on adaptability and performance. I want to use it to dive deeper into that field. I am far from being an expert in neural networks, and the same goes for Python. I do not want to use TensorFlow since I really want to understand how a neural network works.

My questions are:

  • How can I increase the performance? At the moment it takes days to train the network.

The code runs on a single core, but since each iteration of the loop over a batch runs independently, it can be parallelized.
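
For reference, this is roughly what a vectorised mini-batch feedforward could look like with numpy (a rough sketch with toy sizes, not my actual code); the per-image loop becomes a handful of matrix products:

# Hypothetical sketch only: a vectorised feedforward over one mini-batch.
# Layer sizes and data are toy values, not the network defined below.
from numpy import random, dot
from scipy.special import expit

layer_size = [1000, 900, 900, 900, 10]
W = [random.randn(layer_size[l + 1], layer_size[l]) for l in range(len(layer_size) - 1)]
b = [random.randn(layer_size[l], 1) for l in range(1, len(layer_size))]

I = random.rand(1000, layer_size[0])       # toy training images
batch = random.randint(0, len(I), 100)     # indices of one mini-batch
A = I[batch].T                             # one column per image
for Wl, bl in zip(W, b):
    A = expit(dot(Wl, A) + bl)             # the bias broadcasts across the batch
print(A.shape)                             # (10, 100): activations for all 100 images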

  • How can I parallelize the loop over the batches?

I found some tutorials on parallel loops in Python, but I could not apply them to my problem.
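
For illustration, this is the kind of pattern I could not transfer to my code: a rough sketch that farms per-sample gradients of a toy linear model out to a multiprocessing.Pool and averages them (the toy model and all names are placeholders, not my network):

# Hypothetical sketch: parallelising per-sample gradient work with a process pool.
# The linear model is a stand-in; only the Pool.map pattern over batch indices matters.
from multiprocessing import Pool
from numpy import random, dot, outer

random.seed(0)
W = random.randn(10, 100)        # toy weights (inherited by workers via fork on Linux)
I = random.rand(1000, 100)       # toy inputs
Y = random.rand(1000, 10)        # toy targets

def per_sample_grad(i):
    # gradient of 0.5 * ||W x - y||^2 with respect to W for one sample
    err = dot(W, I[i]) - Y[i]
    return outer(err, I[i])

if __name__ == "__main__":
    batch = random.randint(0, len(I), 100)
    with Pool() as pool:
        grads = pool.map(per_sample_grad, batch)
    dW = sum(grads) / len(batch)
    W -= 0.1 * dW
    print(dW.shape)              # (10, 100)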

Here is my tested code with some pseudo training data:

from numpy import random, zeros, array, dot
from scipy.special import expit
import time 

def sigma(x):
    return expit(x)

def sigma_prime(x):
    u = expit(x)
    return  u-u*u 

def SGD(I, L, batch_size, eta):

    images = len(L)

    # Pre-activation
    z = [zeros((layer_size[l],1)) for l in range(1,nn_size)]

    # Activations
    a = [zeros((layer_size[l],1)) for l in range(nn_size)]

    # Ground truth      
    y = zeros((images, layer_size[-1]))
    for i in range(images):
        y[i,L[i]] = 1.0

    while (1):

        t0 = time.time()

        # Create random batch
        batch = random.randint(0,images,batch_size)

        dW = [zeros((layer_size[l+1], layer_size[l])) for l in range(nn_size-1)]
        db = [zeros((layer_size[l],1)) for l in range(1, nn_size)]

        for i in batch:        
            # Feedforward
            a[0] = array([I[i]]).T
            for l in range(nn_size-1):
                z[l] = dot(W[l], a[l]) + b[l]
                a[l+1] = sigma(z[l])

            # Backpropagation
            delta = (a[nn_size-1]-array([y[i]]).T) * sigma_prime(z[nn_size-2])
            dW[nn_size-2] += dot(delta, a[nn_size-2].T)
            db[nn_size-2] += delta
            for l in reversed(range(nn_size-2)):
                delta = dot(W[l+1].T, delta) * sigma_prime(z[l])
                dW[l] += dot(delta, a[l].T)
                db[l] += delta

        # Update Weights and Biases
        for l in range(nn_size-1):
            W[l] += - eta * dW[l] / batch_size
            b[l] += - eta * db[l] / batch_size

        print(time.time() - t0)

input_size = 1000
output_size = 10

layer_size = [input_size, 30**2, 30**2, 30**2, output_size]

nn_size = len(layer_size)

# Weights
W = [random.randn(layer_size[l+1],layer_size[l]) for l in range(nn_size-1)]

# Bias
b = [random.randn(layer_size[l],1) for l in range(1,nn_size)]

# Some random training data with label
size_training_data = 1000
I = random.rand(size_training_data, input_size)
L = random.randint(0, 10, size_training_data)

batch_size = 100
eta = 0.1
SGD(I, L, batch_size, eta)


Get this bounty!!!

#StackBounty: #sql-server #performance #index #index-tuning #deadlock SQL deadlock on nonclustered key caused by two INSERTs and a CHEC…

Bounty: 50

Been struggling with deadlocking on a table during INSERTs. It’s a multi-tenant database and Read Committed Snapshot Isolation (RCSI) is enabled.

Deadlock graph

There is a CHECK constraint upon INSERT to enforce logic around double bookings; it executes a scalar-valued function and checks for a result of 0. The function looks up the same table with a READCOMMITTEDLOCK hint to check for violations of the logic where the ID (PK/clustered index) doesn’t equal the ID of the newly inserted row.

The constraint does an INDEX SEEK on the index causing the deadlock: idx_report_foobar.

Any assistance would be greatly appreciated.

Here is the XML (which has been adjusted to remove some of the logic and names of table fields which are in the database):

<deadlock>
 <victim-list>
  <victimProcess id="process91591c108" />
 </victim-list>
 <process-list>
  <process id="process91591c108" taskpriority="0" logused="1328" waitresource="KEY: 9:72057594095861760 (c2e966d5eb6a)" waittime="3046" ownerId="2628292921" transactionname="user_transaction" lasttranstarted="2018-03-09T14:24:13.820" XDES="0x708a80d80" lockMode="S" schedulerid="10" kpid="8964" status="suspended" spid="119" sbid="2" ecid="0" priority="0" trancount="2" lastbatchstarted="2018-03-09T14:24:13.823" lastbatchcompleted="2018-03-09T14:24:13.820" lastattention="1900-01-01T00:00:00.820" clientapp=".Net SqlClient Data Provider" hostname="SERVERNAMEHERE" hostpid="33672" loginname="DOMAINUSERHERE" isolationlevel="read committed (2)" xactid="2628292921" currentdb="9" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
   <executionStack>
    <frame procname="mydb.dbo.CheckForDoubleBookings" line="12" stmtstart="920" stmtend="3200" sqlhandle="0x0300090018ef9b72531bea009ea8000000000000000000000000000000000000000000000000000000000000">
IF EXISTS (SELECT * 
                 FROM   dbo.bookings a WITH (READCOMMITTEDLOCK)
                 WHERE  a.id &lt;&gt; @id 
                        AND a.userID = @userID 
                        AND @bookingStart &lt; a.bookingEnd 
                        AND a.bookingStart &lt; @bookingEnd
                        AND a.eventID = @eventID
    </frame>
    <frame procname="adhoc" line="1" stmtstart="288" stmtend="922" sqlhandle="0x020000005ed9af11c02db2af69df1d5fb6d1adb0e4812afb0000000000000000000000000000000000000000">
unknown    </frame>
    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown    </frame>
   </executionStack>
   <inputbuf>
(@0 datetime2(7),@1 datetime2(7),@2 int,@3 int,@4 int,@5 int,@6 int,@7 nvarchar(4000),@8 datetime2(7),@9 nvarchar(50),@10 int,@11 nvarchar(255))INSERT [dbo].[bookings]([bookingStart], [bookingEnd], [userID], [eventID], [TypeId], [Notes], [Timestamp], [AddedById])
VALUES (@0, @1, @2, @3, @4, @5, @6, @7, @8, NULL, @9, @10, @11, NULL, NULL)
SELECT [Id]
FROM [dbo].[bookings]
WHERE @@ROWCOUNT &gt; 0 AND [Id] = scope_identity()   </inputbuf>
  </process>
  <process id="processca27768c8" taskpriority="0" logused="1328" waitresource="KEY: 9:72057594095861760 (3ba50d420e66)" waittime="3048" ownerId="2628280537" transactionname="user_transaction" lasttranstarted="2018-03-09T14:24:04.063" XDES="0xa555403b0" lockMode="S" schedulerid="6" kpid="12776" status="suspended" spid="124" sbid="2" ecid="0" priority="0" trancount="2" lastbatchstarted="2018-03-09T14:24:04.070" lastbatchcompleted="2018-03-09T14:24:04.063" lastattention="1900-01-01T00:00:00.063" clientapp=".Net SqlClient Data Provider" hostname="SERVERNAMEHERE" hostpid="33672" loginname="DOMAINUSERHERE" isolationlevel="read committed (2)" xactid="2628280537" currentdb="9" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
   <executionStack>
    <frame procname="mydb.dbo.CheckForDoubleBookings" line="12" stmtstart="920" stmtend="3200" sqlhandle="0x0300090018ef9b72531bea009ea8000000000000000000000000000000000000000000000000000000000000">
IF EXISTS (SELECT * 
                 FROM   dbo.bookings a WITH (READCOMMITTEDLOCK)
                 WHERE  a.id &lt;&gt; @id 
                        AND a.userID = @userID 
                        AND @bookingStart &lt; a.bookingEnd 
                        AND a.bookingStart &lt; @bookingEnd
                        AND a.eventID = @eventID
    </frame>
    <frame procname="adhoc" line="1" stmtstart="288" stmtend="922" sqlhandle="0x020000005ed9af11c02db2af69df1d5fb6d1adb0e4812afb0000000000000000000000000000000000000000">
unknown    </frame>
    <frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown    </frame>
   </executionStack>
   <inputbuf>
(@0 datetime2(7),@1 datetime2(7),@2 int,@3 int,@4 int,@5 int,@6 int,@7 nvarchar(4000),@8 datetime2(7),@9 nvarchar(50),@10 int,@11 nvarchar(255))INSERT [dbo].[bookings]([bookingStart], [bookingEnd], [userID], [eventID], [TypeId], [Notes], [Timestamp], [AddedById])
VALUES (@0, @1, @2, @3, @4, @5, @6, @7, @8, NULL, @9, @10, @11, NULL, NULL)
SELECT [Id]
FROM [dbo].[bookings]
WHERE @@ROWCOUNT &gt; 0 AND [Id] = scope_identity()   </inputbuf>
  </process>
 </process-list>
 <resource-list>
  <keylock hobtid="72057594095861760" dbid="9" objectname="mydb.dbo.bookings" indexname="idx_report_foobar" id="locke83fdbe80" mode="X" associatedObjectId="72057594095861760">
   <owner-list>
    <owner id="processca27768c8" mode="X" />
   </owner-list>
   <waiter-list>
    <waiter id="process91591c108" mode="S" requestType="wait" />
   </waiter-list>
  </keylock>
  <keylock hobtid="72057594095861760" dbid="9" objectname="mydb.dbo.bookings" indexname="idx_report_foobar" id="lock7fdb48480" mode="X" associatedObjectId="72057594095861760">
   <owner-list>
    <owner id="process91591c108" mode="X" />
   </owner-list>
   <waiter-list>
    <waiter id="processca27768c8" mode="S" requestType="wait" />
   </waiter-list>
  </keylock>
 </resource-list>
</deadlock>

The index:

CREATE NONCLUSTERED INDEX [idx_report_foobar] ON [dbo].[bookings]
(
    [eventID] ASC
)
INCLUDE (   [bookingStart],
    [bookingEnd],
    [userID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80)
GO
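
One mitigation that is often suggested for this pattern (a sketch only; the index name is made up and the column choice is inferred from the deadlock XML, so it would need testing) is to key the index on the columns the constraint's EXISTS query seeks on, so the lookup reads fewer rows and takes fewer key locks than the current eventID-only key:

CREATE NONCLUSTERED INDEX [idx_bookings_eventID_userID] ON [dbo].[bookings]
(
    [eventID] ASC,
    [userID] ASC
)
INCLUDE (   [bookingStart],
    [bookingEnd]) WITH (PAD_INDEX = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80)
GO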


Get this bounty!!!

#StackBounty: #python #performance #python-3.x Brute-force Hash Cracker

Bounty: 50

I made a hash cracker in Python (for purely educational purposes), but it’s really slow (~120 seconds for a 4 character string). How could I speed it up?

Current optimizations and explanations:

  • Closures in CharSet.get_advance: These are faster than attribute lookups.
  • iter in PasswordCracker.crack: This moves the loop into C.
  • CharSet.next as an array.array: Faster than a dict.

Possible future optimizations:

  • advance is kind of slow, but I’m not sure how to speed it up (a possible alternative is sketched below).
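
For example, something along these lines (a rough sketch, not my current design), letting itertools.product do the per-character bookkeeping in C:

# Hypothetical sketch: enumerate candidates with itertools.product instead of
# advancing a bytearray by hand; only the charset and MD5 usage match my code.
import hashlib
from itertools import count, product
from string import printable

CHARS = printable.encode("ascii")

def crack(target_digest):
    for length in count(1):                        # try ever longer strings
        for candidate in product(CHARS, repeat=length):
            candidate = bytes(candidate)
            if hashlib.md5(candidate).digest() == target_digest:
                return candidate

print(crack(hashlib.md5(b"pww").digest()))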

Code:

import hashlib
from string import printable
from time import time
import itertools
from array import array

ENCODING = "ascii" # utf-8 for unicode support

class CharSet():
  def __init__(self, chars):
    chars = to_bytes(chars)
    self.chars = set(chars)
    self.first = chars[0]
    self.last = chars[-1]
    self.next = array("B", [0] * 256)
    for char, next_char in zip(chars, chars[1:]):
      self.next[char] = next_char

  def update_chars(self, new_chars):
    new_chars = to_bytes(new_chars)
    new_chars = set(new_chars) - self.chars
    if new_chars: # if theres anything new
      self.chars |= new_chars
      new_chars = list(new_chars)
      self.next[self.last] = new_chars[0]
      self.last = new_chars[-1]
      for char, next_char in zip(new_chars, new_chars[1:]):
        self.next[char] = next_char

  def get_advance(self, arr, hash_):
    first = self.first
    last = self.last
    next_ = self.next
    def advance():
      for ind, byte in enumerate(arr):
        if byte == last:
          arr[ind] = first
        else:
          arr[ind] = next_[byte]
          return hash_(arr)

      arr.append(first)
      return hash_(arr)

    return advance

class PasswordCracker():
  def __init__(self, hash_, chars=None):
    self.hash = hash_
    if chars is None:
      chars = printable
    self.char_set = CharSet(chars)

  def update_chars(self, string):
    self.char_set.update_chars(string)

  def crack(self, hashed):
    arr = bytearray()
    advance = self.char_set.get_advance(arr, self.hash)
    for _ in iter(advance, hashed):
      pass
    return arr

def to_bytes(string):
  if isinstance(string, str):
    return bytearray(string, ENCODING)
  elif isinstance(string, (bytes, bytearray)):
    return string
  else:
    raise TypeError(f"Cannot convert {string} to bytes")

def get_hasher(hash_):
  def hasher(bytes):
    return hash_(bytes).digest()

  return hasher

md5 = get_hasher(hashlib.md5)

cracker = PasswordCracker(md5)

password = input("Enter password: ")

cracker.update_chars(password)
password = md5(to_bytes(password))

start = time()
cracked = cracker.crack(password)
end = time()
print(f"Password cracked: {cracked.decode(ENCODING)}")
print(f"Time: {end - start} seconds.")

Profiling results (with password "pww"):

      1333313 function calls in 1.500 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.500    1.500 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 main.py:31(get_advance)
   333326    0.394    0.000    1.376    0.000 main.py:35(advance)
        1    0.124    0.124    1.500    1.500 main.py:58(crack)
   333326    0.311    0.000    0.982    0.000 main.py:74(hasher)
   333326    0.265    0.000    0.265    0.000 {built-in method _hashlib.openssl_md5}
        1    0.000    0.000    1.500    1.500 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
        3    0.000    0.000    0.000    0.000 {method 'append' of 'bytearray' objects}
   333326    0.405    0.000    0.405    0.000 {method 'digest' of '_hashlib.HASH' objects}

Profiling results (with password "pwww", extra "w"):

         133333314 function calls in 190.800 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  190.799  190.799 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 main.py:31(get_advance)
 33333326   65.652    0.000  169.782    0.000 main.py:35(advance)
        1   21.017   21.017  190.799  190.799 main.py:58(crack)
 33333326   40.640    0.000  104.130    0.000 main.py:74(hasher)
 33333326   27.957    0.000   27.957    0.000 {built-in method _hashlib.openssl_md5}
        1    0.000    0.000  190.800  190.800 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
        4    0.000    0.000    0.000    0.000 {method 'append' of 'bytearray' objects}
 33333326   35.533    0.000   35.533    0.000 {method 'digest' of '_hashlib.HASH' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Get this bounty!!!

#StackBounty: #performance #haskell #recursion #optimization #benchmarking Why do these fixpoint cata / ana morphism definitions outper…

Bounty: 100

Consider these definitions from a previous question:

type Algebra f a = f a -> a

cata :: Functor f => Algebra f b -> Fix f -> b
cata alg = alg . fmap (cata alg) . unFix

fixcata :: Functor f => Algebra f b -> Fix f -> b
fixcata alg = fix $ \f -> alg . fmap f . unFix

type CoAlgebra f a = a -> f a

ana :: Functor f => CoAlgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg

fixana :: Functor f => CoAlgebra f a -> a -> Fix f
fixana coalg = fix $ \f -> Fix . fmap f . coalg

I ran some benchmarks and the results surprised me. criterion reports something like a tenfold speedup, specifically when -O2 is enabled. I wonder what causes such a massive improvement, and I am beginning to seriously doubt my benchmarking abilities.

This is the exact criterion code I use:

smallWord, largeWord :: Word
smallWord = 2^10
largeWord = 2^20

shortEnv, longEnv :: Fix Maybe
shortEnv = ana coAlg smallWord
longEnv = ana coAlg largeWord

benchCata = nf (cata alg)
benchFixcata = nf (fixcata alg)

benchAna = nf (ana coAlg)
benchFixana = nf (fixana coAlg)

main = defaultMain
    [ bgroup "cata"
        [ bgroup "short input"
            [ env (return shortEnv) $ \x -> bench "cata"    (benchCata x)
            , env (return shortEnv) $ \x -> bench "fixcata" (benchFixcata x)
            ]
        , bgroup "long input"
            [ env (return longEnv) $ \x -> bench "cata"    (benchCata x)
            , env (return longEnv) $ \x -> bench "fixcata" (benchFixcata x)
            ]
        ]
    , bgroup "ana"
        [ bgroup "small word"
            [ bench "ana" $ benchAna smallWord
            , bench "fixana" $ benchFixana smallWord
            ]
        , bgroup "large word"
            [ bench "ana" $ benchAna largeWord
            , bench "fixana" $ benchFixana largeWord
            ]
        ]
    ]

And some auxiliary code:

alg :: Algebra Maybe Word
alg Nothing = 0
alg (Just x) = succ x

coAlg :: CoAlgebra Maybe Word
coAlg 0 = Nothing
coAlg x = Just (pred x)
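
For completeness, these are the definitions the snippets above rely on but do not show; the NFData instance here is my own sketch, added only so that criterion's nf can force the results:

{-# LANGUAGE FlexibleContexts, UndecidableInstances #-}
import Control.DeepSeq (NFData (..))

newtype Fix f = Fix { unFix :: f (Fix f) }

instance NFData (f (Fix f)) => NFData (Fix f) where
    rnf (Fix x) = rnf x

main :: IO ()
main = rnf (Fix Nothing :: Fix Maybe) `seq` putStrLn "NFData instance is usable"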

Compiled with -O0, the numbers are pretty even. With -O2, the fix-based functions seem to outperform the plain ones:

benchmarking cata/short input/cata
time                 31.67 μs   (31.10 μs .. 32.26 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 31.20 μs   (31.05 μs .. 31.46 μs)
std dev              633.9 ns   (385.3 ns .. 1.029 μs)
variance introduced by outliers: 18% (moderately inflated)

benchmarking cata/short input/fixcata
time                 2.422 μs   (2.407 μs .. 2.440 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.399 μs   (2.388 μs .. 2.410 μs)
std dev              37.12 ns   (31.44 ns .. 47.06 ns)
variance introduced by outliers: 14% (moderately inflated)

I would appreciate if someone can confirm or spot a flaw.

(I compiled things with GHC 8.2.2 on this occasion.)


postscriptum

This post from back in 2012 elaborates on the performance of fix in quite fine detail. (Thanks to @chi for the link.)


Get this bounty!!!

#StackBounty: #machine-learning #predictive-models #performance #sensitivity #specificity Assessing correlated predictions

Bounty: 50

Let’s assume we have a prediction algorithm (if it helps, imagine it’s using some boosted tree method) that does daily predictions for whether some event will happen to a unit (e.g. a machine that might break down, a patient that might get a problematic medical event etc.) during the next week. We can only get training and test data for a low number of units (e.g. 400 + 100 or so) across a limited period of time (e.g. half a year).

How would one assess prediction performance (e.g. sensitivity/specificity/AuROC etc.) of some algorithm (e.g. some tree method) in this setting on test data? Presumably there is a potential issue in that the prediction intervals overlap and even non-overlapping intervals for the same unit are somewhat correlated (i.e. if I can predict well for a particular unit due to its specific characteristics, I may do well on all time intervals for that unit, but this does not mean the algorithm would generalize well).

Perhaps I have just not hit on the right search terms, but I have failed to find anything published on this topic (surely someone has written about this before?!). Any pointers to any literature?

My initial thought was that perhaps naively calculated (i.e. just treating this as independent observations and predictions) point estimates of sensitivity/specificity might be fine, but that any problem would be more with the uncertainty around these? If so, could one just bootstrap (drawing whole units with replacement) and get decent assessments that way?
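
To make the bootstrap idea concrete, here is a minimal sketch of what I mean (the column names, the synthetic data, and the use of scikit-learn's roc_auc_score are all placeholders):

# Hypothetical sketch: unit-level ("cluster") bootstrap of AUC.
# Whole units are resampled with replacement so within-unit correlation is preserved.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# synthetic test data: 100 units, 180 daily predictions each
df = pd.DataFrame({
    "unit_id": np.repeat(np.arange(100), 180),
    "score":   rng.random(18000),
    "event":   rng.integers(0, 2, 18000),
})

def unit_bootstrap_auc(df, n_boot=200):
    units = df["unit_id"].unique()
    aucs = []
    for _ in range(n_boot):
        sampled = rng.choice(units, size=len(units), replace=True)
        boot = pd.concat([df[df["unit_id"] == u] for u in sampled])
        aucs.append(roc_auc_score(boot["event"], boot["score"]))
    return np.percentile(aucs, [2.5, 97.5])

print(unit_bootstrap_auc(df))   # 95% interval from resampling whole units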


Get this bounty!!!

#StackBounty: #python #performance #beginner #numpy #markov-chain Solve the phase state between two haplotype blocks using markov trans…

Bounty: 50

I have spent a little more than a year with Python, but I come from a biology background. I have enough understanding of for-loops and nested for-loops to work out a solution to the problem I have, but I suppose there are better and more comprehensive ways of solving it.

Below is the code I wrote to solve the haplotype phase state between two haplotype blocks.

Here is a short snippet of my data:

chr  pos       ms01e_PI  ms01e_PG_al  ms02g_PI  ms02g_PG_al  ms03g_PI  ms03g_PG_al  ms04h_PI  ms04h_PG_al  ms05h_PI  ms05h_PG_al  ms06h_PI  ms06h_PG_al
2    15881764  4         C|T          6         C|T          7         T|T          7         T|T          7         C|T          7         C|T
2    15881767  4         C|C          6         T|C          7         C|C          7         C|C          7         T|C          7         C|C
2    15881989  4         C|C          6         A|C          7         C|C          7         C|C          7         A|T          7         A|C
2    15882091  4         G|T          6         G|T          7         T|A          7         A|A          7         A|T          7         A|C
2    15882451  4         C|T          4         T|C          7         T|T          7         T|T          7         C|T          7         C|A
2    15882454  4         C|T          4         T|C          7         T|T          7         T|T          7         C|T          7         C|T
2    15882493  4         C|T          4         T|C          7         T|T          7         T|T          7         C|T          7         C|T
2    15882505  4         A|T          4         T|A          7         T|T          7         T|T          7         A|C          7         A|T

Problem:

In the given data I want to solve the phase state of the haplotype for the sample ms02g. The suffix PI represents the phased block (i.e., all data within that block is phased).

For sample ms02g, C-T-A-G is phased (PI level 6), and so is T-C-C-T. Similarly, the data within PI level 4 is also phased. But since the haplotype is broken across two levels, we don’t know which phase from level 6 goes with which phase of level 4. Phase to be solved - sample ms02g:

ms02g_PI  ms02g_PG_al
6         C|T
6         T|C
6         A|C
6         G|T
×——————————×—————> Break Point
4         T|C
4         T|C
4         T|C
4         T|A

Since all other samples are completely phased, I can use Markov-chain transition probabilities to solve the phase state. To the human eye it is clear that the left part of ms02g in level 6 (C-T-A-G) is more likely to go with the right part of level 4 (C-C-C-A).

Calculation:

Step 01:

I prepare the required haplotype configurations:

  • The top haplotype-PI is block01 and bottom is block-02.
  • The left phased haplotype within each block is Hap-A and the right is Hap-B.

So, there are 2 haplotype configurations:

  • Block01-HapA with Block02-HapA, so B01-HapB with B02-HapB
  • Block01-HapA with Block02-HapB, so B01-HapB with B02-HapA

I count the number of transitions from each nucleotide of PI-6 to each nucleotide of PI-4 for each haplotype configuration.
Possible transitions from PI-6 to PI-4:

  Possible
  transitions    ms02g_PG_al
    │       ┌┬┬┬ C|T
    │       ││││ T|C
    │       ││││ A|C
    │       ││││ G|T
    └─────> ││││  ×—————> Break Point
            │││└ T|C
            ││└─ T|C
            │└── T|C
            └─── T|A
  • and multiply the transition probabilities from the first nucleotide in PI-6 to all nucleotides of PI-4, then similarly multiply the transition probabilities from the 2nd nucleotide of PI-6 to all nucleotides in PI-4.

  • then I sum the transition probabilities from one level to another.


  Possible
  transitions    ms02g_PG_al
    │       ┌┬┬┬ C|T 
    │       ││││ T|C | Block 1
    │       ││││ A|C |
    │       ││││ G|T /
    └─────> ││││  ×—————> Break Point
            │││└ T|C 
            ││└─ T|C |
            │└── T|C | Block 2
            └─── T|A /
                 ↓ ↓
             Hap A Hap B

Calculating likelihood:

  Block-1-Hap-A with Block-2-Hap-B    vs.    Block-1-Hap-A with Block-2-Hap-A

  C|C × C|C × C|C × A|C = 0.58612            T|C × T|C × T|C × T|C = 0.000244
+ C|T × C|T × C|T × A|T = 0.3164           + T|T × T|T × T|T × T|T = 0.00391
+ C|A × C|A × C|A × A|A = 0.482            + T|A × T|A × T|A × T|A = 0.0007716
+ C|G × C|G × C|G × A|G = 0.3164           + T|G × T|G × T|G × T|G = 0.00391
                          ———————                                    —————————
                    Avg = 0.42523                              Avg = 0.002207

        Likelyhood Ratio = 0.42523 / 0.002207 = 192.67
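
The same averaging, written out as a few lines of Python (values copied from the table above):

first_config  = [0.58612, 0.3164, 0.482, 0.3164]         # left-hand column above
second_config = [0.000244, 0.00391, 0.0007716, 0.00391]  # right-hand column above
avg1 = sum(first_config) / 4     # ~0.425
avg2 = sum(second_config) / 4    # ~0.0022
print(avg1, avg2, avg1 / avg2)   # likelihood ratio of roughly 190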

I think the transition probabilities can mostly be optimized using numpy or a Markov-chain package, but I was not able to work that out the way I wanted, and it took me some time to get this far; a rough numpy sketch of what I mean is included after the code. Any suggestions with explanations are welcome. Thanks.

Here is my working code with Python 3:

#!/home/bin/python
from __future__ import division

import collections
from itertools import product
from itertools import islice
import itertools
import csv
from pprint import pprint


''' function to count the number of transitions '''
def sum_Probs(pX_Y, pX):
    try:
        return float(pX_Y / pX)
    except ZeroDivisionError:
        return 0


with open("HaploBlock_Updated.txt", 'w') as f:

    ''' Write header (part 01): This is just a front part of the header. The rear part of the header is
    update dynamically depending upon the number of samples '''
    f.write('\t'.join(['chr', 'pos', 'ms02g_PI', 'ms02g_PG_al', '\n']))

    with open('HaploBlock_for_test-forHMMtoy02.txt') as Phased_Data:

        ''' Note: Create a dictionary to store required data. The Dict should contain information about
         two adjacent haplotype blocks that needs extending. In this example I want to extend the
         haplotypes for sample ms02g which has two blocks 6 and 4. So, I read the PI and PG value
         for this sample. Also, data should store with some unique keys. Some keys are: chr, pos,
         sample (PI, PG within sample), possible haplotypes ... etc .. '''


        Phased_Dict = csv.DictReader(Phased_Data, delimiter='\t')
        grouped = itertools.groupby(Phased_Dict, key=lambda x: x['ms02g_PI'])


        ''' Function to read the data as blocks (based on PI values of ms02g) at a time'''
        def accumulate(data):
            acc = collections.OrderedDict()
            for d in data:
                for k, v in d.items():
                    acc.setdefault(k, []).append(v)
            return acc


        ''' Store data as keys,values '''
        grouped_data = collections.OrderedDict()
        for k, g in grouped:
            grouped_data[k] = accumulate(g)


        ''' Clear memory '''
        del Phased_Dict
        del grouped

        ''' Iterate over two Hap Blocks at once. This is done to obtain all possible Haplotype
        configurations between two blocks. The (keys,values) for first block is represented as
        k1,v2 and for the later block as k2,v2. '''

        # Empty list for storing haplotypes from each block
        hap_Block1_A = [];
        hap_Block1_B = []
        hap_Block2_A = [];
        hap_Block2_B = []

        # Create empty list for possible haplotype configurations from above block
        hapB1A_hapB2A = []
        hapB1B_hapB2B = []

        ''' list of all available samples (index, value) '''
        sample_list = [('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'),
                       ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al'),
                       ('ms05h_PI', 'ms05h_PG_al'), ('ms06h_PI', 'ms06h_PG_al')]


        ''' read data from two consecutive blocks at a time '''
        for (k1, v1), (k2, v2) in zip(grouped_data.items(), islice(grouped_data.items(), 1, None)):
            ''' skip if one of the keys has no values'''
            if k1 == '.' or k2 == '.':
                continue


            ''' iterate over the first Haplotype Block, i.e the k1 block.
            The nucleotides in the left of the phased SNPs are called Block01-haplotype-A,
            and similarly on the right as Block01-haplotype-B. '''
            hap_Block1_A = [x.split('|')[0] for x in v1['ms02g_PG_al']] # the left haplotype of Block01
            hap_Block1_B = [x.split('|')[1] for x in v1['ms02g_PG_al']]


            ''' iterate over the second Haplotype Block,
            i.e the k2 block '''
            hap_Block2_A = [x.split('|')[0] for x in v2['ms02g_PG_al']]
            hap_Block2_B = [x.split('|')[1] for x in v2['ms02g_PG_al']]



            ''' Now, we have to start to solve the possible haplotype configuration.
            Possible haplotype Configurations will be, Either :
            1) Block01-haplotype-A phased with Block02-haplotype-A,
                creating -> hapB1A-hapB2A, hapB1B-hapB2B
            Or,
            2) Block01-haplotype-A phased with Block02-haplotype-B
                creating -> hapB1A-hapB2B, hapB1B-hapB2A '''

            ''' First possible configuration '''
            hapB1A_hapB2A = [hap_Block1_A, hap_Block2_A]
            hapB1B_hapB2B = [hap_Block1_B, hap_Block2_B]

            ''' Second Possible Configuration '''
            hapB1A_hapB2B = [hap_Block1_A, hap_Block2_B]
            hapB1B_hapB2A = [hap_Block1_B, hap_Block2_A]


            print('\nStarting MarkovChains')
            ''' Now, start preping the first order markov transition matrix
             for the observed haplotypes between two blocks.'''


            ''' To store the sum-values of the product of the transition probabilities.
            These sum are added as the product-of-transition is retured by nested for-loop;
            from "for m in range(....)" '''

            Sum_Of_Product_of_Transition_probabilities_hapB1A_hapB2A = \
                sumOf_PT_hapB1A_B2A = 0
            Sum_Of_Product_of_Transition_probabilities_hapB1B_hapB2B = \
                sumOf_PT_hapB1B_B2B = 0

            Sum_Of_Product_of_Transition_probabilities_hapB1A_hapB2B = \
                sumOf_PT_hapB1A_B2B = 0
            Sum_Of_Product_of_Transition_probabilities_hapB1B_hapB2A = \
                sumOf_PT_hapB1B_B2A = 0


            for n in range(len(v1['ms02g_PI'])):  # n-ranges from block01

                ''' Set the probabilities of each nucleotides at Zero. They are updated for each level
                 of "n" after reading the number of each nucleotides at that position. '''
                pA = 0; pT = 0; pG = 0; pC = 0
                # nucleotide_prob = [0., 0., 0., 0.] # or store as numpy matrix

                # Also storing these values as Dict
                # probably can be improved
                nucleotide_prob_dict = {'A': 0, 'T': 0, 'G': 0, 'C': 0}
                print('prob as Dict: ', nucleotide_prob_dict)


                ''' for storing the product-values of the transition probabilities.
                These are updated for each level of "n" paired with each level of "m". '''
                product_of_transition_Probs_hapB1AB2A = POTP_hapB1AB2A = 1
                product_of_transition_Probs_hapB1BB2B = POTP_hapB1BB2B = 1

                product_of_transition_Probs_hapB1AB2B = POTP_hapB1AB2B = 1
                product_of_transition_Probs_hapB1BB2A = POTP_hapB1BB2A = 1



                ''' Now, we read each level of "m" to compute the transition from each level of "n"
                to each level of "m". '''
                for m in range(len(v2['ms02g_PI'])): # m-ranges from block02
                    # transition for each level of 0-to-n level of V1 to 0-to-m level of V2
                    ''' set transition probabilities at Zero for each possible transition '''
                    pA_A = 0; pA_T = 0; pA_G = 0; pA_C = 0
                    pT_A = 0; pT_T = 0; pT_G = 0; pT_C = 0
                    pG_A = 0; pG_T = 0; pG_G = 0; pG_C = 0
                    pC_A = 0; pC_T = 0; pC_G = 0; pC_C = 0

                    ''' Also, creating an empty dictionary to store transition probabilities
                    - probably upgrade using numpy '''
                    transition_prob_dict = {'A_A' : 0, 'A_T' : 0, 'A_G' : 0, 'A_C' : 0,
                    'T_A' : 0, 'T_T' : 0, 'T_G' : 0, 'T_C' : 0,
                    'G_A' : 0, 'G_T' : 0, 'G_G' : 0, 'G_C' : 0,
                    'C_A' : 0, 'C_T' : 0, 'C_G' : 0, 'C_C' : 0}


                    ''' Now, loop through each sample to compute initial probs and transition probs '''
                    for x, y in sample_list:
                        print('sample x and y:', x,y)


                        ''' Update nucleotide probability for this site
                        - only calculated from v1 and only once for each parse/iteration '''
                        if m == 0:
                            pA += (v1[y][n].count('A'))
                            pT += (v1[y][n].count('T'))
                            pG += (v1[y][n].count('G'))
                            pC += (v1[y][n].count('C'))

                            nucleotide_prob_dict['A'] = pA
                            nucleotide_prob_dict['T'] = pT
                            nucleotide_prob_dict['G'] = pG
                            nucleotide_prob_dict['C'] = pC

                        nucleotide_prob = [pA, pT, pG, pC]


                        ''' Now, update transition matrix '''
                        nucl_B1 = (v1[y][n]).split('|')  # nucleotides at Block01
                        nucl_B2 = (v2[y][m]).split('|')  # nucleotides at Block02


                        ''' create possible haplotype configurations between "n" and "m".
                         If the index (PI value) are same we create zip, if index (PI value) are
                         different we create product. '''

                        HapConfig = []  # create empty list

                        if v1[x][n] == v2[x][m]:
                            ziped_nuclB1B2 = [(x + '_' + y) for (x,y) in zip(nucl_B1, nucl_B2)]
                            HapConfig = ziped_nuclB1B2

                            ''' Move this counting function else where '''
                            pA_A += (HapConfig.count('A_A'))  # (v1[y][0].count('A'))/8
                            pA_T += (HapConfig.count('A_T'))
                            pA_G += (HapConfig.count('A_G'))
                            pA_C += (HapConfig.count('A_C'))

                            pT_A += (HapConfig.count('T_A'))  # (v1[y][0].count('A'))/8
                            pT_T += (HapConfig.count('T_T'))
                            pT_G += (HapConfig.count('T_G'))
                            pT_C += (HapConfig.count('T_C'))

                            pG_A += (HapConfig.count('G_A'))  # (v1[y][0].count('A'))/8
                            pG_T += (HapConfig.count('G_T'))
                            pG_G += (HapConfig.count('G_G'))
                            pG_C += (HapConfig.count('G_C'))

                            pC_A += (HapConfig.count('C_A'))
                            pC_T += (HapConfig.count('C_T'))
                            pC_G += (HapConfig.count('C_G'))
                            pC_C += (HapConfig.count('C_C'))


                        if v1[x][n] != v2[x][m]:
                            prod_nuclB1B2 = [(x + '_' + y) for (x,y) in product(nucl_B1, nucl_B2)]
                            HapConfig = prod_nuclB1B2
                            print('prod NuclB1B2: ', prod_nuclB1B2)

                            pA_A += (HapConfig.count('A_A'))/2
                            pA_T += (HapConfig.count('A_T'))/2
                            pA_G += (HapConfig.count('A_G'))/2
                            pA_C += (HapConfig.count('A_C'))/2

                            pT_A += (HapConfig.count('T_A'))/2
                            pT_T += (HapConfig.count('T_T'))/2
                            pT_G += (HapConfig.count('T_G'))/2
                            pT_C += (HapConfig.count('T_C'))/2

                            pG_A += (HapConfig.count('G_A'))/2
                            pG_T += (HapConfig.count('G_T'))/2
                            pG_G += (HapConfig.count('G_G'))/2
                            pG_C += (HapConfig.count('G_C'))/2

                            pC_A += (HapConfig.count('C_A'))/2
                            pC_T += (HapConfig.count('C_T'))/2
                            pC_G += (HapConfig.count('C_G'))/2
                            pC_C += (HapConfig.count('C_C'))/2


                    ''' Now, compute nucleotide and transition probabilities for each nucleotide
                    from each 0-n to 0-m at each sample. This updates the transition matrix in
                    each loop. **Note: At the end this transition probabilities should sum to 1 '''

                    ''' Storing nucleotide probabilities '''
                    nucleotide_prob = [pA, pT, pG, pC]

                    ''' Storing transition probability as dict'''
                    transition_prob_dict['A_A'] = sum_Probs(pA_A,pA)
                    transition_prob_dict['A_T'] = sum_Probs(pA_T,pA)
                    transition_prob_dict['A_G'] = sum_Probs(pA_G,pA)
                    transition_prob_dict['A_C'] = sum_Probs(pA_C,pA)

                    transition_prob_dict['T_A'] = sum_Probs(pT_A,pT)
                    transition_prob_dict['T_T'] = sum_Probs(pT_T,pT)
                    transition_prob_dict['T_G'] = sum_Probs(pT_G,pT)
                    transition_prob_dict['T_C'] = sum_Probs(pT_C,pT)

                    transition_prob_dict['G_A'] = sum_Probs(pG_A,pG)
                    transition_prob_dict['G_T'] = sum_Probs(pG_T,pG)
                    transition_prob_dict['G_G'] = sum_Probs(pG_G,pG)
                    transition_prob_dict['G_C'] = sum_Probs(pG_C,pG)

                    transition_prob_dict['C_A'] = sum_Probs(pC_A,pC)
                    transition_prob_dict['C_T'] = sum_Probs(pC_T,pC)
                    transition_prob_dict['C_G'] = sum_Probs(pC_G,pC)
                    transition_prob_dict['C_C'] = sum_Probs(pC_C,pC)


                    '''Now, we start solving the haplotype configurations after we have the
                      required data (nucleotide and transition probabilities).
                    Calculate joint probability for each haplotype configuration.
                    Step 01: We calculate the product of the transition from one level
                    of "n" to several levels of "m". '''

                    ''' finding possible configuration between "n" and "m". '''
                    hapB1A_hapB2A_transition = hapB1A_hapB2A[0][n] + '_' + hapB1A_hapB2A[1][m]
                    hapB1B_hapB2B_transition = hapB1B_hapB2B[0][n] + '_' + hapB1B_hapB2B[1][m]

                    hapB1A_hapB2B_transition = hapB1A_hapB2B[0][n] + '_' + hapB1A_hapB2B[1][m]
                    hapB1B_hapB2A_transition = hapB1B_hapB2A[0][n] + '_' + hapB1B_hapB2A[1][m]


                    ''' computing the products of transition probabilities on the for loop '''
                    POTP_hapB1AB2A *= transition_prob_dict[hapB1A_hapB2A_transition]
                    POTP_hapB1BB2B *= transition_prob_dict[hapB1B_hapB2B_transition]

                    POTP_hapB1AB2B *= transition_prob_dict[hapB1A_hapB2B_transition]
                    POTP_hapB1BB2A *= transition_prob_dict[hapB1B_hapB2A_transition]


                ''' Step 02: sum of the product of the transition probabilities '''
                sumOf_PT_hapB1A_B2A += POTP_hapB1AB2A
                sumOf_PT_hapB1B_B2B += POTP_hapB1BB2B

                sumOf_PT_hapB1A_B2B += POTP_hapB1AB2B
                sumOf_PT_hapB1B_B2A += POTP_hapB1BB2A


            ''' Step 03: Now, computing the likely hood of each haplotype configuration '''
            print('computing likelyhood:')

            likelyhood_hapB1A_with_B2A_Vs_B2B = LH_hapB1AwB2AvsB2B = \
                sumOf_PT_hapB1A_B2A / sumOf_PT_hapB1A_B2B
            likelyhood_hapB1B_with_B2B_Vs_B2A = LH_hapB1BwB2BvsB2A = \
                sumOf_PT_hapB1B_B2B / sumOf_PT_hapB1B_B2A

            likelyhood_hapB1A_with_B2B_Vs_B2A = LH_hapB1AwB2BvsB2A = \
                sumOf_PT_hapB1A_B2B / sumOf_PT_hapB1A_B2A
            likelyhood_hapB1B_with_B2A_Vs_B2B = LH_hapB1BwB2AvsB2B = \
                sumOf_PT_hapB1B_B2A / sumOf_PT_hapB1B_B2B


            print('\nlikely hood Values: B1A with B2A vs. B2B, B1B with B2B vs. B2A')
            print(LH_hapB1AwB2AvsB2B, LH_hapB1BwB2BvsB2A)

            print('\nlikely hood Values: B1A with B2B vs. B2A, B1B with B2A vs. B2B')
            print(LH_hapB1AwB2BvsB2A, LH_hapB1BwB2AvsB2B)


            ''' Update the phase state of the block based on evidence '''
            if (LH_hapB1AwB2AvsB2B + LH_hapB1BwB2BvsB2A) > \
                4*(LH_hapB1AwB2BvsB2A + LH_hapB1BwB2AvsB2B):
                print('Block-1_A is phased with Block-2_A')

                for x in range(len(v1['ms02g_PI'])):
                    PI_value = v1['ms02g_PI'][x]
                    # Note: so, we trasfer the PI value from ealier block to next block

                    f.write('\t'.join([v1['chr'][x], v1['pos'][x], v1['ms02g_PI'][x],
                                       hapB1A_hapB2A[0][x] + '|' + hapB1B_hapB2B[0][x], '\n']))

                for x in range(len(v2['ms02g_PI'])):
                    f.write('\t'.join([v2['chr'][x], v2['pos'][x], PI_value,
                                       hapB1A_hapB2A[1][x] + '|' + hapB1B_hapB2B[1][x], '\n']))

            elif (LH_hapB1AwB2AvsB2B + LH_hapB1BwB2BvsB2A) < \
                4 * (LH_hapB1AwB2BvsB2A + LH_hapB1BwB2AvsB2B):
                print('Block-1_A is phased with Block-2_B')

                for x in range(len(v1['ms02g_PI'])):
                    PI_value = v1['ms02g_PI'][x]
                    # Note: so, we trasfer the PI value from ealier block to next block...
                        # but, the phase position are reversed

                    f.write('\t'.join([v1['chr'][x], v1['pos'][x], v1['ms02g_PI'][x],
                                       hapB1A_hapB2A[0][x] + '|' + hapB1B_hapB2B[0][x], '\n']))

                for x in range(len(v2['ms02g_PI'])):
                    f.write('\t'.join([v2['chr'][x], v2['pos'][x], PI_value,
                                       hapB1B_hapB2B[1][x] + '|' + hapB1A_hapB2A[1][x], '\n']))


            else:
                print('cannot solve the phase state with confidence - sorry ')
                for x in range(len(v1['ms02g_PI'])):
                    f.write('\t'.join([v1['chr'][x], v1['pos'][x], v1['ms02g_PI'][x],
                                   hapB1A_hapB2A[0][x] + '|' + hapB1B_hapB2B[0][x], '\n']))

                for x in range(len(v2['ms02g_PI'])):
                    f.write('\t'.join([v2['chr'][x], v2['pos'][x], v2['ms02g_PI'][x],
                                       hapB1A_hapB2A[1][x] + '|' + hapB1B_hapB2B[1][x], '\n']))
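
As a rough illustration of the numpy direction mentioned above (a sketch only, not a drop-in replacement; the toy alleles and function name are made up), the per-nucleotide counting could be collapsed into a small transition matrix:

import numpy as np

BASES = 'ATGC'
INDEX = {base: i for i, base in enumerate(BASES)}

def transition_matrix(block1_site, block2_site):
    ''' block1_site/block2_site: one allele per haplotype (across samples)
    observed at a site of block 1 and a site of block 2. '''
    counts = np.zeros((4, 4))
    for b1, b2 in zip(block1_site, block2_site):
        counts[INDEX[b1], INDEX[b2]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # row-normalise to transition probabilities, avoiding division by zero
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

print(transition_matrix(list('CCTTCC'), list('CTCCTC')))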


Get this bounty!!!

#StackBounty: #performance #mariadb #threads MariaDB threads optimalization

Bounty: 100

NEW (pcie) server:
Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz, 1TB NVMe disks, 128 GB RAM, installed Debian 4.9.65-3+deb9u1, Ver 15.1 Distrib 10.1.26-MariaDB

moved binary db files from

OLD server:
Intel(R) Xeon(R) CPU E5-1630 v3 @ 3.70GHz, SSD disk, 64 GB RAM, FreeBSD 11.0-STABLE, 10.1.21-MariaDB

Only MySQL is running on the servers. I copied the my.ini file, so the config files are the same.

I ran the mysqlslap benchmark (always restarting the server before each test):

root@db1:/tmp # mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=1 --iterations=1
Benchmark
        Average number of seconds to run all queries: 59.573 seconds
        Minimum number of seconds to run all queries: 59.573 seconds
        Maximum number of seconds to run all queries: 59.573 seconds
        Number of clients running queries: 1
        Average number of queries per client: 100000


root@pcie:~# mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=1 --iterations=1
Benchmark
        Average number of seconds to run all queries: 31.151 seconds
        Minimum number of seconds to run all queries: 31.151 seconds
        Maximum number of seconds to run all queries: 31.151 seconds
        Number of clients running queries: 1
        Average number of queries per client: 100000
====================================================================================================================================
root@db1:/tmp # mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=100 --iterations=1
Benchmark
        Average number of seconds to run all queries: 568.082 seconds
        Minimum number of seconds to run all queries: 568.082 seconds
        Maximum number of seconds to run all queries: 568.082 seconds
        Number of clients running queries: 100
        Average number of queries per client: 100000

root@pcie:/etc/security/limits.d# mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=100 --iterations=1
Benchmark
        Average number of seconds to run all queries: 2059.712 seconds
        Minimum number of seconds to run all queries: 2059.712 seconds
        Maximum number of seconds to run all queries: 2059.712 seconds
        Number of clients running queries: 100
        Average number of queries per client: 100000



====================================================================================================================================
root@db1:/tmp # mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=8 --iterations=1
Benchmark
        Average number of seconds to run all queries: 134.003 seconds
        Minimum number of seconds to run all queries: 134.003 seconds
        Maximum number of seconds to run all queries: 134.003 seconds
        Number of clients running queries: 8
        Average number of queries per client: 100000

root@pcie:/etc/security/limits.d# mysqlslap --user=root --query=/tmp/slap2.sql --create-schema=mydatabase --concurrency=8 --iterations=1
Benchmark
        Average number of seconds to run all queries: 133.410 seconds
        Minimum number of seconds to run all queries: 133.410 seconds
        Maximum number of seconds to run all queries: 133.410 seconds
        Number of clients running queries: 8
        Average number of queries per client: 100000

As you can see, the NEW (pcie) server performs very well with concurrency=1, performance is the same with concurrency=8, and performance is very bad with concurrency=100.

Here are interesting results using internal benchmark:

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=8 --iterations=500 --verbose
        Average number of seconds to run all queries: 0.002 seconds
DB1:    Average number of seconds to run all queries: 0.002 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=16 --iterations=500
        Average number of seconds to run all queries: 0.007 seconds
DB1:    Average number of seconds to run all queries: 0.005 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=32 --iterations=500
        Average number of seconds to run all queries: 0.015 seconds
DB1:    Average number of seconds to run all queries: 0.011 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=64 --iterations=500
        Average number of seconds to run all queries: 0.033 seconds
DB1:    Average number of seconds to run all queries: 0.029 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=128 --iterations=500
        Average number of seconds to run all queries: 0.074 seconds
DB1:    Average number of seconds to run all queries: 0.097 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=256 --iterations=500
        Average number of seconds to run all queries: 0.197 seconds
DB1:    Average number of seconds to run all queries: 0.293 seconds

root@pcie:~/slap/employees_db# mysqlslap --auto-generate-sql --concurrency=512 --iterations=500
        Average number of seconds to run all queries: 0.587 seconds
DB1:    Average number of seconds to run all queries: 1.009 seconds

Does anybody have an idea what to check or how to optimize the setup? I am using MyISAM tables only, and the MariaDB config is the same on both servers…

Update with more info: initially I installed FreeBSD on the NEW DB server and MariaDB performance was bad; I thought it was an OS-related problem, but the same symptoms occur on Linux. During the benchmark there is basically no IO after the cache fills, so this is not an IO-related problem.
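
For reference, the settings that seem most relevant to MyISAM under high concurrency (illustrative values only, not my actual my.ini; all of these are standard MariaDB variables):

[mysqld]
# MyISAM key cache and concurrency-related settings checked on both servers
key_buffer_size   = 8G
concurrent_insert = 2
# the query cache serialises on a global mutex under high concurrency
query_cache_type  = 0
query_cache_size  = 0
# MariaDB thread pool instead of one thread per connection
thread_handling   = pool-of-threads
thread_pool_size  = 8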

Thanks for any ideas.


Get this bounty!!!

#StackBounty: #java #performance #algorithm #guava Generating a string in a particular format by reading data from a map

Bounty: 50

I have a processToTaskIdHolder Map which contains processId as the key and taskId as the value. I iterate over this map and at the end build a String in a particular format.

For example:-

  • Let’s say I have 123 as the key and 009 is the value in the processToTaskIdHolder map.
  • Now I will make an “activityKey” using 123 and then get data based on this key.
  • Now I will iterate over all the categories for that activityKey and check whether those categoryIds are already present in the processToTaskIdHolder keyset or not. If they are present, then I will extract the taskId for that categoryId from the map and also extract the score for that categoryId, storing both in an Info class.
  • Same category can be present with different score for different processId.

Now I need to repeat above steps for each activity I have in activities list.

So my formatted string will look like this:

A,B,C:Score1,D:Score2
P,Q,R:Score1,S:Score2
  • Where A is the categoryId for the processId C and D, and Score1 is the score for categoryId A for processId C. Score2 is the score for categoryId A but for processId D. We have different scores for same categories for two different processes. It means, categoryId A was present in both processId C and D so I need to get the score for both the cases and make a string like that. And B is the taskId for categoryId A which will be present in the map.
  • Where P is the categoryId for the processId R and S, and Score1 is the score for categoryId P for processId R. Score2 is the score for categoryId P but for processId S. We have different scores for same categories for two different processes. It means, categoryId P was present in both processId R and S so I need to get the score for both the cases and make a string like that. And Q is the taskId for categoryId P which will be present in the map.

I have this code which does the job, but I don’t think it is the right or most efficient way to produce the formatted string above. I believe it can be done in a much better way.

  private static final List<String> activities = Arrays.asList("tree", "gold", "print", "catch");

  public static void reverseLookup(final String clientId, final Map<String, String> processToTaskIdHolder) {
    Multimap<String, Info> reverseLookup = LinkedListMultimap.create();
    for (String activity : activities) {
      for (Entry<String, String> entry : processToTaskIdHolder.entrySet()) {
        String activityKey = "abc_" + activity + "_" + clientId + "_" + entry.getKey();
        Optional<Datum> datum = getData(activityKey);
        if (!datum.isPresent()) {
          continue;
        }
        List<Categories> categories = datum.get().getCategories();
        for (Categories category : categories) {
          String categoryId = String.valueOf(category.getLeafCategId());
          if (processToTaskIdHolder.containsKey(categoryId)) {
            Info info = new Info(entry.getKey(), String.valueOf(category.getScore()));
            reverseLookup.put(categoryId + ":" + processToTaskIdHolder.get(categoryId), info);
          }
        }
      }
      String formattedString = generateString(reverseLookup);
      System.out.println(formattedString);
    }
  }

  private static String generateString(final Multimap<String, Info> reverseLookup) {
    StringBuilder sb = new StringBuilder();
    for (Entry<String, Collection<Info>> entry : reverseLookup.asMap().entrySet()) {
      sb.append(entry.getKey().split(":")[0]).append(",").append(entry.getKey().split(":")[1])
          .append(",");
      String sep = "";
      for (Info info : entry.getValue()) {
        sb.append(sep).append(info.getLeafCategoryId()).append(":").append(info.getScore());
        sep = ",";
      }
      sb.append(System.getProperty("line.separator"));
    }
    return sb.toString();
  }

In the reverseLookup map, I have a key in the format “a:b”, and I’m not sure about building a composite key like that. Maybe it can be done in some other, better way?

Note: I am working with Java 7. Categories and Info class is a simple immutable class with basic toString implementations.
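
For the composite key specifically, here is a sketch of one alternative that stays Java 7 compatible (class name and sample values are made up, and the value is kept as a plain string for brevity): keying the multimap on an immutable Map.Entry so no split() is needed when generating the output.

import com.google.common.collect.LinkedListMultimap;
import com.google.common.collect.Maps;
import com.google.common.collect.Multimap;

import java.util.Collection;
import java.util.Map;

public class EntryKeyExample {
  public static void main(String[] args) {
    // key = (categoryId, taskId), value = "processId:score" kept as a string here
    Multimap<Map.Entry<String, String>, String> reverseLookup = LinkedListMultimap.create();
    reverseLookup.put(Maps.immutableEntry("A", "B"), "C:0.9");
    reverseLookup.put(Maps.immutableEntry("A", "B"), "D:0.7");

    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Map.Entry<String, String>, Collection<String>> entry
        : reverseLookup.asMap().entrySet()) {
      sb.append(entry.getKey().getKey()).append(",").append(entry.getKey().getValue());
      for (String info : entry.getValue()) {
        sb.append(",").append(info);
      }
      sb.append(System.getProperty("line.separator"));
    }
    System.out.print(sb);   // A,B,C:0.9,D:0.7
  }
}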


Get this bounty!!!

#StackBounty: #performance #linux-networking #udp Disable NIC Receive Side Scaling hashing

Bounty: 100

On a benchmark lab system running Fedora Core 27 I’ve got Intel X710 10GE cards and 12-core Xeon processors, configured with 12 NIC queues and RX Flow Hashing based on both IP addresses and port numbers.

This results in uneven balancing between cores, and inconsistent performance results. I’ve got irqbalance disabled, with a 1:1 mapping from NIC queue to CPU core configured via /proc.

My application is UDP based so I’m not really bothered about the hashing. I can’t generate enough entropy across the inputs to the hash function to get an even distribution of outputs, so I’d quite like to just try plain round robin instead.

Is there a way to disable RSS for UDP and get round robin whilst still maintaining separate queues? All of the links I’ve found that talk about disabling RSS also seem to disable multiple queues.
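
For context, these are the kinds of ethtool commands involved (the interface name is just an example; they tune the hash fields and the indirection table rather than giving plain round robin):

# show which fields the NIC hashes on for UDP over IPv4
ethtool -n ens1f0 rx-flow-hash udp4

# hash UDP on src/dst IP and ports (s = src IP, d = dst IP, f = src port, n = dst port)
ethtool -N ens1f0 rx-flow-hash udp4 sdfn

# show / spread the RSS indirection table evenly across all 12 queues
ethtool -x ens1f0
ethtool -X ens1f0 equal 12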


Get this bounty!!!