#StackBounty: #private-key #python #bip32 #bip38 #pybitcointools How to convert a WIF private key to a BIP32 extended private key

Bounty: 50

I am trying to recover the extended BIP32 master private key from its extended BIP32 public key (which I already know) and a WIF private key (which I obtained by decrypting a BIP38 seed with a passphrase).

I followed the steps described in this article from Vitalik Buterin, but to use the crack_bip32_privkey function in pybitcointools, I need to have a private key in BIP32 format (not WIF). I can see how to obtain a BIP32 private master key from a BIP32 seed (with bip32_master_key), but not how to do the same from a private key in WIF format.

How can I convert the WIF private key into a BIP32 private key (with Python, .NET or JavaScript)?
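A WIF string is only a Base58Check wrapper around the raw 32-byte key, while a BIP32 extended key also carries a chain code and derivation metadata. The sketch below (Python 3, standard library only) decodes a WIF string and then reuses the chain code and metadata of the known extended public key. It rests on the assumption that the WIF key really is the private counterpart of that xpub; if it is not, the result is meaningless. The helper names are illustrative and are not part of pybitcointools.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
XPRV_VERSION = bytes.fromhex("0488ade4")  # BIP32 mainnet private version bytes


def _checksum(payload):
    # Base58Check checksum: first 4 bytes of double SHA-256.
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def b58check_decode(text):
    """Decode Base58Check text and return the payload without the checksum."""
    number = 0
    for char in text:
        number = number * 58 + ALPHABET.index(char)
    raw = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading '1' encodes one leading zero byte.
    raw = b"\x00" * (len(text) - len(text.lstrip("1"))) + raw
    payload, checksum = raw[:-4], raw[-4:]
    if _checksum(payload) != checksum:
        raise ValueError("Base58Check checksum mismatch")
    return payload


def b58check_encode(payload):
    """Encode a payload as Base58Check text."""
    raw = payload + _checksum(payload)
    number = int.from_bytes(raw, "big")
    text = ""
    while number:
        number, rem = divmod(number, 58)
        text = ALPHABET[rem] + text
    return "1" * (len(raw) - len(raw.lstrip(b"\x00"))) + text


def wif_to_private_key(wif):
    """Return (raw 32-byte key, compressed flag) from a WIF string."""
    body = b58check_decode(wif)[1:]  # drop the network version byte
    if len(body) == 33 and body[-1] == 0x01:
        return body[:32], True
    if len(body) == 32:
        return body, False
    raise ValueError("unexpected WIF payload length")


def xprv_from_xpub(xpub, private_key):
    """Swap version and key data in an xpub, keeping its chain code.

    Assumption: private_key is the private key matching the xpub.
    """
    payload = b58check_decode(xpub)
    if len(payload) != 78:
        raise ValueError("not a 78-byte BIP32 payload")
    # Layout: version(4) depth(1) fingerprint(4) child(4) chain code(32) key(33)
    return b58check_encode(XPRV_VERSION + payload[4:45] + b"\x00" + private_key)
```

The string returned by `xprv_from_xpub` is in the serialized BIP32 format, so it could be passed on to functions that expect an extended private key.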


Get this bounty!!!

#StackBounty: #open-source #library #file-management #python #cloud-storage Library for file management. Two backends: filesystem, s3

Bounty: 50

I am dreaming of a Python library which abstracts the file handling of my application.

The application should run in two different configurations:

  1. No storage server. All file operations get done on the local disk.
  2. With storage server. All file operations should get done via s3.

I would like to do separation of concerns.

The application code should not care which configuration gets used. Choosing the right configuration (with or without storage server) gets done via configuration management.

I don’t need all file operations which I can do via os and os.path. I just need all operations which can be done via s3.

Other required features:

  • Open source: BSD or LGPL, not GPL
  • Support for Linux. Other operating systems are not important in this context.

Distinction / out of scope

I don’t want all file operations (like os.walk()). I just need the fundamental storage APIs of s3, but without a running storage server.
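To make the requirement concrete, here is a minimal sketch of what such an abstraction could look like: a small interface with only S3-style operations (put, get, delete, list by prefix) and a local-disk backend behind it. All class, method and configuration names are made up for illustration; this is not an existing library, and the S3 backend is only indicated by a comment.

```python
import os


class LocalStorage:
    """Filesystem backend exposing only S3-like operations on string keys."""

    def __init__(self, root):
        self.root = os.path.abspath(root)

    def _path(self, key):
        # Map a key such as "a/b.txt" to a path below the storage root.
        path = os.path.abspath(os.path.join(self.root, key))
        if not path.startswith(self.root + os.sep):
            raise ValueError("key escapes the storage root: %r" % key)
        return path

    def put(self, key, data):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as handle:
            handle.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as handle:
            return handle.read()

    def delete(self, key):
        os.remove(self._path(key))

    def list_keys(self, prefix=""):
        # Walk the tree internally, but expose flat keys like S3 does.
        keys = []
        for folder, _dirs, files in os.walk(self.root):
            for name in files:
                rel = os.path.relpath(os.path.join(folder, name), self.root)
                key = rel.replace(os.sep, "/")
                if key.startswith(prefix):
                    keys.append(key)
        return sorted(keys)


def make_storage(config):
    """Pick a backend from configuration; only 'local' is implemented here."""
    if config["backend"] == "local":
        return LocalStorage(config["root"])
    # A hypothetical S3 backend with the same four methods would be
    # returned here, so application code never sees the difference.
    raise NotImplementedError(config["backend"])
```

Application code would only ever call `make_storage(config)` and the four methods, which keeps the choice of backend inside configuration management.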



#StackBounty: #python #moviepy Moviepy Transition issues with multiple CompositeVideoClips

Bounty: 200

I’m trying to create a video slideshow with moviepy, with a slide-up transition between the slides. The transition works fine if the slides are plain ImageClips, but it stops working when I add text to the images using a CompositeVideoClip. With just one CompositeVideoClip and the rest ImageClips it still works, but with more than one CompositeVideoClip it starts to break.

I’m not sure whether this is a bug in moviepy or a problem with the way I have set it up.

Here is my code:

from moviepy.editor import *

H = 720
W = 1280
SIZE = (W, H)
HX = H + H * .10  # increase size 10%
WX = W + W * .10
bold_font = 'Liberation-Sans-Bold'
plain_font = 'Liberation-Sans'

def slide_out(clip, duration, height, counter):
    def calc(t, counter, duration, h):
        ts = t - (counter * duration)
        val = min(-45, h*(duration-ts))
        return ('center', val)
    return clip.set_pos(lambda t: calc(t, counter, duration, height))

def add_transition(clip_size, counter, clip):
    # reverse the count to get slide number.
    counter = clip_size - 1 - counter
    return slide_out(clip.resize(height=HX, width=WX), 3, HX, counter)

img_1 = ImageClip("/pics/1.jpg").set_duration(4).set_start(0).resize(height=H, width=W)   # visible 0-4 s
txt_1 = TextClip("title 1", font=bold_font, color='white', fontsize=64, interline=9).set_duration(2).set_start(1).set_pos(('right', 360)).crossfadein(.3)
stxt_1 = TextClip("sub title 1", font=plain_font, color='white', fontsize=80, interline=9).set_duration(1.5).set_start(1.5).set_pos(('right', 440)).crossfadein(.3)
img_2 = ImageClip("/pics/2.jpg").set_duration(4).set_start(3).resize(height=H, width=W)   # visible 3-7 s
txt_2 = TextClip("title 2", font=bold_font, color='white', fontsize=64, interline=9).set_duration(3).set_start(3.5).set_pos(('right', 360)).crossfadein(.3)
stxt_2 = TextClip("sub title 2", font=plain_font, color='white', fontsize=80, interline=9).set_duration(2.5).set_start(3.5).set_pos(('right', 440)).crossfadein(.3)

# slides images with text on top.
slide_1 = CompositeVideoClip([img_1, txt_1, stxt_1]) #.set_duration(4)
slide_2 = CompositeVideoClip([img_2, txt_2, stxt_2]) #.set_duration(4)

clips = [slide_2, slide_1] # reverse because we want the first slides on top.

slides = [add_transition(len(clips), x, clip) for x, clip in enumerate(clips)]
final_clip = CompositeVideoClip(slides, size=SIZE).set_duration(8)
final_clip.write_videofile("/pics/vids/video.mp4", fps=24, audio_codec="aac")
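Because the vertical position depends only on `t`, `counter` and `duration`, the position function can be checked without rendering a video. The sketch below copies the arithmetic of `calc` into a standalone function (the name `slide_y` is made up here) and prints the offset at a few times. Which time origin moviepy passes as `t` for a nested CompositeVideoClip is not verified here; comparing these numbers with the rendered result may help narrow that down.

```python
def slide_y(t, counter, duration, height):
    """Vertical offset computed by the inner calc() of slide_out at time t."""
    ts = t - counter * duration
    return min(-45, height * (duration - ts))


HX = 720 + 720 * 0.10  # same enlarged height as in the script above

# counter 0 is the first slide, counter 1 the second (after the reversal).
for counter in (0, 1):
    for t in (0, 3, 3.5, 4, 6, 7):
        y = slide_y(t, counter, 3, HX)
        print("counter=%d t=%.1f y=%.1f" % (counter, t, y))
```

With these inputs a slide stays at y = -45 until shortly after `ts` passes `duration`, then moves upward by `height` pixels per second.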

I have tried a few different things to figure out what is going on. Any help would be great.

Option 1

If slide_1 has its duration set and slide_2 and final_clip do not, the finished video is 4 seconds long. It shows the full first slide and only the first second of the second slide.


slide_1 = CompositeVideoClip([img_1, txt_1, stxt_1]).set_duration(4)
slide_2 = CompositeVideoClip([img_2, txt_2, stxt_2])
final_clip = CompositeVideoClip(slides, size=SIZE)

Result: [image 1]

Option 2

If slide_1 and slide_2 don’t set a duration but final_clip has a duration of 8, the video is 8 seconds long, but the second image only shows up for 1 second (from t=3 to t=4), then disappears and leaves the text.


slide_1 = CompositeVideoClip([img_1, txt_1, stxt_1])
slide_2 = CompositeVideoClip([img_2, txt_2, stxt_2])
final_clip = CompositeVideoClip(slides, size=SIZE).set_duration(8)

Result: [image 2]

Option 3

If all three have durations set, slide_1 works fine, but slide_2 only shows up for 1 second, then goes black for the rest of the time.


slide_1 = CompositeVideoClip([img_1, txt_1, stxt_1]).set_duration(4)
slide_2 = CompositeVideoClip([img_2, txt_2, stxt_2]).set_duration(4)
final_clip = CompositeVideoClip(slides, size=SIZE).set_duration(8)

Result: [image 3]

Option 4

If none of them have durations set, the result is the same as in Option 2.


slide_1 = CompositeVideoClip([img_1, txt_1, stxt_1])
slide_2 = CompositeVideoClip([img_2, txt_2, stxt_2])
final_clip = CompositeVideoClip(slides, size=SIZE)

Result: [image 2]

Option 5

If slide_2 has a duration but slide_1 and final_clip do not, the result is the same as in Option 1.


slide_1 = CompositeVideoClip([img_1, txt_1, stxt_1])
slide_2 = CompositeVideoClip([img_2, txt_2, stxt_2]).set_duration(4)
final_clip = CompositeVideoClip(slides, size=SIZE)

Result: [image 1]



#StackBounty: #windows #mysql #python #openssl Python/MySQL on Windows. Problems with openssl

Bounty: 100

I have a MySQL Server set up to use SSL and I also have the CA Certificate.

When I connect to the server using MySQL Workbench, I do not need the certificate. I can also connect to the server using Python and MySQLdb on a Mac without the CA certificate.

But when I try to connect using the exact same setup of Python and MySQLdb on a Windows machine, I get access denied. It appears that I need the CA certificate. When I supply it, I get the following error:

_mysql_exceptions.OperationalError: (2026, 'SSL connection error')

My code to open the connection is below:

import MySQLdb

db = MySQLdb.connect(host="host.name",
                     port=3306,
                     user="user",
                     passwd="secret_password",
                     db="database",
                     ssl={'ca': '/path/to/ca/cert'})

Could anyone point out what the problem is on Windows? I believe it has to do with OpenSSL and Python. I installed OpenSSL from here, but I don’t think Python is using the version that I installed, since the version Python prints is not the same.

This is what Python prints. It is not a very old version, so it should have worked when connecting to MySQL:

OpenSSL 1.0.2j 26 Sep 2016

I am not used to working with OpenSSL and its issues. I have tried every solution I could find on Google by searching for the error, but none of them worked, so I am guessing the problem is with OpenSSL and Python on my system. Does anyone know how I can at least identify the exact problem?
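As a first diagnostic step, the sketch below prints which OpenSSL build Python's own `ssl` module is linked against and checks whether a CA file exists and parses as PEM. The function name `ssl_report` is made up for this example. Note the limits of this check: the MySQLdb C extension may use the SSL library bundled with the MySQL client library rather than the one Python's `ssl` module reports, so matching versions here do not prove that MySQLdb uses the same library.

```python
import os
import ssl
import sys


def ssl_report(ca_path=None):
    """Collect basic facts about Python's SSL setup and an optional CA file."""
    lines = [
        "python: %s" % sys.version.split()[0],
        "ssl module linked against: %s" % ssl.OPENSSL_VERSION,
    ]
    if ca_path is not None:
        if not os.path.isfile(ca_path):
            lines.append("CA file not found: %s" % ca_path)
        else:
            try:
                # Loading the file into a context fails if it is not valid PEM.
                ssl.create_default_context(cafile=ca_path)
                lines.append("CA file parsed as PEM: %s" % ca_path)
            except ssl.SSLError as exc:
                lines.append("CA file could not be parsed: %s" % exc)
    return lines


print("\n".join(ssl_report()))
```

Running `ssl_report` with the same path that is passed in `ssl={'ca': ...}` on both the Mac and the Windows machine shows whether the file is found and readable on each.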

I also do not understand how I can connect to the MySQL server without the CA certificate from a Mac with Python or from MySQL Workbench, yet get an access denied error on Windows using Python.

UPDATE:

Python version 2.7.13

MySQL Server Enterprise version 5.7.18



#StackBounty: #python #mysql #django How to concat two columns of table django model

Bounty: 50

I am implementing search in my project. I want to concatenate two columns in the WHERE clause to get results from the table.

Here is what I am doing:

from django.db.models import Q

if 'search[value]' in request.POST and len(request.POST['search[value]']) >= 3:
    search_value = request.POST['search[value]'].strip()

    q.extend([
        Q(id__icontains=request.POST['search[value]']) |
        (Q(created_by__first_name=request.POST['search[value]']) & Q(created_for=None)) |
        Q(created_for__first_name=request.POST['search[value]']) |
        (Q(created_by__last_name=request.POST['search[value]']) & Q(created_for=None)) |
        Q(created_for__last_name=request.POST['search[value]']) |
        (Q(created_by__email__icontains=search_value) & Q(created_for=None)) |
        Q(created_for__email__icontains=search_value) |
        Q(ticket_category=request.POST['search[value]']) |
        Q(status__icontains=request.POST['search[value]']) |
        Q(issue_type__icontains=request.POST['search[value]']) |
        Q(title__icontains=request.POST['search[value]']) |
        Q(assigned_to__first_name__icontains=request.POST['search[value]'])
    ])

Now I want to add another OR condition like:

CONCAT_WS(' ', created_by__first_name, created_by__last_name) LIKE '%search_value%'

But when I add this condition to the queryset, it is combined with the other conditions using AND:

where = ["CONCAT_WS(' ', profiles_userprofile.first_name, profiles_userprofile.last_name) like '"+request.POST['search[value]']+"' "]
tickets = Ticket.objects.get_active(u, page_type).filter(*q).extra(where=where).exclude(*exq).order_by(*order_dash)[cur:cur_length]

How do I convert this into an OR condition?
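Conditions passed through `.extra(where=...)` are ANDed with the rest of the query, which matches the behaviour described above. The sketch below shows the intended WHERE clause in plain SQL using the standard-library `sqlite3` module instead of the Django ORM, so it runs anywhere: the concatenated-name test sits in the same OR chain as the other conditions, and the search term is bound as a parameter rather than spliced into the SQL string. The table, columns and data are made up for the example. Within Django, the `Concat` database function from `django.db.models.functions` together with `annotate()` may allow the same condition to be expressed as a `Q` object, but that is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO profile (first_name, last_name, email) VALUES (?, ?, ?)",
    [
        ("Ada", "Lovelace", "ada@example.com"),
        ("Alan", "Turing", "alan@example.com"),
    ],
)


def search(term):
    """Match the term against the email OR the concatenated full name."""
    pattern = "%" + term + "%"
    rows = conn.execute(
        "SELECT first_name, last_name FROM profile "
        "WHERE email LIKE ? "
        "OR (first_name || ' ' || last_name) LIKE ? "
        "ORDER BY id",
        (pattern, pattern),  # bound parameters, no string interpolation
    )
    return rows.fetchall()


print(search("da Love"))
```

A term that spans the first and last name, such as `"da Love"`, matches only through the concatenated expression, which is exactly the case the individual column filters miss.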



#StackBounty: #python #tensorflow How to get bias and neuron weights in optimizer?

Bounty: 50

In a TensorFlow optimizer (Python), the method _apply_dense is called separately for the neuron weights (layer connections) and for the bias weights, but I would like to use both within a single call of this method.

def _apply_dense(self, grad, weight):
    ...

For example: a fully connected neural network with two hidden layers, each with two neurons and a bias.

[image: neural network example]

If we take a look at layer 2, _apply_dense receives one call for the neuron weights:

[image: neuron weights]

and one call for the bias weights:

[image: bias weights]

But I would need either both matrices in one call of _apply_dense or a weight matrix like this:

[image: all weights from one layer]

How can I do this?

MWE

As a minimal working example, here is a stochastic gradient descent optimizer implementation with momentum. For every layer, the momentum of all incoming connections from other neurons is reduced to its mean (see ndims == 2). What I need instead is the mean of not only the momentum values from the incoming neuron connections but also from the incoming bias connections (as described above).

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.python.training import optimizer


class SGDmomentum(optimizer.Optimizer):
    def __init__(self, learning_rate=0.001, mu=0.9, use_locking=False, name="SGDmomentum"):
        super(SGDmomentum, self).__init__(use_locking, name)
        self._lr = learning_rate
        self._mu = mu

        self._lr_t = None
        self._mu_t = None

    def _create_slots(self, var_list):
        for v in var_list:
            self._zeros_slot(v, "a", self._name)

    def _apply_dense(self, grad, weight):
        learning_rate_t = tf.cast(self._lr_t, weight.dtype.base_dtype)
        mu_t = tf.cast(self._mu_t, weight.dtype.base_dtype)
        momentum = self.get_slot(weight, "a")

        if momentum.get_shape().ndims == 2:  # neuron weights
            momentum_mean = tf.reduce_mean(momentum, axis=1, keep_dims=True)
        elif momentum.get_shape().ndims == 1:  # bias weights
            momentum_mean = momentum
        else:
            momentum_mean = momentum

        momentum_update = grad + (mu_t * momentum_mean)
        momentum_t = tf.assign(momentum, momentum_update, use_locking=self._use_locking)

        weight_update = learning_rate_t * momentum_t
        weight_t = tf.assign_sub(weight, weight_update, use_locking=self._use_locking)

        return tf.group(*[weight_t, momentum_t])

    def _prepare(self):
        self._lr_t = tf.convert_to_tensor(self._lr, name="learning_rate")
        self._mu_t = tf.convert_to_tensor(self._mu, name="momentum_term")

For a simple neural network: https://raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/examples/3_NeuralNetworks/multilayer_perceptron.py (only change the optimizer to the custom SGDmomentum optimizer)
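To pin down the quantity being asked for, here is a small pure-Python sketch of the combined mean, independent of TensorFlow. It assumes the weight momentum is laid out as [n_in][n_out], so that column j holds the connections coming into neuron j, and it treats the bias as one extra incoming connection. The function name is made up, and the sketch only defines the target value; it does not show how to get both variables into one `_apply_dense` call.

```python
def combined_incoming_mean(weight_momentum, bias_momentum):
    """Per-neuron mean over all incoming momentum values, bias included.

    weight_momentum: list of rows, shape [n_in][n_out] (assumed layout).
    bias_momentum:   list of length n_out.
    """
    n_in = len(weight_momentum)
    n_out = len(bias_momentum)
    means = []
    for j in range(n_out):
        incoming = sum(weight_momentum[i][j] for i in range(n_in))
        # The bias counts as one additional incoming connection.
        means.append((incoming + bias_momentum[j]) / (n_in + 1))
    return means


weights = [[1.0, 2.0], [3.0, 4.0]]  # two inputs feeding two neurons
bias = [2.0, 3.0]
print(combined_incoming_mean(weights, bias))
```

This is equivalent to stacking the bias as an extra row under the weight matrix and averaging each column.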



#StackBounty: #python #python-2.7 #unc #network-share Python 2: Get network share path from drive letter

Bounty: 50

If I use the following to get the list of all connected drives:

available_drives = ['%s:' % d for d in string.ascii_uppercase if os.path.exists('%s:' % d)]

How do I get the UNC path of the connected drives?

os.path just returns z:\ instead of \\share\that\was\mapped\to\z
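One possible approach is to ask Windows directly through the `WNetGetConnectionW` function in `mpr.dll`, which can be reached from the standard library via `ctypes`. The sketch below is written for Python 3 and has not been verified on Python 2.7; the function name `unc_path_for_drive` is made up, and on non-Windows systems it simply returns None.

```python
import ctypes
import sys


def unc_path_for_drive(drive):
    """Return the UNC path behind a mapped drive such as 'Z:', or None."""
    if sys.platform != "win32":
        return None  # WNetGetConnectionW only exists on Windows
    mpr = ctypes.windll.mpr
    length = ctypes.c_ulong(1024)  # buffer size in characters (a DWORD)
    buffer = ctypes.create_unicode_buffer(length.value)
    status = mpr.WNetGetConnectionW(
        ctypes.c_wchar_p(drive), buffer, ctypes.byref(length)
    )
    # A status of 0 (NO_ERROR) means the drive is a network mapping.
    return buffer.value if status == 0 else None
```

Applied to each entry of `available_drives`, this would return the share path for network drives and None for local disks.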



#StackBounty: #python #amazon-web-services #aws-lambda #aws-api-gateway 500 error while trying to enable CORS on POST with AWS API Gate…

Bounty: 50

I have a response method that looks like this for my Lambda functions:

def respond(err, res=None):
    return {
        'statusCode': 400 if err else 200,
        'body': json.dumps(err) if err else json.dumps(res),
        'headers': {
            'Access-Control-Allow-Headers': 'content-type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token',
            'Access-Control-Allow-Methods': 'POST, GET, DELETE',
            'Access-Control-Allow-Origin': '*',
            'Access-Control-Allow-Credentials': True,
            'Content-Type': 'application/json',
        },
    }

When I test my endpoint with an OPTIONS request from Postman, I get a 500 internal server error. If I test it from the API Gateway console, I additionally get this:

Execution log for request test-request
Wed Jul 05 14:25:26 UTC 2017 : Starting execution for request: test-invoke-request
Wed Jul 05 14:25:26 UTC 2017 : HTTP Method: OPTIONS, Resource Path: /login
Wed Jul 05 14:25:26 UTC 2017 : Method request path: {}
Wed Jul 05 14:25:26 UTC 2017 : Method request query string: {}
Wed Jul 05 14:25:26 UTC 2017 : Method request headers: {}
Wed Jul 05 14:25:26 UTC 2017 : Method request body before transformations: 
Wed Jul 05 14:25:26 UTC 2017 : Received response. Integration latency: 0 ms
Wed Jul 05 14:25:26 UTC 2017 : Endpoint response body before transformations: 
Wed Jul 05 14:25:26 UTC 2017 : Endpoint response headers: {}
Wed Jul 05 14:25:26 UTC 2017 : Execution failed due to configuration error: Output mapping refers to an invalid method response: 200
Wed Jul 05 14:25:26 UTC 2017 : Method completed with status: 500

I’m not really sure what I’m doing wrong. I think I am returning all the right headers. Any help is appreciated.
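The log line "Output mapping refers to an invalid method response: 200" is reported as a configuration error, and the endpoint response is empty, which suggests that the OPTIONS method has no 200 method response defined in API Gateway. That part cannot be fixed from Python. If, as an assumption not confirmed by the question, the OPTIONS method is routed to the function through Lambda proxy integration, the handler can answer the preflight itself. The sketch below shows one way to do that; the `handler` function is hypothetical. Header values are kept as strings, and the credentials header is left out because browsers reject a wildcard origin combined with credentials.

```python
import json

CORS_HEADERS = {
    "Access-Control-Allow-Headers":
        "content-type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token",
    "Access-Control-Allow-Methods": "POST, GET, DELETE, OPTIONS",
    "Access-Control-Allow-Origin": "*",
    "Content-Type": "application/json",
}


def respond(err, res=None):
    """Build a proxy-style response with CORS headers on every reply."""
    return {
        "statusCode": 400 if err else 200,
        "body": json.dumps(err if err else res),
        "headers": dict(CORS_HEADERS),
    }


def handler(event, context):
    # Answer the CORS preflight directly instead of running business logic.
    if event.get("httpMethod") == "OPTIONS":
        return respond(None, {})
    return respond(None, {"ok": True})
```

Calling `handler({"httpMethod": "OPTIONS"}, None)` locally returns a 200 response carrying the CORS headers, which can be compared with what the deployed endpoint sends back.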



#StackBounty: #python #c++ #c++11 #memory-leaks #swig C++ Memory Leak in Swig Python Module

Bounty: 50

Background

I have created a python module that wraps a c++ program using SWIG. It works just fine, but it has a pretty serious memory leak issue that I think is a result of poorly handled pointers to large map objects. I have very little experience with c++, and I have questions as to whether delete[] can be used on an object created with new in a different function or method.

The program was written in 2007, so excuse the lack of useful c++11 tricks.
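To confirm and quantify a leak like this from the Python side, one option is to call the wrapped function in a loop and watch the process's peak resident memory. Memory allocated with `new` inside the C++ extension is invisible to Python-level tools such as `tracemalloc`, so the sketch below uses the `resource` module instead (Unix only). The function names are made up, and the workload mentioned in the comment is only an example.

```python
import resource


def peak_rss():
    """Peak resident set size of this process (KB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def rss_growth(func, repeats=1000):
    """Call func repeatedly and return how much the peak RSS grew."""
    before = peak_rss()
    for _ in range(repeats):
        func()
    return peak_rss() - before


# Example workload (hypothetical values): a closure around one wrapped call,
# e.g. rss_growth(lambda: score2pval(m, 8.0), repeats=10000)
print(rss_growth(lambda: None, repeats=10))
```

If the growth keeps scaling with `repeats` for one call but not for another, that narrows down which C++ method is failing to release its allocations.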


The swig extension basically just wraps a single c++ class (Matrix) and a few functions.

Matrix.h

#ifndef __MATRIX__
#define __MATRIX__

#include <string>
#include <vector>
#include <map>
#include <cmath>
#include <fstream>
#include <cstdlib>
#include <stdio.h>
#include <unistd.h>

#include "FileException.h"
#include "ParseException.h"

#define ROUND_TO_INT(n) ((long long)floor(n))
#define MIN(a,b) ((a)<(b)?(a):(b))
#define MAX(a,b) ((a)>(b)?(a):(b))

using namespace std;

class Matrix {

private:


  /**
  * Split a string following delimiters
   */
  void tokenize(const string& str, vector<string>& tokens, const string& delimiters) {

    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find first "non-delimiter".
    string::size_type pos     = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
      // Found a token, add it to the vector.
      tokens.push_back(str.substr(lastPos, pos - lastPos));
      // Skip delimiters.  Note the "not_of"
      lastPos = str.find_first_not_of(delimiters, pos);
      // Find next "non-delimiter"
      pos = str.find_first_of(delimiters, lastPos);
    }
  }


public:


  // used for efficiency tests
  long long totalMapSize;
  long long totalOp;

  double ** mat; // the matrix as it is stored in the matrix file
  int length;
  double granularity; // the real granularity used, greater than 1
  long long ** matInt; // the discrete matrix with offset
  double errorMax;
  long long *offsets; // offset of each column
  long long offset; // sum of offsets
  long long *minScoreColumn; // min discrete score at each column
  long long *maxScoreColumn; // max discrete score at each column
  long long *sum;
  long long minScore;  // min total discrete score (normally 0)
  long long maxScore;  // max total discrete score
  long long scoreRange;  // score range = max - min + 1
  long long *bestScore;
  long long *worstScore;
  double background[4];

  Matrix() {
    granularity = 1.0;
    offset = 0;
    background[0] = background[1] = background[2] = background[3] = 0.25;
  }

  Matrix(double pA, double pC, double pG, double pT) {
    granularity = 1.0;
    offset = 0;
    background[0] = pA;
    background[1] = pC;
    background[2] = pG;
    background[3] = pT;  
  }

  ~Matrix() {
      for (int k = 0; k < 4; k++ ) {
        delete[] matInt[k];
      }
      delete[] matInt;
      delete[] mat;
      delete[] offsets;
      delete[] minScoreColumn;
      delete[] maxScoreColumn;
      delete[] sum;
      delete[] bestScore;
      delete[] worstScore;
  }


  void toLogOddRatio () {
    for (int p = 0; p < length; p++) {
      double sum = mat[0][p] + mat[1][p] + mat[2][p] + mat[3][p];
      for (int k = 0; k < 4; k++) {
        mat[k][p] = log((mat[k][p] + 0.25) /(sum + 1)) - log (background[k]); 
      }
    }
  }

  void toLog2OddRatio () {
    for (int p = 0; p < length; p++) {
      double sum = mat[0][p] + mat[1][p] + mat[2][p] + mat[3][p];
      for (int k = 0; k < 4; k++) {
        mat[k][p] = log2((mat[k][p] + 0.25) /(sum + 1)) - log2 (background[k]); 
      }
    }
  }

  /**
    * Transforms the initial matrix into an integer and offseted matrix.
   */
  void computesIntegerMatrix (double granularity, bool sortColumns = true);

  // computes the complete score distribution between score min and max
  void showDistrib (long long min, long long max) {
    map<long long, double> *nbocc = calcDistribWithMapMinMax(min,max); 
    map<long long, double>::iterator iter;

    // computes p values and stores them in nbocc[length] 
    double sum = 0;
    map<long long, double>::reverse_iterator riter = nbocc[length-1].rbegin();
    while (riter != nbocc[length-1].rend()) {
      sum += riter->second;
      nbocc[length][riter->first] = sum;
      riter++;      
    }

    iter = nbocc[length].begin();
    while (iter != nbocc[length].end() && iter->first <= max) {
      //cout << (((iter->first)-offset)/granularity) << " " << (iter->second) << " " << nbocc[length-1][iter->first] << endl;
      iter ++;
    }
  }

  /**
    * Computes the pvalue associated with the threshold score requestedScore.
    */
  void lookForPvalue (long long requestedScore, long long min, long long max, double *pmin, double *pmax);

  /**
    * Computes the score associated with the pvalue requestedPvalue.
    */
  long long lookForScore (long long min, long long max, double requestedPvalue, double *rpv, double *rppv);

  /** 
    * Computes the distribution of scores between score min and max as the DP algorithm proceeds 
    * but instead of using a table we use a map to avoid computations for scores that cannot be reached
    */
  map<long long, double> *calcDistribWithMapMinMax (long long min, long long max); 

  void readMatrix (string matrix) {

    vector<string> str;
    tokenize(matrix, str, " \t|");
    this->length = 0;
    this->length = str.size() / 4;
    mat = new double*[4];
    int idx = 0;

    for (int j = 0; j < 4; j++) {
      this->mat[j] = new double[this->length];
      for (int i = 0; i < this->length; i++) {
        mat[j][i] = atof(str.at(idx).data());
        idx++;
      }
    }

    str.clear();

  }

}; /* Matrix */

#endif

Matrix.cpp

#include "Matrix.h"

#define MEMORYCOUNT

void Matrix::computesIntegerMatrix (double granularity, bool sortColumns) {
  double minS = 0, maxS = 0;
  double scoreRange;

  // computes precision
  for (int i = 0; i < length; i++) {
    double min = mat[0][i];
    double max = min;
    for (int k = 1; k < 4; k++ )  {
      min = ((min < mat[k][i])?min:(mat[k][i]));
      max = ((max > mat[k][i])?max:(mat[k][i]));
    }
    minS += min;
    maxS += max;
  } 

  // score range
  scoreRange = maxS - minS + 1;

  if (granularity > 1.0) {
    this->granularity = granularity / scoreRange;
  } else if (granularity < 1.0) {
    this->granularity = 1.0 / granularity;
  } else {
    this->granularity = 1.0;
  }

  matInt = new long long *[4]; // one row per nucleotide, matching the k < 4 loops
  for (int k = 0; k < 4; k++ ) {
    matInt[k] = new long long[length];
    for (int p = 0 ; p < length; p++) {
      matInt[k][p] = ROUND_TO_INT((double)(mat[k][p]*this->granularity)); 
    }
  }

  this->errorMax = 0.0;
  for (int i = 1; i < length; i++) {
    double maxE = mat[0][i] * this->granularity - (matInt[0][i]);
    for (int k = 1; k < 4; k++) {
      maxE = ((maxE < mat[k][i] * this->granularity - matInt[k][i])?(mat[k][i] * this->granularity - (matInt[k][i])):(maxE));
    }
    this->errorMax += maxE;
  }

  if (sortColumns) {
    // sort the columns : the first column is the one with the greatest value
    long long min = 0;
    for (int i = 0; i < length; i++) {
      for (int k = 0; k < 4; k++) {
        min = MIN(min,matInt[k][i]);
      }
    }
    min --;
    long long *maxs = new long long [length];
    for (int i = 0; i < length; i++) {
      maxs[i] = matInt[0][i];
      for (int k = 1; k < 4; k++) {
        if (maxs[i] < matInt[k][i]) {
          maxs[i] = matInt[k][i];
        }
      }
    }
    long long **mattemp = new long long *[4];
    for (int k = 0; k < 4; k++) {        
      mattemp[k] = new long long [length];
    }
    for (int i = 0; i < length; i++) {
      long long max = maxs[0];
      int p = 0;
      for (int j = 1; j < length; j++) {
        if (max < maxs[j]) {
          max = maxs[j];
          p = j;
        }
      }
      maxs[p] = min;
      for (int k = 0; k < 4; k++) {        
        mattemp[k][i] = matInt[k][p];
      }
    }

    for (int k = 0; k < 4; k++)  {
      for (int i = 0; i < length; i++) {
        matInt[k][i] = mattemp[k][i];
      }
    }

    for (int k = 0; k < 4; k++) {        
      delete[] mattemp[k];
    }
    delete[] mattemp;
    delete[] maxs;
  }

  // computes offsets
  this->offset = 0;
  offsets = new long long [length];
  for (int i = 0; i < length; i++) {
    long long min = matInt[0][i];
    for (int k = 1; k < 4; k++ )  {
      min = ((min < matInt[k][i])?min:(matInt[k][i]));
    }
    offsets[i] = -min;
    for (int k = 0; k < 4; k++ )  {
      matInt[k][i] += offsets[i];  
    }
    this->offset += offsets[i];
  }

  // look for the minimum score of the matrix for each column
  minScoreColumn = new long long [length];
  maxScoreColumn = new long long [length];
  sum            = new long long [length];
  minScore = 0;
  maxScore = 0;
  for (int i = 0; i < length; i++) {
    minScoreColumn[i] = matInt[0][i];
    maxScoreColumn[i] = matInt[0][i];
    sum[i] = 0;
    for (int k = 1; k < 4; k++ )  {
      sum[i] = sum[i] + matInt[k][i];
      if (minScoreColumn[i] > matInt[k][i]) {
        minScoreColumn[i] = matInt[k][i];
      }
      if (maxScoreColumn[i] < matInt[k][i]) {
        maxScoreColumn[i] = matInt[k][i];
      }
    }
    minScore = minScore + minScoreColumn[i];
    maxScore = maxScore + maxScoreColumn[i];
    //cout << "minScoreColumn[" << i << "] = " << minScoreColumn[i] << endl;
    //cout << "maxScoreColumn[" << i << "] = " << maxScoreColumn[i] << endl;
  }
  this->scoreRange = maxScore - minScore + 1;

  bestScore = new long long[length];
  worstScore = new long long[length];
  bestScore[length-1] = maxScore;
  worstScore[length-1] = minScore;
  for (int i = length - 2; i >= 0; i--) {
    bestScore[i]  = bestScore[i+1]  - maxScoreColumn[i+1];
    worstScore[i] = worstScore[i+1] - minScoreColumn[i+1];
  }


}




/**
* Computes the pvalue associated with the threshold score requestedScore.
 */
void Matrix::lookForPvalue (long long requestedScore, long long min, long long max, double *pmin, double *pmax) {

  map<long long, double> *nbocc = calcDistribWithMapMinMax(min,max); 
  map<long long, double>::iterator iter;


  // computes p values and stores them in nbocc[length] 
  double sum = nbocc[length][max+1];
  long long s = max + 1;
  map<long long, double>::reverse_iterator riter = nbocc[length-1].rbegin();
  while (riter != nbocc[length-1].rend()) {
    sum += riter->second;
    if (riter->first >= requestedScore) s = riter->first;
    nbocc[length][riter->first] = sum;
    riter++;      
  }
  //cout << "   s found : " << s << endl;

  iter = nbocc[length].find(s);
  while (iter != nbocc[length].begin() && iter->first >= s - errorMax) {
    iter--;      
  }
  //cout << "   s - E found : " << iter->first << endl;

#ifdef MEMORYCOUNT
  // for tests, store the number of memory bloc necessary
  for (int pos = 0; pos <= length; pos++) {
    totalMapSize += nbocc[pos].size();
  }
#endif

  *pmax = nbocc[length][s];
  *pmin = iter->second;

}



/**
* Computes the score associated with the pvalue requestedPvalue.
 */
long long Matrix::lookForScore (long long min, long long max, double requestedPvalue, double *rpv, double *rppv) {

  map<long long, double> *nbocc = calcDistribWithMapMinMax(min,max); 
  map<long long, double>::iterator iter;

  // computes p values and stores them in nbocc[length] 
  double sum = 0.0;
  map<long long, double>::reverse_iterator riter = nbocc[length-1].rbegin();
  long long alpha = riter->first+1;
  long long alpha_E = alpha;
  nbocc[length][alpha] = 0.0;
  while (riter != nbocc[length-1].rend()) {
    sum += riter->second;
    nbocc[length][riter->first] = sum;
    if (sum >= requestedPvalue) { 
      break;
    }
    riter++;      
  }
  if (sum > requestedPvalue) {
    alpha_E = riter->first;
    riter--;
    alpha = riter->first; 
  } else {
    if (riter == nbocc[length-1].rend()) { // path following the remark of the mail
      riter--;
      alpha = alpha_E = riter->first;
    } else {
      alpha = riter->first;
      riter++;
      sum += riter->second;
      alpha_E = riter->first;
    }
    nbocc[length][alpha_E] = sum;  
    //cout << "Pv(S) " << riter->first << " " << sum << endl;   
  } 

#ifdef MEMORYCOUNT
  // for tests, store the number of memory bloc necessary
  for (int pos = 0; pos <= length; pos++) {
    totalMapSize += nbocc[pos].size();
  }
#endif

  if (alpha - alpha_E > errorMax) alpha_E = alpha;

  *rpv = nbocc[length][alpha];
  *rppv = nbocc[length][alpha_E];   

  delete[] nbocc;
  return alpha;

}


// computes the distribution of scores between score min and max as the DP algorithm proceeds 
// but instead of using a table we use a map to avoid computations for scores that cannot be reached
map<long long, double> *Matrix::calcDistribWithMapMinMax (long long min, long long max) { 

  // maps for each step of the computation
  // nbocc[length] stores the pvalue
  // nbocc[pos] for pos < length stores the qvalue
  map<long long, double> *nbocc = new map<long long, double> [length+1];
  map<long long, double>::iterator iter;

  long long *maxs = new long long[length+1]; // @ pos i maximum score reachable with the suffix matrix from i to length-1

  maxs[length] = 0;
  for (int i = length-1; i >= 0; i--) {
    maxs[i] = maxs[i+1] + maxScoreColumn[i];
  }

  // initializes the map at position 0
  for (int k = 0; k < 4; k++) {
    if (matInt[k][0]+maxs[1] >= min) {
      nbocc[0][matInt[k][0]] += background[k];
    }
  }

  // computes q values for scores greater or equal than min
  nbocc[length-1][max+1] = 0.0;
  for (int pos = 1; pos < length; pos++) {
    iter = nbocc[pos-1].begin();
    while (iter != nbocc[pos-1].end()) {
      for (int k = 0; k < 4; k++) {
        long long sc = iter->first + matInt[k][pos];
        if (sc+maxs[pos+1] >= min) {
          // the score min can be reached
          if (sc > max) {
            // the score will be greater than max for all suffixes
            nbocc[length-1][max+1] += nbocc[pos-1][iter->first] * background[k]; //pow(4,length-pos-1) ;
            totalOp++;
          } else {              
            nbocc[pos][sc] += nbocc[pos-1][iter->first] * background[k];
            totalOp++;
          }
        } 
      }
      iter++;      
    }      
    //cerr << "        map size for " << pos << " " << nbocc[pos].size() << endl;
  }

  delete[] maxs;

  return nbocc;


}

pytfmpval.i

%module pytfmpval
%{
#include "../src/Matrix.h"
#define SWIG_FILE_WITH_INIT
%}

%include "cpointer.i"
%include "std_string.i"
%include "std_vector.i"
%include "typemaps.i"
%include "../src/Matrix.h"

%pointer_class(double, doublep)
%pointer_class(int, intp)

%nodefaultdtor Matrix;

The c++ functions are called in this python module.

tfmp.py

from __future__ import absolute_import, division, print_function, unicode_literals
import pytfmpval.pytfmpval as tfm
from math import ceil

def read_matrix(matrix, bg=[0.25, 0.25, 0.25, 0.25], mat_type="counts", log_type="nat"):
    """
    From a string of space-delimited counts create a Matrix object.

    Break the string into 4 rows corresponding to A, C, G, and T.
    This function also converts it to a log-odds (position weight) matrix if necessary.

    Args:
        matrix (str): White-space delimited string of row-concatenated motif matrix.
        bg (list of floats): Background nucleotide frequencies for [A, C, G, T].
        mat_type (str): Type of motif matrix provided. Options are: "counts", "pfm", "pwm".
            "counts" is for raw count matrices for each base at each position.
            "pfm" is for position frequency matrices (frequencies already calculated).
            "pwm" is for position weight matrices (also referred to as position-specific scoring matrices.)
        log_type (str): Base to use for log. Default is to use the natural log. "log2" is the other option.
            This will affect the scores and p-values.

    Returns:
        m (pytfmpval Matrix): Matrix in pwm format.
    """

    try:
        a, c, g, t = bg[0], bg[1], bg[2], bg[3]
        if (len(matrix.split()) % 4) != 0:
            raise ValueError("Uneven rows in motif matrix. Ensure rows of equal length in input.")

        m = tfm.Matrix(a, c, g, t)
        m.readMatrix(matrix)

        if mat_type.upper() == "COUNTS":
            if log_type.upper() == "NAT":
                m.toLogOddRatio()
            elif log_type.upper() == "LOG2":
                m.toLog2OddRatio()
            else:
                print("Improper log type argument, using natural log.")
                m.toLogOddRatio()

        return m

    except ValueError as error:
        print(repr(error))


def score2pval(matrix, req_score):
    """
    Determine the p-value for a given score for a specific motif PWM.

    Args:
        matrix (pytfmpval Matrix): Matrix in pwm format.
        req_score (float): Requested score for which to determine the p-value.

    Returns:
        ppv (float): The calculated p-value corresponding to the score.
    """

    granularity = 0.1
    max_granularity = 1e-10
    decrgr = 10  # Factor to increase granularity by after each iteration.

    pv = tfm.doublep()
    ppv = tfm.doublep()

    while granularity > max_granularity:
        matrix.computesIntegerMatrix(granularity)
        max_s = int(req_score * matrix.granularity + matrix.offset + matrix.errorMax + 1)
        min_s = int(req_score * matrix.granularity + matrix.offset - matrix.errorMax - 1)
        score = int(req_score * matrix.granularity + matrix.offset)

        matrix.lookForPvalue(score, min_s, max_s, ppv, pv)

        if ppv.value() == pv.value():
            return ppv.value()

        granularity = granularity / decrgr

    print("Max granularity exceeded. Returning closest approximation.")
    return ppv.value()


def pval2score(matrix, pval):
    """
    Determine the score for a given p-value for a specific motif PWM.

    Args:
        matrix (pytfmpval Matrix): Matrix in pwm format.
        pval (float): p-value for which to determine the score.

    Returns:
        score (float): The calculated score corresponding to the p-value.
    """

    init_granularity = 0.1
    max_granularity = 1e-10
    decrgr = 10  # Factor to increase granularity by after each iteration.

    pv = tfm.doublep()  # Initialize as a c++ double.
    ppv = tfm.doublep()
    matrix.computesIntegerMatrix(init_granularity)
    max_s = int(matrix.maxScore + ceil(matrix.errorMax + 0.5))
    min_s = int(matrix.minScore)
    granularity = init_granularity

    while granularity > max_granularity:
        matrix.computesIntegerMatrix(granularity)

        score = matrix.lookForScore(min_s, max_s, pval, pv, ppv)

        min_s = int((score - ceil(matrix.errorMax + 0.5)) * decrgr)
        max_s = int((score + ceil(matrix.errorMax + 0.5)) * decrgr)

        if ppv.value() == pv.value():
            break

        granularity = granularity / decrgr

    if granularity <= max_granularity:
        print("Max granularity exceeded. Returning closest score approximation.")

    final_score = (score - matrix.offset) / matrix.granularity
    return final_score
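
For readers unfamiliar with the conversion that the read_matrix docstring describes, here is a minimal pure-Python sketch of turning a row-concatenated count string into a log-odds matrix. It is not the pytfmpval or TFM-Pvalue implementation: the name counts_to_log_odds and the way the pseudocount is applied are assumptions made purely for illustration.

```python
from math import log


def counts_to_log_odds(matrix, bg=(0.25, 0.25, 0.25, 0.25), pseudocount=0.1, base2=False):
    """Convert a row-concatenated count string (rows A, C, G, T) to log-odds scores.

    Hypothetical helper for illustration only; the pseudocount handling is an
    assumption and may differ from what the C++ Matrix class does.
    """
    values = [float(x) for x in matrix.split()]
    if not values or len(values) % 4 != 0:
        raise ValueError("Uneven rows in motif matrix.")

    width = len(values) // 4
    # Split the flat list into 4 rows of equal length: A, C, G, T.
    rows = [values[i * width:(i + 1) * width] for i in range(4)]

    pwm = [[0.0] * width for _ in range(4)]
    for pos in range(width):
        # Column total after adding the pseudocount to every base.
        total = sum(rows[base][pos] + pseudocount for base in range(4))
        for base in range(4):
            freq = (rows[base][pos] + pseudocount) / total
            ratio = freq / bg[base]
            pwm[base][pos] = log(ratio, 2) if base2 else log(ratio)
    return pwm


# Two positions: the first strongly favours A, the second is uniform.
pwm = counts_to_log_odds("8 5 0 5 0 5 0 5")
```

With a uniform background, a column in which all four bases are equally frequent scores close to zero, while an over-represented base scores positive and the others negative.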

The script I’ve been using for testing.

thresholds.py

#!/usr/bin/env python
"""
Calculate motif thresholds for a given motif file and print to a new file.

Usage: thresholds.py -m <motifs.txt> -o <output.txt> [OPTIONS]

Args:
    -m (str): Input filename of a file containing PWMs.
    -o (str): Output filename.
    -a (float, optional) <0.25>: Background probability for A nucleotides. If not
        provided then all are assumed to be equally likely (all are 0.25).
    -t (float, optional) <0.25>: Background probability for T nucleotides.
    -c (float, optional) <0.25>: Background probability for C nucleotides.
    -g (float, optional) <0.25>: Background probability for G nucleotides.
    -p (int, optional) <1>: Processor cores to utilize. Will decrease computation time linearly.
    -pc (float, optional) <0.1>: Pseudocounts value to be added to all positions of the motif frequency matrix
        before calculating the probability matrix.
    -pv (float, optional) <0.00001>: P-value to be used for defining thresholds for each motif.
        I don't recommend changing this.
    -ow (flag, optional): OverWrite: If present, thresholds already present in the input file will be
        replaced in the output file.
"""

import sys
import argparse
from multiprocessing.dummy import Pool as ThreadPool
import itertools
from pytfmpval import tfmp
from timeit import default_timer as timer

from Bio import motifs

# from memory_profiler import profile
import pdb


# @profile
def find_thresh(combo):
    """
    Calculate the detection thresholds for each pwm matrix given the nucleotide
    background frequencies and desired p-value.

    Args:
        combo (tuple): Three element tuple as defined below.
            matrix (tuple): The PWM for which a detection threshold will be found.
                (motif name, rows of matrix concatenated to single string with spaces between positions).
            pval (float): P-value to which threshold will be calculated.
            bg (list of floats): List containing background nucleotide frequencies [A, C, G, T]
    """

    matrix, pval, bg = (combo[0], combo[1], combo[2])

    start = timer()
    mat = tfmp.read_matrix(matrix[1], bg=bg, mat_type="pwm")
    thresh = tfmp.pval2score(mat, pval)
    pdb.set_trace()
    del mat
    end = timer()

    print(matrix[0] + ": " + str(end - start))

    return (matrix[0], thresh)


def main(motif_file, motif_outfile, pc, bp, ow, pv, p):
    matrices = []
    background = {'A': bp[0], 'C': bp[1], 'G': bp[2], 'T': bp[3]}
    print(("Baseline nucleotide frequencies:\n\t" + str(background)))

    # Calculate thresholds using pytfmpval.
    print(("Reading in motifs."))
    fh = open(motif_file)
    for m in motifs.parse(fh, "jaspar"):
        pfm = m.counts.normalize(pseudocounts=pc)    # Create frequency matrix.
        pwm = pfm.log_odds(background)              # Calculate to log likelihoods vs background.

        # Create matrix string from motif pwm.
        mat = pwm[0] + pwm[1] + pwm[2] + pwm[3]
        mat = [str(x) for x in mat]
        mat = " ".join(mat)
        matrices.append((m.name, mat))

    fh.close()

    # Multiprocessing to use multiple processing cores.
    thresholds = []
    with ThreadPool(p) as pool:
        for x in pool.imap_unordered(find_thresh, zip(matrices, itertools.repeat(pv),
                                                      itertools.repeat(bp)), chunksize=8):
            thresholds.append(x)

    print(("Total motifs read: " + str(len(thresholds))))
    print("Writing output file.")

    return


if __name__ == '__main__':
    parser = argparse.ArgumentParser(usage=__doc__)

    parser.add_argument("-m", "--motif", dest="motif_file", required=True)
    parser.add_argument("-o", "--outfile", dest="motif_outfile", required=True)
    parser.add_argument("-a", "--a_freq", dest="a_freq", required=False, default=0.25, type=float)
    parser.add_argument("-c", "--c_freq", dest="c_freq", required=False, default=0.25, type=float)
    parser.add_argument("-g", "--g_freq", dest="g_freq", required=False, default=0.25, type=float)
    parser.add_argument("-t", "--t_freq", dest="t_freq", required=False, default=0.25, type=float)
    parser.add_argument("-p", "--processors", dest="processors", required=False, default=1, type=int)
    parser.add_argument("-pc", "--pseudocounts", dest="pseudocounts",
                        required=False, default=0.1, type=float)
    parser.add_argument("-pv", "--pval", dest="p_val",
                        required=False, default=0.00001, type=float)
    parser.add_argument("-ow", "--overwrite", action="store_true", required=False)
    args = parser.parse_args()

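    # Compare with a small tolerance: a floating-point sum of user-supplied
    # frequencies can differ from 1 by rounding error.
    if abs(sum([args.a_freq, args.c_freq, args.g_freq, args.t_freq]) - 1) < 1e-9:
        bp = [args.a_freq, args.c_freq, args.g_freq, args.t_freq]
    else:
        print("Background frequencies must sum to 1. Check input parameters, exiting.")
        sys.exit()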

    main(args.motif_file, args.motif_outfile, args.pseudocounts, bp, args.overwrite, args.p_val, args.processors)

Sample input file can be accessed here.

setup.py can be accessed here.


Setup

To create the swig python module for the Matrix class:
swig -python -c++ pytfmpval.i

To build and install the pytfmpval python module:
pip install .

Python module file structure can be seen here, as can other source code I’ve removed. You’ll likely need the MANIFEST.in for the python module to build properly.

Command to test:
python thresholds.py -m sample_motifs.txt -o sample_motifs.results.txt -ow

That will drop you into the Python debugger; you can continue with c. You’ll see that most of the Matrix objects created are processed quickly and use little memory. I’ve included one of the more troublesome ones (ARID3A) in the sample input. It takes much longer to run (~90 s) and uses much more memory (~2.7 GB) than the others, which is fine and expected. But the memory is never released, so processing a large file of these quickly becomes an issue.
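
One way to check whether memory really stays allocated after a call is to log the process's resident set size before and after it. The sketch below assumes Linux (it reads /proc/self/statm); current_rss_mb and report_rss are hypothetical helper names, not part of pytfmpval.

```python
import os


def current_rss_mb():
    """Return this process's resident set size in MiB (Linux only)."""
    with open("/proc/self/statm") as fh:
        # The second field of statm is the number of resident pages.
        resident_pages = int(fh.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


def report_rss(label, func, *args):
    """Call func(*args), print the RSS before and after, and return the result."""
    before = current_rss_mb()
    result = func(*args)
    after = current_rss_mb()
    print("{}: RSS {:.1f} MiB -> {:.1f} MiB".format(label, before, after))
    return result
```

In find_thresh this could wrap the expensive call, for example report_rss(matrix[0], tfmp.pval2score, mat, pval), and current_rss_mb() could be printed again after del mat to see whether the figure drops.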


I worry that nbocc in Matrix.cpp is not being properly dereferenced or deleted. Is this use valid?

I have tried using gc.collect() and I am using the multiprocessing module as recommended in this question to call these functions from my python program. I’ve also tried deleting the Matrix object from within python to no avail.
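
Note that thresholds.py imports Pool from multiprocessing.dummy, which is a thread-based wrapper, so the workers share the parent's address space. As a possible workaround rather than a fix for the underlying leak, each motif could be handled in a short-lived child process, so that the operating system reclaims everything when the child exits. The sketch below assumes a POSIX system where the "fork" start method is available; run_isolated and fake_threshold are hypothetical names, and fake_threshold merely stands in for find_thresh.

```python
import multiprocessing as mp


def run_isolated(func, *args):
    """Run func(*args) in a forked child process and return its result.

    When the child exits, the OS reclaims all of its memory, including
    anything a C++ extension allocated and never freed.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX only
    parent_conn, child_conn = ctx.Pipe(duplex=False)  # (receive end, send end)

    def _worker(conn):
        try:
            conn.send(("ok", func(*args)))
        except Exception as exc:
            conn.send(("error", repr(exc)))
        finally:
            conn.close()

    proc = ctx.Process(target=_worker, args=(child_conn,))
    proc.start()
    child_conn.close()  # the parent only reads
    status, payload = parent_conn.recv()  # read before join to avoid blocking the child
    proc.join()
    parent_conn.close()
    if status == "error":
        raise RuntimeError(payload)
    return payload


def fake_threshold(combo):
    """Stand-in for find_thresh: returns (motif name, dummy threshold)."""
    name, pval = combo
    return (name, pval * 2)


if __name__ == "__main__":
    print(run_isolated(fake_threshold, ("ARID3A", 0.00001)))
```

The result must be picklable because it travels back through a pipe; the (name, threshold) tuple returned by find_thresh would satisfy that. Starting one process per motif adds overhead, which is small next to a 90-second computation but noticeable for the fast motifs.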

I’m out of characters, but will provide any additional needed info in the comments as well as I can.



#StackBounty: #python #google-chrome #anaconda How to make chrome compatible with Jupyter for Anaconda?

Bounty: 50

I am an Anaconda user, and Jupyter is a neat tool for running Python code. However, on my MacBook I can’t open it in Chrome (“This page isn’t working. localhost didn’t send any data.”), although it works in Safari. I have tried reinstalling Chrome, but that did not fix it. My system is Mac OS 10.11.5.
Does anyone know how I can fix it?
I understand that the problem might not be specific enough, but I have been puzzled by it for quite some time.

