#StackBounty: #windows #mysql #python #openssl Python/MySQL on Windows. Problems with openssl

Bounty: 100

I have a MySQL Server set up to use SSL and I also have the CA Certificate.

When I connect to the server using MySQL Workbench, I do not need the certificate. I can also connect to the server using Python and MySQLdb on a Mac without the CA-certificate.

But when I try to connect using the exact same Python/MySQLdb setup on a Windows machine, I get access denied. It appears that I need the CA there, and when I supply it, I get the following error:

_mysql_exceptions.OperationalError: (2026, 'SSL connection error')

My code to open the connection is below:

db = MySQLdb.connect(host="host.name",
                     port=3306,
                     user="user",
                     passwd="secret_password",
                     db="database",
                     ssl={'ca': '/path/to/ca/cert'})

Could anyone point out what the problem is on Windows? I believe it has to do with OpenSSL and Python. I installed OpenSSL from here, but I don’t think Python is using the version that I installed, since when I print the version from Python, it’s not the same.

This is what Python prints. It is still not a very old version and should have worked for connecting to MySQL:

OpenSSL 1.0.2j 26 Sep 2016
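(As an aside, a minimal sketch of how to check which OpenSSL build Python's own ssl module links against. Note that MySQLdb's _mysql C extension may be linked against a different OpenSSL, typically the one used by the MySQL client library, so the two can legitimately disagree; that last point is my assumption about this setup.)

import ssl
print(ssl.OPENSSL_VERSION)  # the OpenSSL that Python's ssl module was built with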

I am really not used to working with OpenSSL and its issues. I’ve tried all the solutions I could find on Google by searching for the error I get, and you would think one of them would work, but none did. Hence I’m guessing the problem is with OpenSSL and Python on my system. Does anyone know how I should at least try to identify the exact problem?

I also do not understand how I can connect to the MySQL Server without the CA certificate using Python on a Mac or MySQL Workbench, but get an access denied error on Windows using Python :/

UPDATE:

Python version 2.7.13

MySQL Server Enterprise version 5.7.18


Get this bounty!!!

#StackBounty: #python #mysql #django How to concat two columns of table django model

Bounty: 50

I am implementing search in my project. What I want is to concatenate two columns in the WHERE clause to get results from the table.

Here is what I am doing:

from django.db.models import Q

if 'search[value]' in request.POST and len(request.POST['search[value]']) >= 3:
    search_value = request.POST['search[value]'].strip()

    q.extend([
        Q(id__icontains=request.POST['search[value]']) |
        (Q(created_by__first_name=request.POST['search[value]']) & Q(created_for=None)) |
        Q(created_for__first_name=request.POST['search[value]']) |
        (Q(created_by__last_name=request.POST['search[value]']) & Q(created_for=None)) |
        Q(created_for__last_name=request.POST['search[value]']) |
        (Q(created_by__email__icontains=search_value) & Q(created_for=None)) |
        Q(created_for__email__icontains=search_value) |
        Q(ticket_category=request.POST['search[value]']) |
        Q(status__icontains=request.POST['search[value]']) |
        Q(issue_type__icontains=request.POST['search[value]']) |
        Q(title__icontains=request.POST['search[value]']) |
        Q(assigned_to__first_name__icontains=request.POST['search[value]'])
    ])

Now I want to add another OR condition like:

CONCAT(' ', created_by__first_name, created_by__last_name) LIKE '%search_value%'

But when I add this condition to the queryset it becomes AND

where = ["CONCAT_WS(' ', profiles_userprofile.first_name, profiles_userprofile.last_name) like '"+request.POST['search[value]']+"' "]
            tickets = Ticket.objects.get_active(u, page_type).filter(*q).extra(where=where).exclude(*exq).order_by(*order_dash)[cur:cur_length]

How do I convert this into an OR condition?
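One possible direction (a hedged sketch, untested against this model; the annotation name created_by_full_name is mine): annotate the queryset with a database-side Concat and add it to the same OR chain as another Q, avoiding extra() entirely:

from django.db.models import Q, Value
from django.db.models.functions import Concat

tickets = (Ticket.objects.get_active(u, page_type)
           .annotate(created_by_full_name=Concat('created_by__first_name',
                                                 Value(' '),
                                                 'created_by__last_name')))

# The annotated column can now be OR-ed with the other Q objects via |.
q.append(Q(created_by_full_name__icontains=search_value))

Since filter(*q) ANDs its arguments together (and extra(where=...) is also AND-ed), anything that should be OR-ed has to live inside a single combined Q expression.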


Get this bounty!!!

#StackBounty: #python #tensorflow How to get bias and neuron weights in optimizer?

Bounty: 50

In a TensorFlow optimizer (Python), the method _apply_dense gets called separately for the neuron weights (layer connections) and for the bias weights, but I would like to use both within a single call of this method.

def _apply_dense(self, grad, weight):
    ...

For example: A fully connected neural network with two hidden layer with two neurons and a bias for each.

[image: neural network example]

If we take a look at layer 2, _apply_dense gets one call for the neuron weights:

[image: neuron weights]

and a call for the bias weights:

[image: bias weights]

But I would need either both matrices in one call of _apply_dense, or a combined weight matrix like this:

[image: all weights from one layer]

How to do this?

MWE

For a minimal working example, here is a stochastic gradient descent optimizer implementation with momentum. For every layer, the momentum of all incoming connections from other neurons is reduced to its mean (see ndims == 2). What I need instead is the mean over not only the momentum values of the incoming neuron connections but also those of the incoming bias connections (as described above).

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.python.training import optimizer


class SGDmomentum(optimizer.Optimizer):
    def __init__(self, learning_rate=0.001, mu=0.9, use_locking=False, name="SGDmomentum"):
        super(SGDmomentum, self).__init__(use_locking, name)
        self._lr = learning_rate
        self._mu = mu

        self._lr_t = None
        self._mu_t = None

    def _create_slots(self, var_list):
        for v in var_list:
            self._zeros_slot(v, "a", self._name)

    def _apply_dense(self, grad, weight):
        learning_rate_t = tf.cast(self._lr_t, weight.dtype.base_dtype)
        mu_t = tf.cast(self._mu_t, weight.dtype.base_dtype)
        momentum = self.get_slot(weight, "a")

        if momentum.get_shape().ndims == 2:  # neuron weights
            momentum_mean = tf.reduce_mean(momentum, axis=1, keep_dims=True)
        elif momentum.get_shape().ndims == 1:  # bias weights
            momentum_mean = momentum
        else:
            momentum_mean = momentum

        momentum_update = grad + (mu_t * momentum_mean)
        momentum_t = tf.assign(momentum, momentum_update, use_locking=self._use_locking)

        weight_update = learning_rate_t * momentum_t
        weight_t = tf.assign_sub(weight, weight_update, use_locking=self._use_locking)

        return tf.group(*[weight_t, momentum_t])

    def _prepare(self):
        self._lr_t = tf.convert_to_tensor(self._lr, name="learning_rate")
        self._mu_t = tf.convert_to_tensor(self._mu, name="momentum_term")

For a simple neural network: https://raw.githubusercontent.com/aymericdamien/TensorFlow-Examples/master/examples/3_NeuralNetworks/multilayer_perceptron.py (only change the optimizer to the custom SGDmomentum optimizer)
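One direction I have considered (purely a sketch; it assumes each layer's kernel and bias share a name scope such as "layer1/weights" and "layer1/biases" — those names and this pairing are my assumptions, not part of the TensorFlow optimizer API):

def _pair_weights_and_biases(var_list):
    # Group variables by their enclosing name scope, e.g. "layer1".
    pairs = {}
    for v in var_list:
        scope = v.op.name.rsplit('/', 1)[0]
        pairs.setdefault(scope, []).append(v)
    # Under the naming assumption, each scope now holds the 2-D kernel and
    # the 1-D bias of one layer, which _create_slots could record for later.
    return pairs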


Get this bounty!!!

#StackBounty: #python #python-2.7 #unc #network-share Python 2: Get network share path from drive letter

Bounty: 50

If I use the following to get the list of all connected drives:

available_drives = ['%s:' % d for d in string.ascii_uppercase if os.path.exists('%s:' % d)]

How do I get the UNC path of the connected drives?

os.path just returns z: instead of \\share\that\was\mapped\to\z
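(Not an answer I have verified, just a hedged sketch: the pywin32 package provides win32wnet.WNetGetConnection, which resolves a mapped drive letter to its UNC path:)

import os
import string
import pywintypes
import win32wnet  # from the pywin32 package

def get_unc_paths():
    # Map each connected drive letter to its UNC path, where one exists.
    unc = {}
    for d in string.ascii_uppercase:
        drive = '%s:' % d
        if os.path.exists(drive):
            try:
                unc[drive] = win32wnet.WNetGetConnection(drive)
            except pywintypes.error:
                unc[drive] = drive  # local drive with no network mapping
    return unc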


Get this bounty!!!

#StackBounty: #python #amazon-web-services #aws-lambda #aws-api-gateway 500 error while trying to enable CORS on POST with AWS API Gate…

Bounty: 50

I have a response method that looks like this for my Lambda functions:

def respond(err, res=None):
    return {
        'statusCode': 400 if err else 200,
        'body': json.dumps(err) if err else json.dumps(res),
        'headers': {
            'Access-Control-Allow-Headers': 'content-type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token',
            'Access-Control-Allow-Methods': 'POST, GET, DELETE',
            'Access-Control-Allow-Origin': '*',
            'Access-Control-Allow-Credentials': True,
            'Content-Type': 'application/json',
        },
    }

When I test my endpoint with an OPTIONS request from Postman, I get a 500 internal server error. If I test it from the API Gateway console, I additionally get this:

Execution log for request test-request
Wed Jul 05 14:25:26 UTC 2017 : Starting execution for request: test-invoke-request
Wed Jul 05 14:25:26 UTC 2017 : HTTP Method: OPTIONS, Resource Path: /login
Wed Jul 05 14:25:26 UTC 2017 : Method request path: {}
Wed Jul 05 14:25:26 UTC 2017 : Method request query string: {}
Wed Jul 05 14:25:26 UTC 2017 : Method request headers: {}
Wed Jul 05 14:25:26 UTC 2017 : Method request body before transformations: 
Wed Jul 05 14:25:26 UTC 2017 : Received response. Integration latency: 0 ms
Wed Jul 05 14:25:26 UTC 2017 : Endpoint response body before transformations: 
Wed Jul 05 14:25:26 UTC 2017 : Endpoint response headers: {}
Wed Jul 05 14:25:26 UTC 2017 : Execution failed due to configuration error: Output mapping refers to an invalid method response: 200
Wed Jul 05 14:25:26 UTC 2017 : Method completed with status: 500

I’m not really sure what I’m doing wrong. I think I am returning all the right headers. Any help is appreciated.
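(For context, the "Output mapping refers to an invalid method response: 200" log line suggests the OPTIONS method has no 200 method response defined in the API Gateway configuration itself. A hedged boto3 sketch of adding one — the restApiId and resourceId values are placeholders:)

import boto3

apigateway = boto3.client('apigateway')

# 'abc123' and 'def456' are placeholder IDs for the API and the /login resource.
apigateway.put_method_response(
    restApiId='abc123',
    resourceId='def456',
    httpMethod='OPTIONS',
    statusCode='200',
    responseParameters={
        'method.response.header.Access-Control-Allow-Headers': False,
        'method.response.header.Access-Control-Allow-Methods': False,
        'method.response.header.Access-Control-Allow-Origin': False,
    },
)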


Get this bounty!!!

#StackBounty: #python #c++ #c++11 #memory-leaks #swig C++ Memory Leak in Swig Python Module

Bounty: 50

Background

I have created a python module that wraps a c++ program using SWIG. It works just fine, but it has a pretty serious memory leak issue that I think is a result of poorly handled pointers to large map objects. I have very little experience with c++, and I have questions as to whether delete[] can be used on an object created with new in a different function or method.

The program was written in 2007, so excuse the lack of useful c++11 tricks.


The swig extension basically just wraps a single c++ class (Matrix) and a few functions.

Matrix.h

#ifndef __MATRIX__
#define __MATRIX__

#include <string>
#include <vector>
#include <map>
#include <cmath>
#include <fstream>
#include <cstdlib>
#include <stdio.h>
#include <unistd.h>

#include "FileException.h"
#include "ParseException.h"

#define ROUND_TO_INT(n) ((long long)floor(n))
#define MIN(a,b) ((a)<(b)?(a):(b))
#define MAX(a,b) ((a)>(b)?(a):(b))

using namespace std;

class Matrix {

private:


  /**
  * Split a string following delimiters
   */
  void tokenize(const string& str, vector<string>& tokens, const string& delimiters) {

    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find first "non-delimiter".
    string::size_type pos     = str.find_first_of(delimiters, lastPos);

    while (string::npos != pos || string::npos != lastPos)
    {
      // Found a token, add it to the vector.
      tokens.push_back(str.substr(lastPos, pos - lastPos));
      // Skip delimiters.  Note the "not_of"
      lastPos = str.find_first_not_of(delimiters, pos);
      // Find next "non-delimiter"
      pos = str.find_first_of(delimiters, lastPos);
    }
  }


public:


  // used for efficiency tests
  long long totalMapSize;
  long long totalOp;

  double ** mat; // the matrix as it is stored in the matrix file
  int length;
  double granularity; // the real granularity used, greater than 1
  long long ** matInt; // the discrete matrix with offset
  double errorMax;
  long long *offsets; // offset of each column
  long long offset; // sum of offsets
  long long *minScoreColumn; // min discrete score at each column
  long long *maxScoreColumn; // max discrete score at each column
  long long *sum;
  long long minScore;  // min total discrete score (normally 0)
  long long maxScore;  // max total discrete score
  long long scoreRange;  // score range = max - min + 1
  long long *bestScore;
  long long *worstScore;
  double background[4];

  Matrix() {
    granularity = 1.0;
    offset = 0;
    background[0] = background[1] = background[2] = background[3] = 0.25;
  }

  Matrix(double pA, double pC, double pG, double pT) {
    granularity = 1.0;
    offset = 0;
    background[0] = pA;
    background[1] = pC;
    background[2] = pG;
    background[3] = pT;  
  }

  ~Matrix() {
      for (int k = 0; k < 4; k++ ) {
        delete[] matInt[k];
      }
      delete[] matInt;
      delete[] mat;
      delete[] offsets;
      delete[] minScoreColumn;
      delete[] maxScoreColumn;
      delete[] sum;
      delete[] bestScore;
      delete[] worstScore;
  }


  void toLogOddRatio () {
    for (int p = 0; p < length; p++) {
      double sum = mat[0][p] + mat[1][p] + mat[2][p] + mat[3][p];
      for (int k = 0; k < 4; k++) {
        mat[k][p] = log((mat[k][p] + 0.25) /(sum + 1)) - log (background[k]); 
      }
    }
  }

  void toLog2OddRatio () {
    for (int p = 0; p < length; p++) {
      double sum = mat[0][p] + mat[1][p] + mat[2][p] + mat[3][p];
      for (int k = 0; k < 4; k++) {
        mat[k][p] = log2((mat[k][p] + 0.25) /(sum + 1)) - log2 (background[k]); 
      }
    }
  }

  /**
    * Transforms the initial matrix into an integer and offseted matrix.
   */
  void computesIntegerMatrix (double granularity, bool sortColumns = true);

  // computes the complete score distribution between score min and max
  void showDistrib (long long min, long long max) {
    map<long long, double> *nbocc = calcDistribWithMapMinMax(min,max); 
    map<long long, double>::iterator iter;

    // computes p values and stores them in nbocc[length] 
    double sum = 0;
    map<long long, double>::reverse_iterator riter = nbocc[length-1].rbegin();
    while (riter != nbocc[length-1].rend()) {
      sum += riter->second;
      nbocc[length][riter->first] = sum;
      riter++;      
    }

    iter = nbocc[length].begin();
    while (iter != nbocc[length].end() && iter->first <= max) {
      //cout << (((iter->first)-offset)/granularity) << " " << (iter->second) << " " << nbocc[length-1][iter->first] << endl;
      iter ++;
    }
  }

  /**
    * Computes the pvalue associated with the threshold score requestedScore.
    */
  void lookForPvalue (long long requestedScore, long long min, long long max, double *pmin, double *pmax);

  /**
    * Computes the score associated with the pvalue requestedPvalue.
    */
  long long lookForScore (long long min, long long max, double requestedPvalue, double *rpv, double *rppv);

  /** 
    * Computes the distribution of scores between score min and max as the DP algrithm proceeds 
    * but instead of using a table we use a map to avoid computations for scores that cannot be reached
    */
  map<long long, double> *calcDistribWithMapMinMax (long long min, long long max); 

  void readMatrix (string matrix) {

    vector<string> str;
    tokenize(matrix, str, " t|");
    this->length = 0;
    this->length = str.size() / 4;
    mat = new double*[4];
    int idx = 0;

    for (int j = 0; j < 4; j++) {
      this->mat[j] = new double[this->length];
      for (int i = 0; i < this->length; i++) {
        mat[j][i] = atof(str.at(idx).data());
        idx++;
      }
    }

    str.clear();

  }

}; /* Matrix */

#endif

Matrix.cpp

#include "Matrix.h"

#define MEMORYCOUNT

void Matrix::computesIntegerMatrix (double granularity, bool sortColumns) {
  double minS = 0, maxS = 0;
  double scoreRange;

  // computes precision
  for (int i = 0; i < length; i++) {
    double min = mat[0][i];
    double max = min;
    for (int k = 1; k < 4; k++ )  {
      min = ((min < mat[k][i])?min:(mat[k][i]));
      max = ((max > mat[k][i])?max:(mat[k][i]));
    }
    minS += min;
    maxS += max;
  } 

  // score range
  scoreRange = maxS - minS + 1;

  if (granularity > 1.0) {
    this->granularity = granularity / scoreRange;
  } else if (granularity < 1.0) {
    this->granularity = 1.0 / granularity;
  } else {
    this->granularity = 1.0;
  }

  matInt = new long long *[length];
  for (int k = 0; k < 4; k++ ) {
    matInt[k] = new long long[length];
    for (int p = 0 ; p < length; p++) {
      matInt[k][p] = ROUND_TO_INT((double)(mat[k][p]*this->granularity)); 
    }
  }

  this->errorMax = 0.0;
  for (int i = 1; i < length; i++) {
    double maxE = mat[0][i] * this->granularity - (matInt[0][i]);
    for (int k = 1; k < 4; k++) {
      maxE = ((maxE < mat[k][i] * this->granularity - matInt[k][i])?(mat[k][i] * this->granularity - (matInt[k][i])):(maxE));
    }
    this->errorMax += maxE;
  }

  if (sortColumns) {
    // sort the columns : the first column is the one with the greatest value
    long long min = 0;
    for (int i = 0; i < length; i++) {
      for (int k = 0; k < 4; k++) {
        min = MIN(min,matInt[k][i]);
      }
    }
    min --;
    long long *maxs = new long long [length];
    for (int i = 0; i < length; i++) {
      maxs[i] = matInt[0][i];
      for (int k = 1; k < 4; k++) {
        if (maxs[i] < matInt[k][i]) {
          maxs[i] = matInt[k][i];
        }
      }
    }
    long long **mattemp = new long long *[4];
    for (int k = 0; k < 4; k++) {        
      mattemp[k] = new long long [length];
    }
    for (int i = 0; i < length; i++) {
      long long max = maxs[0];
      int p = 0;
      for (int j = 1; j < length; j++) {
        if (max < maxs[j]) {
          max = maxs[j];
          p = j;
        }
      }
      maxs[p] = min;
      for (int k = 0; k < 4; k++) {        
        mattemp[k][i] = matInt[k][p];
      }
    }

    for (int k = 0; k < 4; k++)  {
      for (int i = 0; i < length; i++) {
        matInt[k][i] = mattemp[k][i];
      }
    }

    for (int k = 0; k < 4; k++) {        
      delete[] mattemp[k];
    }
    delete[] mattemp;
    delete[] maxs;
  }

  // computes offsets
  this->offset = 0;
  offsets = new long long [length];
  for (int i = 0; i < length; i++) {
    long long min = matInt[0][i];
    for (int k = 1; k < 4; k++ )  {
      min = ((min < matInt[k][i])?min:(matInt[k][i]));
    }
    offsets[i] = -min;
    for (int k = 0; k < 4; k++ )  {
      matInt[k][i] += offsets[i];  
    }
    this->offset += offsets[i];
  }

  // look for the minimum score of the matrix for each column
  minScoreColumn = new long long [length];
  maxScoreColumn = new long long [length];
  sum            = new long long [length];
  minScore = 0;
  maxScore = 0;
  for (int i = 0; i < length; i++) {
    minScoreColumn[i] = matInt[0][i];
    maxScoreColumn[i] = matInt[0][i];
    sum[i] = 0;
    for (int k = 1; k < 4; k++ )  {
      sum[i] = sum[i] + matInt[k][i];
      if (minScoreColumn[i] > matInt[k][i]) {
        minScoreColumn[i] = matInt[k][i];
      }
      if (maxScoreColumn[i] < matInt[k][i]) {
        maxScoreColumn[i] = matInt[k][i];
      }
    }
    minScore = minScore + minScoreColumn[i];
    maxScore = maxScore + maxScoreColumn[i];
    //cout << "minScoreColumn[" << i << "] = " << minScoreColumn[i] << endl;
    //cout << "maxScoreColumn[" << i << "] = " << maxScoreColumn[i] << endl;
  }
  this->scoreRange = maxScore - minScore + 1;

  bestScore = new long long[length];
  worstScore = new long long[length];
  bestScore[length-1] = maxScore;
  worstScore[length-1] = minScore;
  for (int i = length - 2; i >= 0; i--) {
    bestScore[i]  = bestScore[i+1]  - maxScoreColumn[i+1];
    worstScore[i] = worstScore[i+1] - minScoreColumn[i+1];
  }


}




/**
* Computes the pvalue associated with the threshold score requestedScore.
 */
void Matrix::lookForPvalue (long long requestedScore, long long min, long long max, double *pmin, double *pmax) {

  map<long long, double> nbocc = calcDistribWithMapMinMax(min,max); 
  map<long long, double>::iterator iter;


  // computes p values and stores them in nbocc[length] 
  double sum = nbocc[length][max+1];
  long long s = max + 1;
  map<long long, double>::reverse_iterator riter = nbocc[length-1].rbegin();
  while (riter != nbocc[length-1].rend()) {
    sum += riter->second;
    if (riter->first >= requestedScore) s = riter->first;
    nbocc[length][riter->first] = sum;
    riter++;      
  }
  //cout << "   s found : " << s << endl;

  iter = nbocc[length].find(s);
  while (iter != nbocc[length].begin() && iter->first >= s - errorMax) {
    iter--;      
  }
  //cout << "   s - E found : " << iter->first << endl;

#ifdef MEMORYCOUNT
  // for tests, store the number of memory bloc necessary
  for (int pos = 0; pos <= length; pos++) {
    totalMapSize += nbocc[pos].size();
  }
#endif

  *pmax = nbocc[length][s];
  *pmin = iter->second;

}



/**
* Computes the score associated with the pvalue requestedPvalue.
 */
long long Matrix::lookForScore (long long min, long long max, double requestedPvalue, double *rpv, double *rppv) {

  map<long long, double> *nbocc = calcDistribWithMapMinMax(min,max); 
  map<long long, double>::iterator iter;

  // computes p values and stores them in nbocc[length] 
  double sum = 0.0;
  map<long long, double>::reverse_iterator riter = nbocc[length-1].rbegin();
  long long alpha = riter->first+1;
  long long alpha_E = alpha;
  nbocc[length][alpha] = 0.0;
  while (riter != nbocc[length-1].rend()) {
    sum += riter->second;
    nbocc[length][riter->first] = sum;
    if (sum >= requestedPvalue) { 
      break;
    }
    riter++;      
  }
  if (sum > requestedPvalue) {
    alpha_E = riter->first;
    riter--;
    alpha = riter->first; 
  } else {
    if (riter == nbocc[length-1].rend()) { // path following the remark of the mail
      riter--;
      alpha = alpha_E = riter->first;
    } else {
      alpha = riter->first;
      riter++;
      sum += riter->second;
      alpha_E = riter->first;
    }
    nbocc[length][alpha_E] = sum;  
    //cout << "Pv(S) " << riter->first << " " << sum << endl;   
  } 

#ifdef MEMORYCOUNT
  // for tests, store the number of memory bloc necessary
  for (int pos = 0; pos <= length; pos++) {
    totalMapSize += nbocc[pos].size();
  }
#endif

  if (alpha - alpha_E > errorMax) alpha_E = alpha;

  *rpv = nbocc[length][alpha];
  *rppv = nbocc[length][alpha_E];   

  delete[] nbocc;
  return alpha;

}


// computes the distribution of scores between score min and max as the DP algrithm proceeds 
// but instead of using a table we use a map to avoid computations for scores that cannot be reached
map<long long, double> *Matrix::calcDistribWithMapMinMax (long long min, long long max) { 

  // maps for each step of the computation
  // nbocc[length] stores the pvalue
  // nbocc[pos] for pos < length stores the qvalue
  map<long long, double> *nbocc = new map<long long, double> [length+1];
  map<long long, double>::iterator iter;

  long long *maxs = new long long[length+1]; // @ pos i maximum score reachable with the suffix matrix from i to length-1

  maxs[length] = 0;
  for (int i = length-1; i >= 0; i--) {
    maxs[i] = maxs[i+1] + maxScoreColumn[i];
  }

  // initializes the map at position 0
  for (int k = 0; k < 4; k++) {
    if (matInt[k][0]+maxs[1] >= min) {
      nbocc[0][matInt[k][0]] += background[k];
    }
  }

  // computes q values for scores greater or equal than min
  nbocc[length-1][max+1] = 0.0;
  for (int pos = 1; pos < length; pos++) {
    iter = nbocc[pos-1].begin();
    while (iter != nbocc[pos-1].end()) {
      for (int k = 0; k < 4; k++) {
        long long sc = iter->first + matInt[k][pos];
        if (sc+maxs[pos+1] >= min) {
          // the score min can be reached
          if (sc > max) {
            // the score will be greater than max for all suffixes
            nbocc[length-1][max+1] += nbocc[pos-1][iter->first] * background[k]; //pow(4,length-pos-1) ;
            totalOp++;
          } else {              
            nbocc[pos][sc] += nbocc[pos-1][iter->first] * background[k];
            totalOp++;
          }
        } 
      }
      iter++;      
    }      
    //cerr << "        map size for " << pos << " " << nbocc[pos].size() << endl;
  }

  delete[] maxs;

  return nbocc;


}

pytfmpval.i

%module pytfmpval
%{
#include "../src/Matrix.h"
#define SWIG_FILE_WITH_INIT
%}

%include "cpointer.i"
%include "std_string.i"
%include "std_vector.i"
%include "typemaps.i"
%include "../src/Matrix.h"

%pointer_class(double, doublep)
%pointer_class(int, intp)

%nodefaultdtor Matrix;

The c++ functions are called in this python module.

tfmp.py

from __future__ import absolute_import, division, print_function, unicode_literals
import pytfmpval.pytfmpval as tfm
from math import ceil

def read_matrix(matrix, bg=[0.25, 0.25, 0.25, 0.25], mat_type="counts", log_type="nat"):
    """
    From a string of space-delimited counts create a Matrix object.

    Break the string into 4 rows corresponding to A, C, G, and T.
    This function also converts it to a log-odds (position weight) matrix if necessary.

    Args:
        matrix (str): Whitespace-delimited string of row-concatenated motif matrix.
        bg (list of floats): Background nucleotide frequencies for [A, C, G, T].
        mat_type (str): Type of motif matrix provided. Options are: "counts", "pfm", "pwm".
            "counts" is for raw count matrices for each base at each position.
            "pfm" is for position frequency matrices (frequencies already calculated).
            "pwm" is for position weight matrices (also referred to as position-specific scoring matrices.)
        log_type (str): Base to use for log. Default is to use the natural log. "log2" is the other option.
            This will affect the scores and p-values.

    Returns:
        m (pytfmpval Matrix): Matrix in pwm format.
    """

    try:
        a, c, g, t = bg[0], bg[1], bg[2], bg[3]
        if (len(matrix.split()) % 4) != 0:
            raise ValueError("Uneven rows in motif matrix. Ensure rows of equal length in input.")

        m = tfm.Matrix(a, c, g, t)
        m.readMatrix(matrix)

        if mat_type.upper() == "COUNTS":
            if log_type.upper() == "NAT":
                m.toLogOddRatio()
            elif log_type.upper() == "LOG2":
                m.toLog2OddRatio()
            else:
                print("Improper log type argument, using natural log.")
                m.toLogOddRatio()

        return m

    except ValueError as error:
        print(repr(error))


def score2pval(matrix, req_score):
    """
    Determine the p-value for a given score for a specific motif PWM.

    Args:
        matrix (pytfmpval Matrix): Matrix in pwm format.
        req_score (float): Requested score for which to determine the p-value.

    Returns:
        ppv (float): The calculated p-value corresponding to the score.
    """

    granularity = 0.1
    max_granularity = 1e-10
    decrgr = 10  # Factor to increase granularity by after each iteration.

    pv = tfm.doublep()
    ppv = tfm.doublep()

    while granularity > max_granularity:
        matrix.computesIntegerMatrix(granularity)
        max_s = int(req_score * matrix.granularity + matrix.offset + matrix.errorMax + 1)
        min_s = int(req_score * matrix.granularity + matrix.offset - matrix.errorMax - 1)
        score = int(req_score * matrix.granularity + matrix.offset)

        matrix.lookForPvalue(score, min_s, max_s, ppv, pv)

        if ppv.value() == pv.value():
            return ppv.value()

        granularity = granularity / decrgr

    print("Max granularity exceeded. Returning closest approximation.")
    return ppv.value()


def pval2score(matrix, pval):
    """
    Determine the score for a given p-value for a specific motif PWM.

    Args:
        matrix (pytfmpval Matrix): Matrix in pwm format.
        pval (float): p-value for which to determine the score.

    Returns:
        score (float): The calculated score corresponding to the p-value.
    """

    init_granularity = 0.1
    max_granularity = 1e-10
    decrgr = 10  # Factor to increase granularity by after each iteration.

    pv = tfm.doublep()  # Initialize as a c++ double.
    ppv = tfm.doublep()
    matrix.computesIntegerMatrix(init_granularity)
    max_s = int(matrix.maxScore + ceil(matrix.errorMax + 0.5))
    min_s = int(matrix.minScore)
    granularity = init_granularity

    while granularity > max_granularity:
        matrix.computesIntegerMatrix(granularity)

        score = matrix.lookForScore(min_s, max_s, pval, pv, ppv)

        min_s = int((score - ceil(matrix.errorMax + 0.5)) * decrgr)
        max_s = int((score + ceil(matrix.errorMax + 0.5)) * decrgr)

        if ppv.value() == pv.value():
            break

        granularity = granularity / decrgr

    if granularity <= max_granularity:
        print("Max granularity exceeded. Returning closest score approximation.")

    final_score = (score - matrix.offset) / matrix.granularity
    return final_score

The script I’ve been using for testing.

thresholds.py

#!/usr/bin/env python
"""
Calculate motif thresholds for a given motif file and print to a new file.

Usage: thresholds.py -m <motifs.txt> -o <output.txt> [OPTIONS]

Args:
    -m (str): Input filename of a file containing PWMs.
    -o (str): Output filename.
    -a (float, optional) <0.25>: Background probability for A nucleotides. If not
        provided then all are assumed to be equally likely (all are 0.25).
    -t (float, optional) <0.25>: Background probability for T nucleotides.
    -c (float, optional) <0.25>: Background probability for C nucleotides.
    -g (float, optional) <0.25>: Background probability for G nucleotides.
    -p (int, optional) <1>: Processor cores to utilize. Will decrease computation time linearly.
    -pc (float, optional) <0.1>: Pseudocounts value to be added to all positions of the motif frequency matrix
        before calculating the probability matrix.
    -pv (float, optional) <0.00001>: P-value to be used for defining thresholds for each motif.
        I don't recommend changing this.
    -ow (flag, optional): OverWrite: If present, thresholds already present in the input file will be
        replaced in the output file.
"""

import sys
import argparse
from multiprocessing.dummy import Pool as ThreadPool
import itertools
from pytfmpval import tfmp
from timeit import default_timer as timer

from Bio import motifs

# from memory_profiler import profile
import pdb


# @profile
def find_thresh(combo):
    """
    Calculate the detection thresholds for each pwm matrix given the nucleotide
    background frequencies and desired p-value.

    Args:
        combo (tuple): Three element tuple as defined below.
            matrix (tuple): The PWM for which a detection threshold will be found.
                (motif name, rows of matrix concatenated to single string with spaces between positions).
            pval (float): P-value to which threshold will be calculated.
            bg (list of floats): List containing background nucleotide frequencies [A, C, G, T]
    """

    matrix, pval, bg = (combo[0], combo[1], combo[2])

    start = timer()
    mat = tfmp.read_matrix(matrix[1], bg=bg, mat_type="pwm")
    thresh = tfmp.pval2score(mat, pval)
    pdb.set_trace()
    del mat
    end = timer()

    print(matrix[0] + ": " + str(end - start))

    return (matrix[0], thresh)


def main(motif_file, motif_outfile, pc, bp, ow, pv, p):
    matrices = []
    background = {'A': bp[0], 'C': bp[1], 'G': bp[2], 'T': bp[3]}
    print(("Baseline nucleotide frequencies:nt" + str(background)))

    # Calculate thresholds using pytfmpval.
    print(("Reading in motifs."))
    fh = open(motif_file)
    for m in motifs.parse(fh, "jaspar"):
        pfm = m.counts.normalize(pseudocounts=pc)    # Create frequency matrix.
        pwm = pfm.log_odds(background)              # Calculate to log likelihoods vs background.

        # Create matrix string from motif pwm.
        mat = pwm[0] + pwm[1] + pwm[2] + pwm[3]
        mat = [str(x) for x in mat]
        mat = " ".join(mat)
        matrices.append((m.name, mat))

    fh.close()

    # Multiprocessing to use multiple processing cores.
    thresholds = []
    with ThreadPool(p) as pool:
        for x in pool.imap_unordered(find_thresh, zip(matrices, itertools.repeat(pv),
                                                      itertools.repeat(bp)), chunksize=8):
            thresholds.append(x)

    print(("Total motifs read: " + str(len(thresholds))))
    print("Writing output file.")

    return


if __name__ == '__main__':
    parser = argparse.ArgumentParser(usage=__doc__)

    parser.add_argument("-m", "--motif", dest="motif_file", required=True)
    parser.add_argument("-o", "--outfile", dest="motif_outfile", required=True)
    parser.add_argument("-a", "--a_freq", dest="a_freq", required=False, default=0.25, type=float)
    parser.add_argument("-c", "--c_freq", dest="c_freq", required=False, default=0.25, type=float)
    parser.add_argument("-g", "--g_freq", dest="g_freq", required=False, default=0.25, type=float)
    parser.add_argument("-t", "--t_freq", dest="t_freq", required=False, default=0.25, type=float)
    parser.add_argument("-p", "--processors", dest="processors", required=False, default=1, type=int)
    parser.add_argument("-pc", "--pseudocounts", dest="pseudocounts",
                        required=False, default=0.1, type=float)
    parser.add_argument("-pv", "--pval", dest="p_val",
                        required=False, default=0.00001, type=float)
    parser.add_argument("-ow", "--overwrite", action="store_true", required=False)
    args = parser.parse_args()

    if sum([args.a_freq, args.c_freq, args.g_freq, args.t_freq]) == 1:
        bp = [args.a_freq, args.c_freq, args.g_freq, args.t_freq]
    else:
        print("Background frequencies must equal 1. Check input parameters, exiting.")
        sys.exit()

    main(args.motif_file, args.motif_outfile, args.pseudocounts, bp, args.overwrite, args.p_val, args.processors)

Sample input file can be accessed here.

setup.py can be accessed here.


Setup

To create the swig python module for the Matrix class:
swig -python -c++ pytfmpval.i

To build and install the pytfmpval python module:
pip install .

Python module file structure can be seen here, as can other source code I’ve removed. You’ll likely need the MANIFEST.in for the python module to build properly.

Command to test:
python thresholds.py -m sample_motifs.txt -o sample_motifs.results.txt -ow

That will pop you into the Python debugger, and you can continue execution with c. You’ll see that most of the Matrix objects created are processed quickly and use little memory. I’ve included one of the more troublesome ones (ARID3A) in the sample input. It takes much longer (~90 s) to run and uses much more memory than the others (~2.7 GB), which is fine and expected. But the memory isn’t ever released, so processing a large file of these quickly becomes an issue.


I worry that nbocc in Matrix.cpp is not being properly dereferenced or deleted. Is this use valid?

I have tried using gc.collect() and I am using the multiprocessing module as recommended in this question to call these functions from my python program. I’ve also tried deleting the Matrix object from within python to no avail.
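(One hedged observation: thresholds.py actually imports Pool from multiprocessing.dummy, which is a thread pool, so any memory leaked by the C++ extension stays in the parent process. A sketch of running each matrix in a short-lived real process instead, so the OS reclaims the leaked memory when the worker exits — whether the extra process overhead is acceptable here is an open question:)

from multiprocessing import Pool  # real processes, not multiprocessing.dummy

# maxtasksperchild=1 recycles each worker after one matrix, returning any
# leaked C++ memory to the OS along with the exiting process.
pool = Pool(processes=p, maxtasksperchild=1)
thresholds = list(pool.imap_unordered(
    find_thresh,
    zip(matrices, itertools.repeat(pv), itertools.repeat(bp))))
pool.close()
pool.join()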

I’m out of characters, but will provide any additional needed info in the comments as well as I can.


Get this bounty!!!

#StackBounty: #python #google-chrome #anaconda How to make chrome compatible with Jupyter for Anaconda?

Bounty: 50

I am an Anaconda user, and Jupyter is a neat tool for running Python code. However, on my MacBook I can’t open it in Chrome (“This page isn’t working: localhost didn’t send any data.”), but it works in Safari. I have tried to reinstall Chrome, but I still can’t fix it. My system is Mac OS 10.11.5.
Does anyone know how I can fix it?
I understand that the problem might not be specific enough, but I have been puzzled by it for quite a period of time.


Get this bounty!!!

#StackBounty: #python #amazon-s3 #airflow setting up s3 for logs in airflow

Bounty: 50

I am using docker-compose to set up a scalable airflow cluster. I based my approach off of this Dockerfile https://hub.docker.com/r/puckel/docker-airflow/

My problem is getting the logs set up to write to/read from S3. When a DAG has completed, I get an error like this:

*** Log file isn't local.
*** Fetching here: http://ea43d4d49f35:8793/log/xxxxxxx/2017-06-26T11:00:00
*** Failed to fetch log file from worker.

*** Reading remote logs...
Could not read logs from s3://buckets/xxxxxxx/airflow/logs/xxxxxxx/2017-06-26T11:00:00

I set up a new section in the airflow.cfg file like this

[MyS3Conn]
aws_access_key_id = xxxxxxx
aws_secret_access_key = xxxxxxx
aws_default_region = xxxxxxx

And then specified the s3 path in the remote logs section in airflow.cfg

remote_base_log_folder = s3://buckets/xxxx/airflow/logs
remote_log_conn_id = MyS3Conn

Did I set this up properly, and is there a bug? Is there a recipe for success here that I am missing?

update

I tried exporting in URI and JSON formats and neither seemed to work. I then exported the aws_access_key_id and aws_secret_access_key, and then Airflow started picking it up. Now I get this error in the worker logs:

6/30/2017 6:05:59 PM INFO:root:Using connection to: s3
6/30/2017 6:06:00 PM ERROR:root:Could not read logs from s3://buckets/xxxxxx/airflow/logs/xxxxx/2017-06-30T23:45:00
6/30/2017 6:06:00 PM ERROR:root:Could not write logs to s3://buckets/xxxxxx/airflow/logs/xxxxx/2017-06-30T23:45:00
6/30/2017 6:06:00 PM Logging into: /usr/local/airflow/logs/xxxxx/2017-06-30T23:45:00
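(A hedged aside: in Airflow of this vintage, connections normally live in the metadata database rather than in custom airflow.cfg sections, with credentials in the connection's extra JSON. A sketch of registering MyS3Conn programmatically, assuming Airflow ~1.8's models — adapt as needed:)

import json

from airflow import settings
from airflow.models import Connection

# The S3 credentials belong in the connection's "extra" JSON blob.
conn = Connection(
    conn_id='MyS3Conn',
    conn_type='s3',
    extra=json.dumps({
        'aws_access_key_id': 'xxxxxxx',
        'aws_secret_access_key': 'xxxxxxx',
    }),
)

session = settings.Session()
session.add(conn)
session.commit()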


Get this bounty!!!

#StackBounty: #python #machine-learning #neural-network #tensorflow Low accuracy of LSTM model tensorflow

Bounty: 50

I am trying to learn an LSTM model for sentiment analysis using TensorFlow; I have gone through the LSTM model.

The following code (create_sentiment_featuresets.py) generates the lexicon from 5000 positive sentences and 5000 negative sentences.

import nltk
from nltk.tokenize import word_tokenize
import numpy as np
import random
from collections import Counter
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def create_lexicon(pos, neg):
    lexicon = []
    with open(pos, 'r') as f:
        contents = f.readlines()
        for l in contents[:len(contents)]:
            l= l.decode('utf-8')
            all_words = word_tokenize(l)
            lexicon += list(all_words)
    f.close()

    with open(neg, 'r') as f:
        contents = f.readlines()    
        for l in contents[:len(contents)]:
            l= l.decode('utf-8')
            all_words = word_tokenize(l)
            lexicon += list(all_words)
    f.close()

    lexicon = [lemmatizer.lemmatize(i) for i in lexicon]
    w_counts = Counter(lexicon)
    l2 = []
    for w in w_counts:
        if 1000 > w_counts[w] > 50:
            l2.append(w)
    print("Lexicon length create_lexicon: ",len(lexicon))
    return l2

def sample_handling(sample, lexicon, classification):
    featureset = []
    print("Lexicon length Sample handling: ",len(lexicon))
    with open(sample, 'r') as f:
        contents = f.readlines()
        for l in contents[:len(contents)]:
            l= l.decode('utf-8')
            current_words = word_tokenize(l.lower())
            current_words= [lemmatizer.lemmatize(i) for i in current_words]
            features = np.zeros(len(lexicon))
            for word in current_words:
                if word.lower() in lexicon:
                    index_value = lexicon.index(word.lower())
                    features[index_value] +=1
            features = list(features)
            featureset.append([features, classification])
    f.close()
    print("Feature SET------")
    print(len(featureset))
    return featureset

def create_feature_sets_and_labels(pos, neg, test_size = 0.1):
    global m_lexicon
    m_lexicon = create_lexicon(pos, neg)
    features = []
    features += sample_handling(pos, m_lexicon, [1,0])
    features += sample_handling(neg, m_lexicon, [0,1])
    random.shuffle(features)
    features = np.array(features)

    testing_size = int(test_size * len(features))

    train_x = list(features[:,0][:-testing_size])
    train_y = list(features[:,1][:-testing_size])
    test_x = list(features[:,0][-testing_size:])
    test_y = list(features[:,1][-testing_size:])
    return train_x, train_y, test_x, test_y

def get_lexicon():
    global m_lexicon
    return m_lexicon

The following code (sentiment_analysis.py) performs sentiment analysis using a simple neural network model and is working fine:

from create_sentiment_featuresets import create_feature_sets_and_labels
from create_sentiment_featuresets import get_lexicon
import tensorflow as tf
import numpy as np
# extras for testing
from nltk.tokenize import word_tokenize 
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
#- end extras

train_x, train_y, test_x, test_y = create_feature_sets_and_labels('pos.txt', 'neg.txt')


# pt A-------------

n_nodes_hl1 = 1500
n_nodes_hl2 = 1500
n_nodes_hl3 = 1500

n_classes = 2
batch_size = 100
hm_epochs = 10

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

hidden_1_layer = {'f_fum': n_nodes_hl1,
                'weight': tf.Variable(tf.random_normal([len(train_x[0]), n_nodes_hl1])),
                'bias': tf.Variable(tf.random_normal([n_nodes_hl1]))}
hidden_2_layer = {'f_fum': n_nodes_hl2,
                'weight': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                'bias': tf.Variable(tf.random_normal([n_nodes_hl2]))}
hidden_3_layer = {'f_fum': n_nodes_hl3,
                'weight': tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
                'bias': tf.Variable(tf.random_normal([n_nodes_hl3]))}
output_layer = {'f_fum': None,
                'weight': tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
                'bias': tf.Variable(tf.random_normal([n_classes]))}


def neural_network_model(data):
    l1 = tf.add(tf.matmul(data, hidden_1_layer['weight']), hidden_1_layer['bias'])
    l1 = tf.nn.relu(l1)
    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weight']), hidden_2_layer['bias'])
    l2 = tf.nn.relu(l2)
    l3 = tf.add(tf.matmul(l2, hidden_3_layer['weight']), hidden_3_layer['bias'])
    l3 = tf.nn.relu(l3)
    output = tf.matmul(l3, output_layer['weight']) + output_layer['bias']
    return output

# pt B--------------

def train_neural_network(x):
    prediction = neural_network_model(x)
    cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits= prediction, labels= y))
    optimizer = tf.train.AdamOptimizer(learning_rate= 0.001).minimize(cost)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(hm_epochs):
            epoch_loss = 0
            i = 0
            while i < len(train_x):
                start = i
                end = i+ batch_size
                batch_x = np.array(train_x[start: end])
                batch_y = np.array(train_y[start: end])
                _, c = sess.run([optimizer, cost], feed_dict= {x: batch_x, y: batch_y})
                epoch_loss += c
                i+= batch_size
            print('Epoch', epoch+ 1, 'completed out of ', hm_epochs, 'loss:', epoch_loss)

        correct= tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:', accuracy.eval({x:test_x, y:test_y}))

        # testing --------------
        m_lexicon= get_lexicon()
        print('Lexicon length: ',len(m_lexicon))        
        input_data= "David likes to go out with Kary"       
        current_words= word_tokenize(input_data.lower())
        current_words = [lemmatizer.lemmatize(i) for i in current_words]
        features = np.zeros(len(m_lexicon))
        for word in current_words:
            if word.lower() in m_lexicon:
                index_value = m_lexicon.index(word.lower())
                features[index_value] +=1

        features = np.array(list(features)).reshape(1,-1)
        print('features length: ',len(features))
        result = sess.run(tf.argmax(prediction.eval(feed_dict={x:features}), 1))
        print(prediction.eval(feed_dict={x:features}))
        if result[0] == 0:
            print('Positive: ', input_data)
        elif result[0] == 1:
            print('Negative: ', input_data)

train_neural_network(x)

I have modified the above (sentiment_analysis.py) into an LSTM model after reading the RNN w/ LSTM cell example in TensorFlow and Python, which applies an LSTM to the MNIST image dataset:

Somehow, through much trial and error, I was able to get the running code below (sentiment_demo_lstm.py):

import tensorflow as tf
from tensorflow.contrib import rnn
from create_sentiment_featuresets import create_feature_sets_and_labels
from create_sentiment_featuresets import get_lexicon

import numpy as np

# extras for testing
from nltk.tokenize import word_tokenize 
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
#- end extras

train_x, train_y, test_x, test_y = create_feature_sets_and_labels('pos.txt', 'neg.txt')

n_steps= 100
input_vec_size= len(train_x[0])
hm_epochs = 8
n_classes = 2
batch_size = 128
n_hidden = 128

x = tf.placeholder('float', [None, input_vec_size, 1])
y = tf.placeholder('float')

def recurrent_neural_network(x):
    layer = {'weights': tf.Variable(tf.random_normal([n_hidden, n_classes])),   # hidden_layer, n_classes
            'biases': tf.Variable(tf.random_normal([n_classes]))}

    h_layer = {'weights': tf.Variable(tf.random_normal([1, n_hidden])), # hidden_layer, n_classes
            'biases': tf.Variable(tf.random_normal([n_hidden], mean = 1.0))}

    x = tf.transpose(x, [1,0,2])
    x = tf.reshape(x, [-1, 1])
    x = tf.split(x, input_vec_size, 0)

    lstm_cell = rnn.BasicLSTMCell(n_hidden, state_is_tuple=True)
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype= tf.float32)
    output = tf.matmul(outputs[-1], layer['weights']) + layer['biases']

    return output

def train_neural_network(x):
    prediction = recurrent_neural_network(x)
    cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits= prediction, labels= y))
    optimizer = tf.train.AdamOptimizer(learning_rate= 0.001).minimize(cost)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        for epoch in range(hm_epochs):
            epoch_loss = 0
            i = 0
            while (i+ batch_size) < len(train_x):
                start = i
                end = i+ batch_size
                batch_x = np.array(train_x[start: end])
                batch_y = np.array(train_y[start: end])
                batch_x = batch_x.reshape(batch_size ,input_vec_size, 1)
                _, c = sess.run([optimizer, cost], feed_dict= {x: batch_x, y: batch_y})
                epoch_loss += c
                i+= batch_size
            print('--------Epoch', epoch+ 1, 'completed out of ', hm_epochs, 'loss:', epoch_loss)

        correct= tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))

        print('Accuracy:', accuracy.eval({x:np.array(test_x).reshape(-1, input_vec_size, 1), y:test_y}))

        # testing --------------
        m_lexicon= get_lexicon()
        print('Lexicon length: ',len(m_lexicon))
        input_data= "Mary does not like pizza"  #"he seems to to be healthy today"  #"David likes to go out with Kary"

        current_words= word_tokenize(input_data.lower())
        current_words = [lemmatizer.lemmatize(i) for i in current_words]
        features = np.zeros(len(m_lexicon))
        for word in current_words:
            if word.lower() in m_lexicon:
                index_value = m_lexicon.index(word.lower())
                features[index_value] +=1
        features = np.array(list(features)).reshape(-1, input_vec_size, 1)
        print('features length: ',len(features))

        result = sess.run(tf.argmax(prediction.eval(feed_dict={x:features}), 1))
        print('RESULT: ', result)
        print(prediction.eval(feed_dict={x:features}))
        if result[0] == 0:
            print('Positive: ', input_data)
        elif result[0] == 1:
            print('Negative: ', input_data)

train_neural_network(x)

Output of

print(train_x[0])
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(train_y[0])
[0, 1]

len(train_x) = 9596 and len(train_x[0]) = 423, meaning train_x is effectively a 9596×423 list?

The above code runs, BUT I am not able to get the accuracy above 50 percent; it’s always between 45 and 50%.

  1. Is there anything wrong in my implementation?

  2. How should I approach improving the accuracy, apart from collecting a larger dataset? (See the sketch after this list.)
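(To make question 2 concrete, a hedged sketch of a different input representation — untested on this data, with embed_dim and max_len as assumed values: an LSTM is usually fed an ordered sequence of word indices through an embedding layer, rather than a bag-of-words count vector presented one scalar per time step. Roughly, with TF 1.x APIs:)

import tensorflow as tf

vocab_size = 423   # lexicon length from above
embed_dim = 64     # assumed embedding size
max_len = 50       # assumed pad/truncate length per sentence
n_classes = 2

x = tf.placeholder(tf.int32, [None, max_len])  # word indices, not counts
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))
embedded = tf.nn.embedding_lookup(embeddings, x)  # [batch, max_len, embed_dim]

lstm_cell = tf.contrib.rnn.BasicLSTMCell(128)
outputs, state = tf.nn.dynamic_rnn(lstm_cell, embedded, dtype=tf.float32)
logits = tf.layers.dense(outputs[:, -1, :], n_classes)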

I am new to this field; please help.
Thanks.


Get this bounty!!!

#StackBounty: #python #python-2.7 #random #steganography Python PRNG_Steganography lsb method

Bounty: 50

This is my implementation of a PRNG_Steganography tool for Python. You can also find the code on GitHub.

Imports

from PIL import Image
import numpy as np
import sys
import os
import getopt
import base64
import random
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.backends import default_backend


# Set location of directory we are working in to load/save files
__location__ = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))

Encryption and decryption of the text methods via the Cryptography module

def get_key(password):
    digest = hashes.Hash(hashes.SHA256(), backend=default_backend())
    digest.update(password)
    return base64.urlsafe_b64encode(digest.finalize())


def encrypt_text(password, token):
    f = Fernet(get_key(password))
    return f.encrypt(bytes(token))


def decrypt_text(password, token):
    f = Fernet(get_key(password))
    return f.decrypt(bytes(token))

Main encryption method

def encrypt(filename, text, magic):
    # check whether the text is a file name
    if len(text.split('.')[1:]):
        text = read_files(os.path.join(__location__, text))
    t = [int(x) for x in ''.join(text_ascii(encrypt_text(magic, text)))] + [0]*7  # endbit
    try:
        # Change format to png
        filename = change_image_form(filename)

        # Load Image
        d_old = load_image(filename)

        # Check if image can contain the data
        if d_old.size < len(t):
            print '[*] Image not big enough'
            sys.exit(0)

        # get new data and save to image
        d_new = encrypt_lsb(d_old, magic, t)
        save_image(d_new, 'new_'+filename)
    except Exception, e:
        print str(e)

Main decryption method

def decrypt(filename, magic):
    try:
        # Load image
        data = load_image(filename)

        # Retrieve text
        text = decrypt_lsb(data, magic)
        print '[*] Retrieved text: \n%s' % decrypt_text(magic, text)
    except Exception, e:
        print str(e)

Random methods

def text_ascii(text):
    return map(lambda char: '{:07b}'.format(ord(char)), text)


def ascii_text(byte_char):
    return chr(int(byte_char, 2))


def next_random(random_list, data):
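    # Rejection-sample: keep drawing pixel indices until one is found that has not been used yet.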
    next_random_number = random.randint(0, data.size-1)
    while next_random_number in random_list:
        next_random_number = random.randint(0, data.size-1)
    return next_random_number


def generate_seed(magic):
    seed = 1
    for char in magic:
        seed *= ord(char)
    print '[*] Your magic number is %d' % seed
    return seed

Encrypt via lsb_method

def encrypt_lsb(data, magic, text):
    print '[*] Starting Encryption'

    # We must alter the seed but for now lets make it simple
    random.seed(generate_seed(magic))

    random_list = []
    for i in range(len(text)):
        next_random_number = next_random(random_list, data)
        random_list.append(next_random_number)
        data.flat[next_random_number] = (data.flat[next_random_number] & ~1) | text[i]

    print '[*] Finished Encryption'
    return data

Decrypt via lsb_method

def decrypt_lsb(data, magic):
    print '[*] Starting Decryption'
    random.seed(generate_seed(magic))

    random_list = []
    output = temp_char = ''

    for i in range(data.size):
        next_random_number = next_random(random_list, data)
        random_list.append(next_random_number)
        temp_char += str(data.flat[next_random_number] & 1)
        if len(temp_char) == 7:
            if int(temp_char) > 0:
                output += ascii_text(temp_char)
                temp_char = ''
            else:
                print '[*] Finished Decryption'
                return output

Image handling methods

def load_image(filename):
    img = Image.open(os.path.join(__location__, filename))
    img.load()
    data = np.asarray(img, dtype="int32")
    return data


def save_image(npdata, outfilename):
    img = Image.fromarray(np.asarray(np.clip(npdata, 0, 255), dtype="uint8"), "RGB")
    img.save(os.path.join(__location__, outfilename))


def change_image_form(filename):
    f = filename.split('.')
    if not (f[-1] == 'bmp' or f[-1] == 'BMP' or f[-1] == 'PNG' or f[-1] == 'png'):
        img = Image.open(os.path.join(__location__, filename))
        img.load()
        filename = ''.join(f[:-1]) + '.png'
        img.save(os.path.join(__location__, filename))
    return filename


def read_files(filename):
    if os.path.exists(filename):
        with open(filename, 'r') as f:
            return ''.join([i for i in f])
    return filename.replace(__location__, '')[1:]

Usage and main

def usage():
    print "Steganography prng-Tool @Ludisposed & @Qin"
    print ""
    print "Usage: prng_stego.py -e -m magic filename text "
    print "-e --encrypt              - encrypt filename with text"
    print "-d --decrypt              - decrypt filename"
    print "-m --magic                - encrypt/decrypt with password"
    print ""
    print ""
    print "Examples: "
    print "prng_stego.py -e -m pass test.png howareyou"
    print 'python prng_stego.py -e -m magic test.png tester.sh'
    print 'python prng_stego.py -e -m magic test.png file_test.txt'
    print 'prng_stego.py --encrypt --magic password test.png "howareyou  some other text"'
    print ''
    print "prng_stego.py -d -m password test.png"
    print "prng_stego.py --decrypt --magic password test.png"
    sys.exit(0)

if __name__ == "__main__":
    if not len(sys.argv[1:]):
        usage()
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hedm:", ["help", "encrypt", "decrypt", "magic="])
    except getopt.GetoptError as err:
        print str(err)
        usage()

    magic = to_encrypt = None
    for o, a in opts:
        if o in ("-h", "--help"):
            usage()
        elif o in ("-e", "--encrypt"):
            to_encrypt = True
        elif o in ("-d", "--decrypt"):
            to_encrypt = False
        elif o in ("-m", "--magic"):
            magic = a
        else:
            assert False, "Unhandled Option"

    if magic is None or to_encrypt is None:
        usage()

    if not to_encrypt:
        filename = args[0]
        decrypt(filename, magic)
    else:
        filename = args[0]
        text = args[1]
        encrypt(filename, text, magic) 

Any general coding tips are welcome!

I would also love some tips on how to use the magic to create a weird seed.

It works by seeding the random module and drawing the next unique random integer from that seed. At decryption, it knows when to stop looking for bits once it finds the end bit sequence [0]*7.

Furthermore, I’m interested in how random this is. I think this is harder to decrypt than plain LSB stego, but I cannot prove anything.
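(On the seed question, one hedged idea rather than a recommendation: multiplying character ordinals collides easily — 'ab' and 'ba' produce the same seed — so deriving the seed from a hash of the passphrase spreads it out much more. A sketch:)

import hashlib

def generate_seed(magic):
    # Unlike the ordinal product, 'ab' and 'ba' now give different seeds.
    return int(hashlib.sha256(magic).hexdigest(), 16)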

Kind Regards: Ludisposed


Get this bounty!!!