#StackBounty: #python #rabbitmq #celery start celery worker and enable it for broadcast queue

Bounty: 50

I’m trying to start a Celery worker so that it listens to only a single queue. This is not a problem; I can do it this way:

python -m celery worker -A my_module -Q my_queue -c 1

But now I also want my_queue to be a broadcast queue, so I add this to my celeryconfig:

from kombu.common import Broadcast
CELERY_QUEUES = (Broadcast('my_queue'),)

But as soon as I do this, I can no longer start my worker; I get an error from RabbitMQ:

amqp.exceptions.PreconditionFailed: Exchange.declare: (406) PRECONDITION_FAILED - inequivalent arg 'type' for exchange 'my_queue' in vhost 'myvhost': received 'fanout' but current is 'direct'

If I start the worker without -Q (but leave the Broadcast in celeryconfig.py as described above) and list the RabbitMQ queues, I can see that a broadcast queue is created, named like this:

bcast.43fecba7-786a-461c-a322-620039b29b8b

Similarly, if I define this queue on the worker (using -Q as mentioned above) or as a plain Queue in celeryconfig.py like this:

from kombu import Queue
CELERY_QUEUES = (Queue('my_queue'),)

I can see this queue in rabbitmq like this:

my_queue

It appears it does not matter what name I pass to Broadcast when defining the queue; this seems to be an internal Celery name that is not passed to RabbitMQ.

So I’m guessing that when the worker starts, my_queue is created, and once that’s done it cannot be made a broadcast queue.

I could run a worker that listens to all queues (not only my_queue) by removing the -Q argument. But it would be nice to have a single process that listens only to that particular queue, since the tasks I send there are fast and I’d like to bring latency down as much as possible.
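The error message says that an exchange named my_queue already exists in the vhost as a direct exchange, and RabbitMQ refuses to redeclare an existing exchange with a different type. The Broadcast example in the Celery routing documentation uses a name that no direct exchange has claimed and routes tasks to it explicitly. A sketch in the same old-style settings, where broadcast_tasks and my_module.my_task are hypothetical names, not taken from the question:

```python
# celeryconfig.py (sketch; queue and task names are hypothetical)
from kombu.common import Broadcast

# a fresh name, so no existing direct exchange clashes with the fanout one
CELERY_QUEUES = (Broadcast('broadcast_tasks'),)

# route the task to the fanout exchange instead of a direct queue
CELERY_ROUTES = {
    'my_module.my_task': {
        'queue': 'broadcast_tasks',
        'exchange': 'broadcast_tasks',
    },
}
```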

— Edit 1 —
I spent some time with this problem, and it seems the bcast queue mentioned above does not appear consistently. After resetting RabbitMQ and running Celery without the -Q option, the bcast queue did not appear…


Get this bounty!!!

#StackBounty: #python #sockets #ssl Python/sockets/ssl EOF occurred in violation of protocol

Bounty: 150

I would like to authenticate the server on the client side in my echo client/server program. I’m using Python 2.7.12 and the ssl module on:

Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty

I’ve generated the client’s and server’s certificates and keys using these openssl commands:

openssl req -new -x509 -days 365 -nodes -out client.pem -keyout client.key
openssl req -new -x509 -days 365 -nodes -out server.pem -keyout server.key

The version of the OpenSSL library itself and the version used by Python are the same:

openssl version -a
OpenSSL 1.0.1f 6 Jan 2014
built on: Fri Sep 23 12:19:57 UTC 2016
platform: debian-amd64
options:  bn(64,64) rc4(16x,int) des(idx,cisc,16,int) blowfish(idx) 
compiler: cc -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
OPENSSLDIR: "/usr/lib/ssl"

python -c "import ssl; print ssl.OPENSSL_VERSION"
OpenSSL 1.0.1f 6 Jan 2014

However, the code below produces errors. On the server side I get EOF occurred in violation of protocol (_ssl.c:1645) (although the server keeps running), and on the client side:

Traceback (most recent call last):
  File "/http_ssl_client.py", line 36, in <module>
    if not cert or ('commonName', 'test') not in cert['subject'][4]: raise Exception("Invalid SSL cert for host %s. Check if this is a man-in-themiddle attack!" )
Exception: Invalid SSL cert for host %s. Check if this is a man-in-themiddle attack!
{'notBefore': u'Jun  3 11:54:21 2017 GMT', 'serialNumber': u'BBDCBEED69655B6E', 'notAfter': 'Jun  3 11:54:21 2018 GMT', 'version': 3L, 'subject': ((('countryName', u'pl'),), (('stateOrProvinceName', u'test'),), (('localityName', u'test'),), (('organizationName', u'test'),), (('organizationalUnitName', u'test'),), (('commonName', u'test'),), (('emailAddress', u'test'),)), 'issuer': ((('countryName', u'pl'),), (('stateOrProvinceName', u'test'),), (('localityName', u'test'),), (('organizationName', u'test'),), (('organizationalUnitName', u'test'),), (('commonName', u'test'),), (('emailAddress', u'test'),))}

Server’s code:

#!/usr/bin/env python
import socket
import ssl

def main():
    HOST = '127.0.0.1'
    PORT = 1234

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((HOST, PORT))
    sock.listen(5)

    while True:
        ssl_client = None
        client_sock, addr = sock.accept()
        try:
            ssl_client = ssl.wrap_socket(client_sock, server_side=True, certfile="server.pem", keyfile="server.key", ssl_version=ssl.PROTOCOL_TLSv1_2)
            data = ssl_client.read(1024)
            print data
            ssl_client.write(data)
        except ssl.SSLError as e:
            print(e)
        finally:
            # close the TLS wrapper if the handshake succeeded, else the raw socket
            if ssl_client:
                ssl_client.close()
            else:
                client_sock.close()
if __name__ == '__main__':
    main()

Client:

#!/usr/bin/env python
import socket
import ssl

if __name__ == '__main__':

    HOST = '127.0.0.1'
    PORT = 1234

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((HOST, PORT))

    context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
    context.verify_mode = ssl.CERT_REQUIRED
    context.load_verify_locations('server.pem')

    if ssl.HAS_SNI:
        secure_sock = context.wrap_socket(sock, server_hostname=HOST)
    else:
        secure_sock = context.wrap_socket(sock)

    cert = secure_sock.getpeercert()
    print cert

    if not cert or ('commonName', 'test') not in cert['subject'][4]: raise Exception("Error" )

    secure_sock.write('hello')

    print secure_sock.read(1024)

    secure_sock.close()
    sock.close()

All files are in the same directory.
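In the certificate printed above, cert['subject'] is a tuple of relative distinguished names, each itself a tuple of (name, value) pairs; in that output index 4 holds organizationalUnitName, and commonName sits at index 5. A lookup that does not depend on the position could be sketched as follows (subject_field is a hypothetical helper, written to run on both Python 2.7 and 3):

```python
def subject_field(cert, field):
    """Return the first value of `field` in a getpeercert() subject, or None."""
    # cert['subject'] looks like ((('countryName', u'pl'),), (('commonName', u'test'),), ...)
    for rdn in cert.get('subject', ()):
        for name, value in rdn:
            if name == field:
                return value
    return None
```

The client check could then compare subject_field(cert, 'commonName') with 'test' instead of indexing into the subject tuple.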



#StackBounty: #python #python-2.7 #numpy #int #long-integer Why does int(maxint) give a long, but int(int(maxint)) give an int? Is this…

Bounty: 50

Pretty self-explanatory (I’m on Windows):

>>> import sys, numpy
>>> a = numpy.int_(sys.maxint)
>>> int(a).__class__
<type 'long'>
>>> int(int(a)).__class__
<type 'int'>

Why does calling int once give me a long, whereas calling it twice gives me an int?

Is this a bug or a feature?



#StackBounty: #python #youtube-dl Python – youtube-dl force login everytime

Bounty: 50

I want to download multiple files with youtube-dl from a site that requires a login.

The issue is that youtube-dl logs in for the first video without a problem, but doesn’t log in again for the next video.

How do I force youtube-dl to log in for each video, every time the function is called? Maybe by resetting or stopping youtube-dl each time?

import time

import youtube_dl


def video_download(path, url):
    # `email` and `password` are assumed to be module-level variables defined elsewhere
    ydl = youtube_dl.YoutubeDL(
        {
            'outtmpl': path + '.mp4',
            'format': 'bestvideo+bestaudio/best',
            'username': email,
            'password': password,
            # 'quiet': True
        })

    with ydl:
        ydl.download([url])
    time.sleep(45)

The function is actually called in a loop, because I set a different outtmpl for each file/video.

Each call creates a different instance; the first one works, but the next ones don’t log in again.
I need the login to be repeated every time.

<youtube_dl.YoutubeDL.YoutubeDL object at 0x0000000004E74C18>
<youtube_dl.YoutubeDL.YoutubeDL object at 0x00000000032BED68>
WARNING: Unable to download kaltura session JSON: HTTP Error 401: UNAUTHORIZED
<youtube_dl.YoutubeDL.YoutubeDL object at 0x0000000004D6D898>
WARNING: Unable to download kaltura session JSON: HTTP Error 401: UNAUTHORIZED

Terminal/Output:

<youtube_dl.YoutubeDL.YoutubeDL object at 0x00000000050E4C18>
[safari] Downloading login form
[safari] Login successful
[safari] 9781787283664/video1_1: Downloading webpage
.............................
<youtube_dl.YoutubeDL.YoutubeDL object at 0x000000000337DD68>
[safari] 9781787283664/video1_2: Downloading webpage
[safari] 9781787283664/video1_2: Downloading kaltura session JSON
WARNING: Unable to download kaltura session JSON: HTTP Error 401: UNAUTHORIZED



#StackBounty: #python Language/Library to use for unsafe/untrusted code hosted by python

Bounty: 50

I’m looking for a python3 library (or project) that can allow me to run “unsafe” code (input from the internet, or similar) without compromising security.

If such a library doesn’t exist, a recommendation for a programming language that I could “bind” (I’m not sure of the exact term, “host” maybe?) would also be acceptable (though I would like the solution to be as simple as possible).

For clarity, I’m not talking about “hosting” unsafe python code in python, which a few questions I have found here and on SO tell me is impossible. I would also like to avoid creating a DSL, because I want to support a lot of the features of a “modern” high-level programming language (algebraic expressions, list manipulation, class/function definition).

My specific requirements (roughly in order of importance):

  • Ability to pass “complex” data structures between the python host and the unsafe code. Ideally as much complexity as (for example) JSON would be supported, although I could probably manage with a bit less. Only being able to pass basic data types (int, string) would not work well.

  • The language the unsafe code is written in supports the language features mentioned above

  • Ability to call functions defined by the unsafe code from python (by name)

  • (Less important) ability for the unsafe code to call a function defined in the python code (safely)

  • (Optional) language for the unsafe code being simple and easy to learn

Some of the things that I’ve looked at are:

  • Python: as mentioned, it seems (nearly?) impossible to safely run unknown/untrusted python code. But I’ve seen some mentions that PyPy offers a restricted sandbox, or the RestrictedPython project. I’m not sure if these are considered safe or if they work for python3.
  • JavaScript: via a library like PyV8 or PyExecJS. I haven’t found anything about whether running code this way is actually “safe”, but this seems like it could be a good solution. I’m also not sure whether there are better-maintained equivalents to these.

An answer that what I’m looking for doesn’t exist for python is also reasonable (though a pointer to a language/library solution where this is possible would be nice, in that case).



#StackBounty: #python #fill-paragraph Incorrect filling of Python docstrings in pep257 mode

Bounty: 50

I have python-fill-docstring-style set to pep-257 (default) and I’m trying to fill docstring paragraphs. However, python mode does not count """ towards the number of characters, nor does it consider indentation, resulting in:

def function(self):
    """a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
    a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
    a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
    """

and

class A:
    def function(self):
        """a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
        a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
        a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
        """

Note the incorrectly filled first row. How can I fix this?



#StackBounty: #python #design-patterns #recursion #backtracking #visitor-pattern A KenKen puzzle/solver in Python

Bounty: 200

I’ve written a simple KenKen puzzle/solver in Python. I’d love some feedback on the design of the puzzle, as well as the architecture for the solver.

To model the puzzle, I have the following classes:

  • Cell is used to model a (row, col, value) tuple
  • Cage (abstract) is used to model a grouping of Cell objects that must collectively satisfy a constraint. From this class, we have the following derived classes:
    • AddCage for cells involved in addition constraints
    • MulCage for cells involved in multiplication constraints
    • SubCage for cells involved in subtraction constraints
    • DivCage for cells involved in division constraints
    • ConCage for constant constraints
    • RowCage for unique row/column constraints
  • Puzzle combines cages, cells, and exposes methods for the unassigned cells, whether or not the puzzle is solved, etc.

Now for the code:

from abc import ABC, abstractmethod
from utils import kk_add, kk_mul, kk_sub, kk_div


class Cell:
    def __init__(self, row, col, value=None):
        """
        Models a cell in a kenken puzzle

        Args:
            row: row
            col: column
            value: cell value

        """

        self.row = row
        self.col = col
        self.value = value

    def __str__(self):
        return '<Cell ({0}, {1}): {2}>'.format(self.row, self.col, self.value)

    def __hash__(self):
        return hash((self.row, self.col))

    def accept(self, visitor):
        """
        Visitor implementation; accept a visitor object
        and call the object's visit method for this object

        Args:
            visitor: `CellVisitor` implementation 

        Returns: None
        """
        visitor.visit_cell(self)


class Cage(ABC):
    def __init__(self, cells, func):
        """
        Base class to model a cage in a kenken puzzle

        A cage is a grouping of cells with a constraint
        that the values of the cells must collectively satisfy

        Args:
            cells: the `Cell` objects in this cage
            func: a predicate used to indicate when the cage is satisfied

        """

        self.cells = set(cells)
        self.func = func

    def __str__(self):
        return '<{0} cells={1}>'.format(self.__class__.__name__, self.cells)

    @property
    def values(self):
        """
        Returns: list of the cell values in this cage
        """
        return [cell.value for cell in self.cells]

    @property
    def consistent(self):
        """
        Returns: bool whether or not this cage is consistent
        with respect to its current cell values
        """
        return None in self.values or self.solved

    @property
    def solved(self):
        """
        Returns: bool whether or not this cage is solved
        with respect to its current cell values
        """

        values = self.values
        return (
            None not in values
            and len(values) == len(self.cells)
            and self.evaluate(*values)
        )

    def evaluate(self, *values):
        """
        Evaluate this cage for the given input arguments,
        returning whether or not its conditions have been satisfied

        Args:
            *values: variadic list of test values

        Returns: bool
        """
        return self.func(values)

    @abstractmethod
    def accept(self, visitor):
        """
        Visit this cage. Accept a visitor object and call the
        object's visit method for this object

        Args:
            visitor: `CageVisitor` implementation 

        Returns: None
        """
        pass


class AddCage(Cage):
    def __init__(self, cells, value):
        """
        Models an addition cage in a kenken puzzle

        Args:
            cells: list of `Cell` objects contained in this cage
            value: target value the cell values in this cage must sum to

        """

        self.value = value
        super().__init__(cells, lambda values: kk_add(values, value))

    def accept(self, visitor):
        """
        Visit this cage

        Args:
            visitor: `CageVisitor` object

        Returns: None
        """
        visitor.visit_add(self)


class MulCage(Cage):
    def __init__(self, cells, value):
        """
        Models a multiplication cage in a kenken puzzle

        Args:
            cells: list of `Cell` objects contained in this cage
            value: target value the cell values in this cage must multiply to

        """

        self.value = value
        super().__init__(cells, lambda values: kk_mul(values, value))

    def accept(self, visitor):
        """
        Visit this cage

        Args:
            visitor: `CageVisitor` object

        Returns: None
        """
        visitor.visit_mul(self)


class SubCage(Cage):
    def __init__(self, cells, value):
        """
        Models a subtraction cage in a kenken puzzle

        Args:
            cells: list of `Cell` objects contained in this cage
            value: target value the cell values in this cage must subtract to

        """

        self.value = value
        super().__init__(cells, lambda values: kk_sub(values, value))

    def accept(self, visitor):
        """
        Visit this cage

        Args:
            visitor: `CageVisitor` object

        Returns: None
        """
        visitor.visit_sub(self)


class DivCage(Cage):
    def __init__(self, cells, value):
        """
        Models a division cage in a kenken puzzle

        Args:
            cells: list of `Cell` objects contained in this cage
            value: target value the cell values in this cage must divide to

        """

        self.value = value
        super().__init__(cells, lambda values: kk_div(values, value))

    def accept(self, visitor):
        """
        Visit this cage

        Args:
            visitor: `CageVisitor` object

        Returns: None
        """
        visitor.visit_div(self)


class ConCage(Cage):
    def __init__(self, cell, value):
        """
        Models a constant cage in a kenken puzzle

        Args:
            cell: `Cell` object for this cage
            value: target value

        """

        def func(values):
            return len(values) == 1 and values[0] == value

        self.value = value
        super().__init__([cell], func)

    def accept(self, visitor):
        """
        Visit this cage

        Args:
            visitor: `CageVisitor` object

        Returns: None
        """
        visitor.visit_con(self)


class RowCage(Cage): # RowConstraint
    def __init__(self, cells):
        """
        Models a row constraint in a kenken puzzle

        Here the cell values in this cage must be all unique
        for the cage to be solved

        Args:
            cells: `Cell` objects

        """

        def func(values):
            return len(values) == len(set(values))

        super().__init__(cells, func)

    def accept(self, visitor):
        """
        Visit this cage

        Args:
            visitor: `CageVisitor` object

        Returns: None
        """
        visitor.visit_row(self)


class Puzzle:
    def __init__(self, width, cells, cages):
        """
        Models a kenken puzzle

        See https://en.wikipedia.org/wiki/KenKen
        for more information

        Args:
            width: puzzle size
            cells: `Cell` objects comprising this puzzle
            cages: `Cage` objects a solution for this puzzle must satisfy

        """

        self.width = width
        self.cells = cells
        self.cages = cages

    def __str__(self):
        return '<Puzzle width={0}, cages={1}>'.format(
            self.width, len(self.cages)
        )

    @property
    def domain(self):
        """
        Returns: range of this puzzle's possible cell values
        """
        return range(1, self.width + 1)

    @property
    def unassigned(self):
        """
        Returns: generator of this puzzle's unassigned cells
        """
        return (cell for cell in self.cells if cell.value is None)

    @property
    def solved(self):
        """
        Returns: bool whether or not this puzzle has been solved
        """
        return all(cage.solved for cage in self.cages)

    def consistent(self, cell):
        """
        Returns whether or not the value for the given cell is consistent
        with all of its cage constraints

        Args:
            cell: `Cell` object

        Returns: bool

        """

        return all(cage.consistent for cage in self.cages if cell in cage.cells)

For both the Cell and the Cage classes, we have an accept method. This is used to make the objects amenable to the visitor design pattern, for use during solving. The idea is that each cell has a set of “candidate values” that needs to be reduced after we decide to place a value for the cell. I decided to expose things this way to make edits to the core puzzle logic less frequent. Moreover, to try different solution strategies, we need only change the implementation of the visitor we pass to the cells/cages; the core puzzle components need not be changed.
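To illustrate swapping strategies through the visitor alone, here is a minimal, self-contained sketch. The AddCage below is a hypothetical stand-in (not the class above) that carries only a target and the current cell values; SumBoundVisitor mirrors the pruning rule in visit_add of the CageVisitor shown below, and ReservingSumVisitor is a hypothetical tighter rule that also reserves at least 1 for every other empty cell in the cage. The cage object is untouched; only the visitor differs.

```python
class AddCage:
    """Hypothetical stand-in: a target value plus the current cell values."""

    def __init__(self, value, values):
        self.value = value
        self.values = values  # None marks an unassigned cell

    def accept(self, visitor):
        visitor.visit_add(self)  # double dispatch: the cage picks the visitor method


class SumBoundVisitor:
    """Prune v if the assigned sum plus v exceeds the target."""

    def __init__(self, candidates):
        self.candidates = candidates

    def visit_add(self, cage):
        s = sum(v for v in cage.values if v)
        # slice assignment mutates the caller's list in place
        self.candidates[:] = [v for v in self.candidates if v + s <= cage.value]


class ReservingSumVisitor(SumBoundVisitor):
    """Alternative strategy: every other empty cell will hold at least 1."""

    def visit_add(self, cage):
        s = sum(v for v in cage.values if v)
        # the cell being reduced is itself unassigned, so exclude it from the count
        others = max(cage.values.count(None) - 1, 0)
        self.candidates[:] = [
            v for v in self.candidates if v + s + others <= cage.value
        ]


cage = AddCage(5, [None, None, 1])  # target 5, one cell already holds 1
for visitor_cls in (SumBoundVisitor, ReservingSumVisitor):
    candidates = [1, 2, 3, 4]
    cage.accept(visitor_cls(candidates))
    print(visitor_cls.__name__, candidates)
# SumBoundVisitor [1, 2, 3, 4]
# ReservingSumVisitor [1, 2, 3]
```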

Let’s look at the solver classes:

  • CellVisitor is used to visit cells
  • CageVisitor is used to visit cages; its lifetime is managed by a CellVisitor

And the code:

from utils import with_timing, kk_div, kk_sub


class CellVisitor:
    def __init__(self, candidates, cages):
        """
        Visitor for puzzle cells

        Pass an instance of this object to a puzzle cell
        to "visit" the cell and all the cages that involve
        this cell

        Here we use this object to model the process of eliminating
        a set of candidate values for the given cell

        See https://en.wikipedia.org/wiki/Visitor_pattern
        for more information on this design pattern

        Args:
            candidates: list of cell candidates
            cages: list of cages this visitor should also visit

        """

        self.candidates = candidates
        self.cages = cages

    def __str__(self):
        return '<CellVisitor candidates={0}>'.format(self.candidates)

    def visit_cell(self, cell):
        """
        Visits a `Cell`

        Visit each cage that contains this cell; the resulting
        candidates will be the possible values for this cell

        Args:
            cell: `Cell` object to visit

        Returns: None

        """
        visitor = CageVisitor(self.candidates)
        for cage in self.cages:
            cage.accept(visitor)


class CageVisitor:
    def __init__(self, candidates):
        """
        Visitor for puzzle cages

        The methods in this object are used to prune the cell
        candidate values

        Args:
            candidates: cell candidate values to prune

        """

        self.candidates = candidates

    def __str__(self):
        return '<CageVisitor candidates={0}>'.format(self.candidates)

    def visit_add(self, cage):
        """
        Visits an `AddCage`

        We start with the current cage sum. Any
        value that exceeds the cage target value is pruned

        Args:
            cage: `AddCage` object to visit

        Returns: None

        """
        s = sum(value for value in cage.values if value)
        for value in self.candidates[:]:
            if value + s > cage.value:
                self.prune(value)

    def visit_mul(self, cage):
        """
        Visits a `MulCage`

        Any candidate value that is not a divisor of
        the cage target value is pruned

        Args:
            cage: `MulCage` object to visit

        Returns: None

        """
        for value in self.candidates[:]:
            if cage.value % value != 0:
                self.prune(value)

    def visit_sub(self, cage):
        """
        Visits a `SubCage`

        This implementation removes pairs from the
        candidates if the difference of a given pair
        is not equal to the cage value

        Args:
            cage: `SubCage` object to visit

        Returns: None

        """
        candidates = self.candidates[:]
        for x in candidates:
            if not any(kk_sub([x, y], cage.value) for y in candidates):
                self.prune(x)

    def visit_div(self, cage):
        """
        Visits a `DivCage`

        This implementation removes pairs from the
        candidates if the quotient of a given pair
        is not equal to the cage value

        Args:
            cage: `DivCage` object to visit

        Returns: None

        """
        candidates = self.candidates[:]
        for x in candidates:
            if not any(kk_div([x, y], cage.value) for y in candidates):
                self.prune(x)

    def visit_con(self, cage):
        """
        Visits a `ConCage`

        This implementation removes all candidates
        that are not equal to the cage target value

        Args:
            cage: `ConCage` object to visit

        Returns: None

        """
        for x in self.candidates[:]:
            if x != cage.value:
                self.prune(x)

    def visit_row(self, cage):
        """
        Visits a `RowCage`

        This implementation removes all values
        that are already assigned to a cell in the row

        Args:
            cage: `RowCage` object to visit

        Returns: None

        """
        for value in cage.values:
            self.prune(value)

    def prune(self, value):
        """
        Helper method to safely remove values
        from the candidates

        Args:
            value: to remove

        Returns: None

        """
        if value in self.candidates:
            self.candidates.remove(value)


@with_timing
def backtrack_solve(puzzle):
    """
    Solves a kenken puzzle recursively

    During each iteration of the algorithm, a filtering
    strategy is applied to the puzzle's remaining unassigned cells

    See https://en.wikipedia.org/wiki/Backtracking
    for more information on this algorithm

    Args:
        puzzle: `Puzzle` object to solve

    Returns: bool True if all values in `puzzle` have been assigned a value

    """

    def reduce(cell):
        """
        Reduce the candidate values for this cell

        Args:
            cell: `Cell` object to reduce

        Returns: list of reduced candidates

        """

        candidates = list(puzzle.domain)
        cages = (cage for cage in puzzle.cages if cell in cage.cells)
        cell.accept(CellVisitor(candidates, cages))
        return candidates

    def solve():
        """
        Solve this puzzle recursively

        The algorithm first reduces the candidates for the puzzle's
        unassigned cells

        We then sort the reduced cells by candidate length and
        recursively try values for the current cell until the search
        successfully solves the puzzle

        Returns: bool

        """

        reduced = {cell: reduce(cell) for cell in puzzle.unassigned}

        for cell in sorted(reduced, key=lambda c: len(reduced[c])):
            for value in reduced[cell]:
                cell.value = value

                if puzzle.consistent(cell):
                    if solve():
                        return True

                cell.value = None

            return False
        return puzzle.solved
    return solve()

You can read more about the algorithm in the documentation for the solver. The basic idea is that when we visit a cell, we start off with the puzzle’s full domain. Each of the cages reduces the candidates further, by means of a filtering strategy that is invoked on the candidates when we visit that cage. We do this “reduce” operation for each of the unassigned cells.
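For comparison, the reduce-then-backtrack idea can be sketched without the visitor classes. This is not the author's code: cage_ok and solve are hypothetical names, cages reuse the dictionary format accepted by parse_string (below), and the pruning is deliberately simpler (a value survives if it is unused in its row and column and its cage is either still partly empty or satisfied). It keeps the two ideas described above: each unassigned cell is reduced from the full domain, and the cell with the fewest candidates is tried first.

```python
from functools import reduce


def cage_ok(op, target, vals):
    """True if a fully assigned cage meets its target (op codes as in parse_string)."""
    if op == '+':
        return sum(vals) == target
    if op == '*':
        return reduce(lambda a, b: a * b, vals, 1) == target
    if op == '-':
        return abs(vals[0] - vals[1]) == target
    if op == '/':
        # exact division only: (5, 2) must not count as a quotient of 2
        return vals[0] == vals[1] * target or vals[1] == vals[0] * target
    if op == '$':
        return vals[0] == target
    raise ValueError('unknown op {0!r}'.format(op))


def solve(size, cages):
    """Reduce-then-backtrack on a size x size grid; returns {(row, col): value} or None."""
    cells = [(r, c) for r in range(size) for c in range(size)]
    cage_of = {cell: cage for cage in cages for cell in cage['cells']}
    grid = {}

    def candidates(cell):
        row, col = cell
        # row/column constraint: drop values already placed in this row or column
        used = {v for (r, c), v in grid.items() if r == row or c == col}
        cage = cage_of[cell]
        keep = []
        for v in range(1, size + 1):
            if v in used:
                continue
            vals = [v if x == cell else grid.get(x) for x in cage['cells']]
            # a cage with empty cells is treated as consistent, as in Cage.consistent
            if None in vals or cage_ok(cage['op'], cage['value'], vals):
                keep.append(v)
        return keep

    def search():
        open_cells = [cell for cell in cells if cell not in grid]
        if not open_cells:
            return True
        reduced = {cell: candidates(cell) for cell in open_cells}
        # try the most constrained cell first (fewest remaining candidates)
        cell = min(open_cells, key=lambda c: len(reduced[c]))
        for v in reduced[cell]:
            grid[cell] = v
            if search():
                return True
            del grid[cell]
        return False

    return dict(grid) if search() else None


puzzle = [
    {'value': 2, 'op': '/', 'cells': [(0, 0), (0, 1)]},
    {'value': 3, 'op': '+', 'cells': [(1, 0), (1, 1)]},
]
print(solve(2, puzzle))  # {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 1}
```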

Finally, I have a “utils.py” that contains definitions used by both the solver and the puzzle files. Included is a parse_string function that can be used to create a Puzzle object from a dictionary string:

import time

from ast import literal_eval
from functools import wraps, reduce


def kk_add(values, value):
    """
    Returns whether or not the given values
    sum to the target value

    Args:
        values: list of test values
        value: target value

    Returns: bool

    """
    return sum(values) == value


def kk_mul(values, value):
    """
    Returns whether or not the given values
    multiply to the target value

    Args:
        values: list of test values
        value: target value

    Returns: bool

    """
    return product(values) == value


def kk_sub(values, value):
    """
    Returns whether or not the given values subtract
    to the target value

    Args:
        values: list of test values
        value: target value

    Returns: bool

    """
    return abs(values[0] - values[1]) == value


def kk_div(values, value):
    """
    Returns whether or not the given values divide
    to the target value

    Args:
        values: list of test values
        value: target value

    Returns: bool

    """
    # compare via multiplication so non-exact quotients (e.g. 5 / 2) are rejected
    return (values[0] == values[1] * value or
            values[1] == values[0] * value)


def product(nums):
    """
    Helper method to compute the product of a list
    of numbers

    Args:
        nums: list of numbers

    Returns: number

    """
    return reduce(lambda x, y: x * y, nums, 1)


def with_timing(f, output=print):
    """
    Helper method to run a function and output
    the function run time

    Args:
        f: function to decorate
        output: function to output the time message

    Returns: callable decorated function

    """
    @wraps(f)
    def timed(*args, **kwargs):
        ts = time.time()
        result = f(*args, **kwargs)
        te = time.time()

        message = 'func:{!r} args:[{!r}, {!r}] took: {:2.4f} sec'.format(
            f.__name__, args, kwargs, te - ts
        )

        output(message)

        return result
    return timed


def parse_string(s):
    """
    Parse a string to a `Puzzle` object

    The string should be a dictionary that python
    can interpret literally. For example:

    {
      'size': 2,
      'cages': [
        {'value': 2, 'op': '/', 'cells': [(0,0), (0,1)]},
        {'value': 3, 'op': '+', 'cells': [(1,0), (1,1)]}
      ]
    }

    The 'op' should be one of:

        '+' -> AddCage,
        '-' -> SubCage,
        '*' -> MulCage,
        '/' -> DivCage,
        '$' -> ConCage

    The exclusive row/column cages will be created automatically

    Args:
        s: input string to read

    Returns: `Puzzle` object

    """

    from puzzle import (
        Cell,
        AddCage,
        SubCage,
        MulCage,
        DivCage,
        ConCage,
        RowCage,
        Puzzle
    )

    d = literal_eval(s.strip())

    cage_factory = {
        '+': AddCage,
        '-': SubCage,
        '*': MulCage,
        '/': DivCage,
        '$': ConCage
    }

    size = d.get('size')
    cages = d.get('cages')

    if size is None or cages is None:
        raise SyntaxError(
            "Expected 'size' and 'cages'. Got `{0}`".format(d)
        )

    puzzle_cages = []
    puzzle_cells = set()

    for cage in cages:
        value = cage.get('value')
        cells = cage.get('cells')

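        if not value or not cells:
            raise SyntaxError(
                "Expected 'value' and 'cells'. Got {0}".format(cage)
            )

        # `cells` holds (row, col) tuples, so compare coordinates rather than
        # the `Cell` objects parsed so far (a tuple never equals a `Cell`)
        if any((cell.row, cell.col) in cells for cell in puzzle_cells):
            raise ValueError('Some cells exist in another cage {0}'.format(cells))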

        op = cage.get('op')

        if op not in cage_factory:
            raise SyntaxError(
                "Expected '+', '-', '*', '/', '$'. Got {0}".format(op)
            )

        if op == '$' and len(cells) > 1:
            raise ValueError("Expected one cell for `ConCage`")

        cage_cells = []
        for (row, col) in cells:
            cell = Cell(row, col, None)
            cage_cells.append(cell)

        puzzle_cells = puzzle_cells.union(cage_cells)

        # the constructor for `ConCage` expects a single cell as opposed to a list
        cage = cage_factory[op](cage_cells[0] if op == '$' else cage_cells, value)
        puzzle_cages.append(cage)

    if len(puzzle_cells) != size * size:
        raise Exception(
            'Expected {0} cells; parsed {1}'.format(
                size * size, len(puzzle_cells))
        )

    for row in range(size):
        cells = [cell for cell in puzzle_cells if cell.row == row]
        puzzle_cages.append(RowCage(cells))

    for col in range(size):
        cells = [cell for cell in puzzle_cells if cell.col == col]
        puzzle_cages.append(RowCage(cells))

    return Puzzle(size, puzzle_cells, puzzle_cages)
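The overlap check in the loop tests `(row, col)` tuples against a set of `Cell` objects, so it only detects shared cells if `Cell` compares and hashes equal to a tuple (the `puzzle` module is not shown). One possible alternative, as a minimal sketch: validate the documented input format on plain coordinates before any `Cell` or cage objects are built. `validate_layout` and `VALID_OPS` are hypothetical names, not part of the original code.

```python
from ast import literal_eval

# Hypothetical constant mirroring the keys of cage_factory.
VALID_OPS = {'+', '-', '*', '/', '$'}


def validate_layout(s):
    """Validate a puzzle string in the documented format.

    Works on (row, col) tuples only, so it does not depend on how Cell
    implements __eq__/__hash__. Returns (size, cages) unchanged.
    """
    d = literal_eval(s.strip())
    size, cages = d.get('size'), d.get('cages')
    if size is None or cages is None:
        raise SyntaxError("Expected 'size' and 'cages'. Got `{0}`".format(d))

    seen = set()
    for cage in cages:
        value, cells, op = cage.get('value'), cage.get('cells'), cage.get('op')
        # Check presence before iterating over `cells`.
        if not value or not cells:
            raise SyntaxError("Expected 'value' and 'cells'. Got {0}".format(cage))
        if op not in VALID_OPS:
            raise SyntaxError("Expected '+', '-', '*', '/', '$'. Got {0}".format(op))
        if op == '$' and len(cells) != 1:
            raise ValueError("Expected exactly one cell for a '$' cage")

        coords = {tuple(cell) for cell in cells}
        overlap = coords & seen
        if overlap:
            raise ValueError(
                'Cells {0} appear in more than one cage'.format(sorted(overlap)))
        seen |= coords

    # Every cell of the size x size grid must be covered, and nothing else:
    # this also rejects out-of-range coordinates.
    expected = {(r, c) for r in range(size) for c in range(size)}
    if seen != expected:
        raise ValueError(
            'Cages must cover each cell of the {0}x{0} grid exactly once'.format(size))
    return size, cages
```

A count check such as `len(puzzle_cells) != size * size` would accept a layout where an out-of-range coordinate replaces a missing one; comparing against the full coordinate set rejects both cases.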

Any feedback is welcome. I have some additional puzzle files that I used while debugging/testing the solving algorithm, as well as a “run.py” file which provides a CLI for this application. If you think this is needed, feel free to leave a comment and I can provide a link.



#StackBounty: #python #daemon How does DAEMON(3) work? Run as background process

Bounty: 50

What are the steps to make a process detach from the terminal? I found the man page for daemon(3). In the description, it says:

If nochdir is zero, daemon() changes the process’s current working
directory to the root directory (“/”); otherwise, the current working
directory is left unchanged.

If noclose is zero, daemon() redirects standard input, standard
output and standard error to /dev/null; otherwise, no changes are
made to these file descriptors.

I was actually trying to make my Python code run as a daemon. I found the tcollector code, and it follows the same steps as the daemon() description. So my question is: why should we take those steps (with respect to daemonize() in tcollector), such as:

Why change the directory to /, set the umask to 022, and then call os.setsid(), etc.?
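The steps asked about can be sketched in Python as the classic double-fork recipe, with each step's usual rationale in a comment. This is a minimal sketch, not tcollector's actual daemonize(): the name `spawn_daemon` and its parameters are hypothetical, and unlike daemon(3) it leaves the calling process running (it only reaps the short-lived intermediate child) so the result can be observed. The Linux daemon(3) man page notes that glibc's daemon() forks only once, so the resulting daemon is a session leader; the second fork below is the extra step that prevents that.

```python
import os
import sys


def spawn_daemon(work, workdir="/", umask=0o022):
    """Run work() in a detached grandchild process (double-fork recipe).

    Hypothetical helper: the calling process is not terminated; it only waits
    for the short-lived intermediate child and then returns.
    """
    # Flush Python-level buffers first so the children don't inherit
    # (and later re-emit) output that the parent has not written yet.
    sys.stdout.flush()
    sys.stderr.flush()

    pid = os.fork()
    if pid > 0:
        # Original process: reap the intermediate child, which exits at once.
        os.waitpid(pid, 0)
        return

    try:
        # 1st child: start a new session. It now has no controlling terminal,
        # so hang-ups and keyboard signals from the launching terminal no
        # longer reach it.
        os.setsid()

        # 2nd fork: the session leader exits; the grandchild is not a session
        # leader, so opening a tty can never make it a controlling terminal.
        if os.fork() > 0:
            os._exit(0)

        # chdir("/") so the daemon does not keep the directory it was started
        # from (and the filesystem it lives on) busy, e.g. blocking an unmount.
        os.chdir(workdir)

        # Replace the umask inherited from the shell with a known value so
        # the daemon's files get predictable permissions.
        os.umask(umask)

        # Point stdin/stdout/stderr at /dev/null: the terminal may be gone,
        # and stray reads or writes to it would fail.
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        if devnull > 2:
            os.close(devnull)

        work()
    finally:
        # Never fall back into the caller's code from a child process.
        os._exit(0)
```

With umask 022, a file opened with mode 0o666 ends up as 0o644 (0o666 & ~0o022), i.e. writable only by its owner, whatever umask the launching shell had.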



#StackBounty: #python #unit-testing #sqlite3 #twisted #python-db-api Why is Twisted's adbapi failing to recover data from within un…

Bounty: 300

Overview

Context

I am writing unit tests for some higher-order logic that depends on writing to an SQLite3 database. For this I am using twisted.trial.unittest and twisted.enterprise.adbapi.ConnectionPool.

Problem statement

I am able to create a persistent sqlite3 database and store data therein. Using sqlitebrowser, I am able to verify that the data has been persisted as expected.

The issue is that calls to t.e.a.ConnectionPool.run* (e.g.: runQuery) return an empty set of results, but only when called from within a TestCase.
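adbapi.ConnectionPool keeps one DB-API connection per worker thread, so the runOperation() that inserts a row and the runQuery() that reads it back may run on different sqlite3 connections. A minimal standard-library sketch of the visibility rule that then applies: a row written on one connection is invisible to another until the writer commits. The trimmed ajxp_index table and the helper name `visibility_across_connections` are assumptions for illustration, not the asker's code.

```python
import os
import sqlite3
import tempfile


def visibility_across_connections(path):
    """Return the row counts a second connection sees before and after the
    first connection commits an INSERT into the same database file."""
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)
    try:
        # Trimmed version of ajxp_index from init.sql (assumption for the demo).
        writer.execute(
            "CREATE TABLE ajxp_index (node_id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " node_path TEXT)"
        )
        writer.commit()

        # By default the sqlite3 module opens a transaction implicitly before
        # an INSERT, so this row stays uncommitted until writer.commit().
        writer.execute(
            "INSERT INTO ajxp_index (node_path) VALUES (?)",
            ("/this/is/some/arbitrary/path.ext",),
        )
        before = reader.execute("SELECT COUNT(*) FROM ajxp_index").fetchone()[0]

        writer.commit()
        after = reader.execute("SELECT COUNT(*) FROM ajxp_index").fetchone()[0]
        return before, after
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "db.sqlite")
    print(visibility_across_connections(db_path))  # (0, 1)
```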

Notes and significant details

The problem I am experiencing occurs only within Twisted’s trial framework. My first attempt at debugging was to pull the database code out of the unit test and place it into an independent test/debug script. Said script works as expected while the unit test code does not (see examples below).

Case 1: misbehaving unit test

init.sql

This is the script used to initialize the database. There are no (apparent) errors stemming from this file.

CREATE TABLE ajxp_changes ( seq INTEGER PRIMARY KEY AUTOINCREMENT, node_id NUMERIC, type TEXT, source TEXT, target TEXT, deleted_md5 TEXT );
CREATE TABLE ajxp_index ( node_id INTEGER PRIMARY KEY AUTOINCREMENT, node_path TEXT, bytesize NUMERIC, md5 TEXT, mtime NUMERIC, stat_result BLOB);
CREATE TABLE ajxp_last_buffer ( id INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT, location TEXT, source TEXT, target TEXT );
CREATE TABLE ajxp_node_status ("node_id" INTEGER PRIMARY KEY  NOT NULL , "status" TEXT NOT NULL  DEFAULT 'NEW', "detail" TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, type text, message text, source text, target text, action text, status text, date text);

CREATE TRIGGER LOG_DELETE AFTER DELETE ON ajxp_index BEGIN INSERT INTO ajxp_changes (node_id,source,target,type,deleted_md5) VALUES (old.node_id, old.node_path, "NULL", "delete", old.md5); END;
CREATE TRIGGER LOG_INSERT AFTER INSERT ON ajxp_index BEGIN INSERT INTO ajxp_changes (node_id,source,target,type) VALUES (new.node_id, "NULL", new.node_path, "create"); END;
CREATE TRIGGER LOG_UPDATE_CONTENT AFTER UPDATE ON "ajxp_index" FOR EACH ROW BEGIN INSERT INTO "ajxp_changes" (node_id,source,target,type) VALUES (new.node_id, old.node_path, new.node_path, CASE WHEN old.node_path = new.node_path THEN "content" ELSE "path" END);END;
CREATE TRIGGER STATUS_DELETE AFTER DELETE ON "ajxp_index" BEGIN DELETE FROM ajxp_node_status WHERE node_id=old.node_id; END;
CREATE TRIGGER STATUS_INSERT AFTER INSERT ON "ajxp_index" BEGIN INSERT INTO ajxp_node_status (node_id) VALUES (new.node_id); END;

CREATE INDEX changes_node_id ON ajxp_changes( node_id );
CREATE INDEX changes_type ON ajxp_changes( type );
CREATE INDEX changes_node_source ON ajxp_changes( source );
CREATE INDEX index_node_id ON ajxp_index( node_id );
CREATE INDEX index_node_path ON ajxp_index( node_path );
CREATE INDEX index_bytesize ON ajxp_index( bytesize );
CREATE INDEX index_md5 ON ajxp_index( md5 );
CREATE INDEX node_status_status ON ajxp_node_status( status );

test_sqlite.py

This is the unit test class that fails unexpectedly. TestStateManagement.test_db_clean passes, indicating that the tables were properly created. TestStateManagement.test_inode_create_file fails, reporting that zero results were retrieved.

import os.path as osp
from shutil import rmtree
from tempfile import mkdtemp

from twisted.internet import defer
from twisted.enterprise import adbapi
from twisted.trial.unittest import TestCase

import sqlengine # see below

class TestStateManagement(TestCase):

    def setUp(self):
        self.meta = mkdtemp()

        self.db = adbapi.ConnectionPool(
            "sqlite3", osp.join(self.meta, "db.sqlite"), check_same_thread=False,
        )
        self.stateman = sqlengine.StateManager(self.db)

        with open("init.sql") as f:
            script = f.read()

        self.d = self.db.runInteraction(lambda c, s: c.executescript(s), script)

    def tearDown(self):
        self.db.close()
        del self.db
        del self.stateman
        del self.d

        rmtree(self.meta)

    @defer.inlineCallbacks
    def test_db_clean(self):
        """Canary test to ensure that the db is initialized in a blank state"""

        yield self.d  # wait for db to be initialized

        q = "SELECT name FROM sqlite_master WHERE type='table' AND name=?;"
        for table in ("ajxp_index", "ajxp_changes"):
            res = yield self.db.runQuery(q, (table,))
            self.assertTrue(
                len(res) == 1,
                "table {0} does not exist".format(table)
            )

    @defer.inlineCallbacks
    def test_inode_create_file(self):
        yield self.d

        path = osp.join(self.ws, "test.txt")
        with open(path, "wt") as f:
            pass

        inode = mk_dummy_inode(path)
        yield self.stateman.create(inode, directory=False)

        entry = yield self.db.runQuery("SELECT * FROM ajxp_index")
        emsg = "got {0} results, expected 1.  Are canary tests failing?"
        lentry = len(entry)
        self.assertTrue(lentry == 1, emsg.format(lentry))

sqlengine.py

These are the artefacts being tested by the above unit tests.

from twisted.logger import Logger


def values_as_tuple(d, *param):
    """Return the values for each key in `param` as a tuple"""
    return tuple(map(d.get, param))


class StateManager:
    """Manages the SQLite database's state, ensuring that it reflects the state
    of the filesystem.
    """

    log = Logger()

    def __init__(self, db):
        self._db = db

    def create(self, inode, directory=False):
        params = values_as_tuple(
            inode, "node_path", "bytesize", "md5", "mtime", "stat_result"
        )

        directive = (
            "INSERT INTO ajxp_index (node_path,bytesize,md5,mtime,stat_result) "
            "VALUES (?,?,?,?,?);"
        )

        return self._db.runOperation(directive, params)

Case 2: bug disappears outside of twisted.trial

#! /usr/bin/env python

import os.path as osp
from tempfile import mkdtemp

from twisted.enterprise import adbapi
from twisted.internet.task import react
from twisted.internet.defer import inlineCallbacks

INIT_FILE = "example.sql"


def values_as_tuple(d, *param):
    """Return the values for each key in `param` as a tuple"""
    return tuple(map(d.get, param))


def create(db, inode):
    params = values_as_tuple(
        inode, "node_path", "bytesize", "md5", "mtime", "stat_result"
    )

    directive = (
        "INSERT INTO ajxp_index (node_path,bytesize,md5,mtime,stat_result) "
        "VALUES (?,?,?,?,?);"
    )

    return db.runOperation(directive, params)


def init_database(db):
    with open(INIT_FILE) as f:
        script = f.read()

    return db.runInteraction(lambda c, s: c.executescript(s), script)


@react
@inlineCallbacks
def main(reactor):
    meta = mkdtemp()
    db = adbapi.ConnectionPool(
        "sqlite3", osp.join(meta, "db.sqlite"), check_same_thread=False,
    )

    yield init_database(db)

    # Let's make sure the tables were created as expected and that we're
    # starting from a blank slate
    res = yield db.runQuery("SELECT * FROM ajxp_index LIMIT 1")
    assert not res, "database is not empty [ajxp_index]"

    res = yield db.runQuery("SELECT * FROM ajxp_changes LIMIT 1")
    assert not res, "database is not empty [ajxp_changes]"

    # The details of this are not important.  Suffice to say they (should)
    # conform to the DB schema for ajxp_index.
    test_data = {
        "node_path": "/this/is/some/arbitrary/path.ext",
        "bytesize": 0,
        "mtime": 179273.0,
        "stat_result": b"this simulates a blob of raw binary data",
        "md5": "d41d8cd98f00b204e9800998ecf8427e",  # arbitrary
    }

    # store the test data in the ajxp_index table
    yield create(db, test_data)

    # test if the entry exists in the db
    entry = yield db.runQuery("SELECT * FROM ajxp_index")
    assert len(entry) == 1, "got {0} results, expected 1".format(len(entry))

    print("OK")

Closing remarks

Again, upon checking with sqlitebrowser, it seems as though the data is being written to db.sqlite, so this looks like a retrieval problem. From here, I’m sort of stumped… any ideas?



#StackBounty: #python #performance #regex #natural-language-proc #cython Using lots of regex substitutions to tokenize text

Bounty: 50

I authored a piece of code that was merged into the nltk codebase. It is full of regex substitutions:

import re
from six import text_type

from nltk.tokenize.api import TokenizerI

class ToktokTokenizer(TokenizerI):
    """
    This is a Python port of the tok-tok.pl from
    https://github.com/jonsafari/tok-tok/blob/master/tok-tok.pl

    >>> toktok = ToktokTokenizer()
    >>> text = u'Is 9.5 or 525,600 my favorite number?'
    >>> print (toktok.tokenize(text, return_str=True))
    Is 9.5 or 525,600 my favorite number ?
    >>> text = u'The https://github.com/jonsafari/tok-tok/blob/master/tok-tok.pl is a website with/and/or slashes and sort of weird : things'
    >>> print (toktok.tokenize(text, return_str=True))
    The https://github.com/jonsafari/tok-tok/blob/master/tok-tok.pl is a website with/and/or slashes and sort of weird : things
    >>> text = u'\xa1This, is a sentence with weird\xbb symbols\u2026 appearing everywhere\xbf'
    >>> expected = u'\xa1 This , is a sentence with weird \xbb symbols \u2026 appearing everywhere \xbf'
    >>> assert toktok.tokenize(text, return_str=True) == expected
    >>> toktok.tokenize(text) == [u'\xa1', u'This', u',', u'is', u'a', u'sentence', u'with', u'weird', u'\xbb', u'symbols', u'\u2026', u'appearing', u'everywhere', u'\xbf']
    True
    """
    # Replace non-breaking spaces with normal spaces.
    NON_BREAKING = re.compile(u"\u00A0"), " "

    # Pad some funky punctuation.
    FUNKY_PUNCT_1 = re.compile(u'([،;؛¿!"\\])}»›”؟¡%٪°±©®।॥…])'), r" \1 "
    # Pad more funky punctuation.
    FUNKY_PUNCT_2 = re.compile(u'([({\\[“‘„‚«‹「『])'), r" \1 "
    # Pad En dash and em dash
    EN_EM_DASHES = re.compile(u'([–—])'), r" 1 "

    # Replace problematic character with numeric character reference.
    AMPERCENT = re.compile('& '), '&amp; '
    TAB = re.compile('\t'), ' &#9; '
    PIPE = re.compile(r'\|'), ' &#124; '

    # Pad numbers with commas to keep them from further tokenization.
    COMMA_IN_NUM = re.compile(r'(?<!,)([,،])(?![,\d])'), r' \1 '

    # Just pad problematic (often neurotic) hyphen/single quote, etc.
    PROB_SINGLE_QUOTES = re.compile(r"(['’`])"), r' \1 '
    # Group ` ` stupid quotes ' ' into a single token.
    STUPID_QUOTES_1 = re.compile(r" ` ` "), r" `` "
    STUPID_QUOTES_2 = re.compile(r" ' ' "), r" '' "

    # Don't tokenize period unless it ends the line and isn't
    # preceded by another period, e.g.
    # "something ..." -> "something ..."
    # "something." -> "something ."
    FINAL_PERIOD_1 = re.compile(r"(?<!\.)\.$"), r" ."
    # Don't tokenize period unless it ends the line eg.
    # " ... stuff." ->  "... stuff ."
    FINAL_PERIOD_2 = re.compile(r"""(?<!\.)\.\s*(["'’»›”]) *$"""), r" . \1"

    # Treat continuous commas as fake German, Czech, etc.: „
    MULTI_COMMAS = re.compile(r'(,{2,})'), r' \1 '
    # Treat continuous dashes as fake en-dash, etc.
    MULTI_DASHES = re.compile(r'(-{2,})'), r' \1 '
    # Treat multiple periods as a thing (eg. ellipsis)
    MULTI_DOTS = re.compile(r'(\.{2,})'), r' \1 '

    # This is the \p{Open_Punctuation} from Perl's perluniprops
    # see http://perldoc.perl.org/perluniprops.html
    OPEN_PUNCT = text_type(u'([{\u0f3a\u0f3c\u169b\u201a\u201e\u2045\u207d'
                            u'\u208d\u2329\u2768\u276a\u276c\u276e\u2770\u2772'
                            u'\u2774\u27c5\u27e6\u27e8\u27ea\u27ec\u27ee\u2983'
                            u'\u2985\u2987\u2989\u298b\u298d\u298f\u2991\u2993'
                            u'\u2995\u2997\u29d8\u29da\u29fc\u2e22\u2e24\u2e26'
                            u'\u2e28\u3008\u300a\u300c\u300e\u3010\u3014\u3016'
                            u'\u3018\u301a\u301d\ufd3e\ufe17\ufe35\ufe37\ufe39'
                            u'\ufe3b\ufe3d\ufe3f\ufe41\ufe43\ufe47\ufe59\ufe5b'
                            u'\ufe5d\uff08\uff3b\uff5b\uff5f\uff62')
    # This is the \p{Close_Punctuation} from Perl's perluniprops
    CLOSE_PUNCT = text_type(u')]}\u0f3b\u0f3d\u169c\u2046\u207e\u208e\u232a'
                            u'\u2769\u276b\u276d\u276f\u2771\u2773\u2775\u27c6'
                            u'\u27e7\u27e9\u27eb\u27ed\u27ef\u2984\u2986\u2988'
                            u'\u298a\u298c\u298e\u2990\u2992\u2994\u2996\u2998'
                            u'\u29d9\u29db\u29fd\u2e23\u2e25\u2e27\u2e29\u3009'
                            u'\u300b\u300d\u300f\u3011\u3015\u3017\u3019\u301b'
                            u'\u301e\u301f\ufd3f\ufe18\ufe36\ufe38\ufe3a\ufe3c'
                            u'\ufe3e\ufe40\ufe42\ufe44\ufe48\ufe5a\ufe5c\ufe5e'
                            u'\uff09\uff3d\uff5d\uff60\uff63')
    # This is the \p{Sc} (Currency_Symbol) property from Perl's perluniprops
    CURRENCY_SYM = text_type(u'$\xa2\xa3\xa4\xa5\u058f\u060b\u09f2\u09f3\u09fb'
                             u'\u0af1\u0bf9\u0e3f\u17db\u20a0\u20a1\u20a2\u20a3'
                             u'\u20a4\u20a5\u20a6\u20a7\u20a8\u20a9\u20aa\u20ab'
                             u'\u20ac\u20ad\u20ae\u20af\u20b0\u20b1\u20b2\u20b3'
                             u'\u20b4\u20b5\u20b6\u20b7\u20b8\u20b9\u20ba\ua838'
                             u'\ufdfc\ufe69\uff04\uffe0\uffe1\uffe5\uffe6')

    # Pad spaces after opening punctuations.
    OPEN_PUNCT_RE = re.compile(u'([{}])'.format(OPEN_PUNCT)), r'\1 '
    # Pad spaces before closing punctuations.
    CLOSE_PUNCT_RE = re.compile(u'([{}])'.format(CLOSE_PUNCT)), r'\1 '
    # Pad spaces after currency symbols.
    CURRENCY_SYM_RE = re.compile(u'([{}])'.format(CURRENCY_SYM)), r'\1 '

    # Use for tokenizing URL-unfriendly characters: [:/?#]
    URL_FOE_1 = re.compile(r':(?!//)'), r' : ' # in perl s{:(?!//)}{ : }g;
    URL_FOE_2 = re.compile(r'\?(?!\S)'), r' ? ' # in perl s{\?(?!\S)}{ ? }g;
    # in perl: m{://} or m{\S+\.\S+/\S+} or s{/}{ / }g;
    URL_FOE_3 = re.compile(r'(:\/\/)[\S+\.\S+\/\S+][\/]'), ' / '
    URL_FOE_4 = re.compile(r' /'), r' / ' # s{ /}{ / }g;

    # Left/Right strip, i.e. remove heading/trailing spaces.
    # These strip regexes should NOT be used,
    # instead use str.lstrip(), str.rstrip() or str.strip()
    # (They are kept for reference purposes to the original toktok.pl code)
    LSTRIP = re.compile(r'^ +'), ''
    RSTRIP = re.compile(r'\s+$'), '\n'
    # Merge multiple spaces.
    ONE_SPACE = re.compile(r' {2,}'), ' '

    TOKTOK_REGEXES = [NON_BREAKING, FUNKY_PUNCT_1, 
                      URL_FOE_1, URL_FOE_2, URL_FOE_3, URL_FOE_4,
                      AMPERCENT, TAB, PIPE,
                      OPEN_PUNCT_RE, CLOSE_PUNCT_RE, 
                      MULTI_COMMAS, COMMA_IN_NUM, FINAL_PERIOD_2,
                      PROB_SINGLE_QUOTES, STUPID_QUOTES_1, STUPID_QUOTES_2,
                      CURRENCY_SYM_RE, EN_EM_DASHES, MULTI_DASHES, MULTI_DOTS,
                      FINAL_PERIOD_1, FINAL_PERIOD_2, ONE_SPACE]

    def tokenize(self, text, return_str=False):
        text = text_type(text) # Converts input string into unicode.
        for regexp, substitution in self.TOKTOK_REGEXES:
            text = regexp.sub(substitution, text)
        # Finally, strips heading and trailing spaces
        # and converts output string into unicode.
        text = text_type(text.strip()) 
        return text if return_str else text.split()

Is there a way to make the substitutions faster? For example:

  • Combine the chain of regexes into one super regex.
  • Combine some of the regexes.
  • Coding it in Cython (but Cython regexes are slow, no?)
  • Running the regex substitution in Julia and wrapping Julia code in Python
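On combining some of the regexes: rules whose only job is to surround every character of a fixed set with spaces can be merged when they run consecutively and their sets are disjoint. Each such rule inserts only spaces, which none of them match, so one pass over the union of the sets gives the same text as the chain. For pure character-to-string replacements like these, `str.translate` can replace the regex engine entirely. A minimal sketch: the two character sets are stand-ins modelled on FUNKY_PUNCT_1 and EN_EM_DASHES, and `pad_chained`/`pad_merged` are hypothetical names. Context-dependent rules (COMMA_IN_NUM, FINAL_PERIOD_*, URL_FOE_*) depend on their position in TOKTOK_REGEXES and cannot be folded in this way without checking.

```python
import re

# Stand-in character sets modelled on FUNKY_PUNCT_1 and EN_EM_DASHES; both
# rules replace each character c of their set with " c ".
FUNKY = u'،;؛¿!"])}»›”؟¡%٪°±©®।॥…'
DASHES = u'–—'

# Chained: one regex pass per rule, as in ToktokTokenizer.tokenize().
CHAINED = [
    (re.compile(u'([{}])'.format(re.escape(FUNKY))), r' \1 '),
    (re.compile(u'([{}])'.format(re.escape(DASHES))), r' \1 '),
]


def pad_chained(text):
    for regexp, substitution in CHAINED:
        text = regexp.sub(substitution, text)
    return text


# Merged: the sets are disjoint and no rule matches the spaces another rule
# inserts, so a single pass over their union yields the same string.
# str.translate maps each code point straight to its padded replacement.
PAD_TABLE = {ord(c): u' {} '.format(c) for c in FUNKY + DASHES}


def pad_merged(text):
    return text.translate(PAD_TABLE)
```

Whether the saving matters for the full tokenizer is best checked with `timeit` on a representative corpus.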

The tokenize() function usually takes a single input, but when it is called 1,000,000,000 times it is rather slow, and because of the GIL the work stays on one core, processing one sentence at a time.

The aim of the question is to ask for ways to speed up a Python code that’s made up of regex substitution, esp. when running the tokenize() function for 1,000,000,000+ times.
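On the GIL point: each worker in a multiprocessing pool is a separate process with its own interpreter and GIL, so independent sentences can be tokenized on several cores at once. A minimal sketch; `toy_tokenize` is a hypothetical stand-in for ToktokTokenizer().tokenize (a real run would map a module-level function wrapping the tokenizer), and the `fork` start method is an assumption chosen so the sketch runs without an `if __name__ == "__main__":` guard (POSIX only).

```python
import multiprocessing
import re

# Hypothetical stand-in for ToktokTokenizer().tokenize: one padding rule,
# then whitespace splitting.
PAD = re.compile(u'([,;!?])'), r' \1 '


def toy_tokenize(text):
    regexp, substitution = PAD
    return regexp.sub(substitution, text).split()


def tokenize_corpus(lines, processes=4, chunksize=1000):
    """Tokenize lines in parallel worker processes, preserving input order."""
    # "fork" (POSIX only) lets workers inherit toy_tokenize without re-importing
    # this module; with the platform default start method, create the pool
    # under an `if __name__ == "__main__":` guard instead.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes) as pool:
        # chunksize sends lines in batches, amortising the inter-process
        # pickling overhead that would dominate for short sentences.
        return list(pool.imap(toy_tokenize, lines, chunksize))
```

Process-level parallelism leaves the per-sentence cost unchanged; it combines with, rather than replaces, reducing the number of regex passes.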

If Cython, Julia, or another faster language plus a wrapper is suggested, it would be good to include a one-regex example of how the regex would be written in Cython/Julia/etc. and a suggestion of what the wrapper would look like.

