#StackBounty: #python #sql-server #sqlalchemy SQLAlchemy and SQL Server Datetime field overflow

Bounty: 50

I’m using SQLAlchemy to connect to a SQL Server database.

I’m trying to insert an object into a table from my Python script, and it’s failing with this error:

(pyodbc.DataError) ('22008', '[22008] [Microsoft][ODBC SQL Server Driver]Datetime field overflow (0) (SQLExecDirectW)')

It looks like this is being caused by the following datetime object:

datetime.datetime(214, 7, 21, 0, 0)

That is, 21 July 214.

The corresponding date time field in the SQL Server table is of type datetime2.

It looks like the conversion from Python/SQLAlchemy to SQL Server isn’t adding a leading ‘0’ to the year. I’ve confirmed this by manually adding this date to SQL Server with an INSERT statement, which works both with and without the leading ‘0’.

Is there a way to force the year part of the date into the correct format? Or is this being caused by something else?
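One possible explanation, which is an assumption rather than something the question confirms: the error comes from the legacy `[ODBC SQL Server Driver]`, which predates `datetime2` and may bind Python datetimes as the older `datetime` type, whose range starts at 1753-01-01. The hedged sketch below checks that range and formats the value as a zero-padded ISO 8601 string that SQL Server can convert to `datetime2`; the helper names are hypothetical.

```python
from datetime import datetime

# Lower bound of SQL Server's legacy DATETIME type; DATETIME2 starts at 0001-01-01.
LEGACY_DATETIME_MIN = datetime(1753, 1, 1)


def fits_legacy_datetime(value: datetime) -> bool:
    """Return True if `value` is not earlier than the legacy DATETIME lower bound."""
    return value >= LEGACY_DATETIME_MIN


def to_datetime2_string(value: datetime) -> str:
    """Format `value` for a DATETIME2 column with a zero-padded four-digit year.

    datetime.isoformat() always pads the year to four digits, whereas
    strftime('%Y') is not zero-padded on every platform.
    """
    return value.isoformat(sep=' ')


if __name__ == '__main__':
    d = datetime(214, 7, 21)
    print(fits_legacy_datetime(d))   # False: too early for DATETIME
    print(to_datetime2_string(d))    # 0214-07-21 00:00:00
```

Binding the string instead of the datetime leaves the conversion to SQL Server. The other route commonly suggested is switching the connection to a newer Microsoft driver such as “ODBC Driver 17 for SQL Server”, which supports `datetime2`; neither route is tested here.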


Get this bounty!!!

#StackBounty: #python #python-3.x #converting Internal and external datatype converter

Bounty: 200

One of the big problems I have is cleanly converting from an internal datatype to an external datatype. We can all do it the not-so-clean way, but I think that adds too much mess.

I can use libraries that read from a file type into a Python object, but some of them don’t allow you to convert the data from one type to another. Most libraries also don’t allow converting from one structure to another, so if you need data to be nested when it arrives flat, you have to perform the conversion manually.

This library still uses those file-type libraries to convert to a Python object. It just offloads some of the work from them and adds some features I don’t think will be added to them.


This library consists of two public classes: Converter and Converters. They work almost completely independently. A short explanation of most of the code:

  • Converters defines some property functions; these interface with Converter._obj to convert to and from the base class.
  • ron raises when Converter._obj returns a ‘null’ (a BuilderObject). This is needed because we build a BuilderObject before building the actual class, which lets you initialize using setattr rather than just passing a dictionary; I find that a little cleaner at times.
    It should be used whenever you get data from Converter._obj.
  • BuilderObject is a simple object whose missing attributes default to new BuilderObject instances. This means we can build nested datatypes without having to build the objects themselves, as we don’t have the data yet.
  • Converter is a small, unobtrusive class to convert between the base class and itself. Providing T when subclassing it (e.g. Converter[Base]) is required for the code to work.
from datetime import datetime
from typing import Generic, TypeVar, Type, get_type_hints, Union, List, Optional, Tuple, Any


__all__ = ['ron', 'Converter', 'Converters']


T = TypeVar('T')


class BuilderObject:
    def __init__(self):
        super().__setattr__('__values', {})

    def __getattr__(self, name):
        return super().__getattribute__('__values').setdefault(name, BuilderObject())

    def __setattr__(self, name, value):
        super().__getattribute__('__values')[name] = value

    def __delattr__(self, name):
        del super().__getattribute__('__values')[name]


def _build(base: Type[T], values: Union[BuilderObject, dict]) -> T:
    """Build the object recursively, utilizes the type hints to create the correct types"""
    types = get_type_hints(base)
    if isinstance(values, BuilderObject):
        values = super(BuilderObject, values).__getattribute__('__values')
    for name, value in values.items():
        if isinstance(value, BuilderObject) and name in types:
            values[name] = _build(types[name], value)
    return base(**values)


def _get_args(obj: object, orig: Type) -> Optional[Tuple[Type]]:
    """Get args from obj, filtering by orig type"""
    bases = getattr(type(obj), '__orig_bases__', [])
    for b in bases:
        if b.__origin__ is orig:
            return b.__args__
    return None


class Converter(Generic[T]):
    _obj: T

    def __init__(self, **kwargs) -> None:
        self._obj = BuilderObject()
        for name, value in kwargs.items():
            setattr(self, name, value)

    def build(self, exists_ok: bool=False) -> T:
        """Build base object"""
        t = _get_args(self, Converter)
        if t is None:
            raise ValueError('No base')
        base_cls = t[0]
        if isinstance(self._obj, base_cls):
            if not exists_ok:
                raise TypeError('Base type has been built already.')
            return self._obj
        self._obj = _build(base_cls, self._obj)
        return self._obj

    @classmethod
    def from_(cls, b: T):
        """Build function from base object"""
        c = cls()
        c._obj = b
        return c


def ron(obj: T) -> T:
    """Error on null result"""
    if isinstance(obj, BuilderObject):
        raise AttributeError()
    return obj


TPath = Union[str, List[str]]


class Converters:
    @staticmethod
    def _read_path(path: TPath) -> List[str]:
        """Convert from public path formats to internal one"""
        if isinstance(path, list):
            return path
        return path.split('.')

    @staticmethod
    def _get(obj: Any, path: List[str]) -> Any:
        """Helper for nested `getattr`s"""
        for segment in path:
            obj = getattr(obj, segment)
        return obj

    @classmethod
    def property(cls, path: TPath, *, get_fn=None, set_fn=None):
        """
        Allows getting data to and from `path`.

        You can convert/type check the data using `get_fn` and `set_fn`. Both take and return one value.
        """
        p = ['_obj'] + cls._read_path(path)

        def get(self):
            value = ron(cls._get(self, p))
            if get_fn is not None:
                return get_fn(value)
            return value

        def set(self, value: Any) -> Any:
            if set_fn is not None:
                value = set_fn(value)
            setattr(cls._get(self, p[:-1]), p[-1], value)

        def delete(self: Any) -> Any:
            delattr(cls._get(self, p[:-1]), p[-1])

        return property(get, set, delete)

    @classmethod
    def date(cls, path: TPath, format: str):
        """Convert to and from the date format specified"""
        def get_fn(value: datetime) -> str:
            return value.strftime(format)

        def set_fn(value: str) -> datetime:
            return datetime.strptime(value, format)

        return cls.property(path, get_fn=get_fn, set_fn=set_fn)

An example of using this code is:

from dataclasses import dataclass
from datetime import datetime
from converters import Converter, Converters

from dataclasses_json import dataclass_json


@dataclass
class Range:
    start: datetime
    end: datetime


@dataclass
class Base:
    date: datetime
    range: Range


@dataclass_json
@dataclass(init=False)
class International(Converter[Base]):
    date: str = Converters.date('date', '%d/%m/%y %H:%M')
    start: str = Converters.date('range.start', '%d/%m/%y %H:%M')
    end: str = Converters.date('range.end', '%d/%m/%y %H:%M')


class American(Converter[Base]):
    date: str = Converters.date('date', '%m/%d/%y %H:%M')
    start: str = Converters.date('range.start', '%m/%d/%y %H:%M')
    end: str = Converters.date('range.end', '%m/%d/%y %H:%M')


if __name__ == '__main__':
    i = International.from_json('''{
        "date": "14/02/19 12:00",
        "start": "14/02/19 12:00",
        "end": "14/02/19 12:00"
    }''')
    b = i.build()
    a = American.from_(b)

    FORMAT = '{1}:\n\tdate: {0.date}\n\tstart: {0.range.start}\n\tend: {0.range.end}'
    FORMAT_C = '{1}:\n\tdate: {0.date}\n\tstart: {0.start}\n\tend: {0.end}'
    print(FORMAT.format(b, 'b'))
    print(FORMAT_C.format(a, 'a'))
    print(FORMAT_C.format(i, 'i'))
    print('\nupdate b.date')
    b.date = datetime(2019, 2, 14, 12, 30)
    print(FORMAT.format(b, 'b'))
    print(FORMAT_C.format(a, 'a'))
    print(FORMAT_C.format(i, 'i'))
    print('\nupdate b.range.start')
    b.range.start = datetime(2019, 2, 14, 13, 00)
    print(FORMAT.format(b, 'b'))
    print(FORMAT_C.format(a, 'a'))
    print(FORMAT_C.format(i, 'i'))

    print('\njson dump')
    print(i.to_json())

From a code review, I mostly want to focus on increasing the readability of the code. I also want Converter to contain all of the logic while staying transparent enough that most libraries, like dataclasses_json, work with it. I don’t care about performance just yet.



#StackBounty: #python #classification #keras Classify big changes in target variable

Bounty: 50

I am using a CNN to predict large changes in my target variable X. I am classifying several “set-up” states visually from my images.

I am only interested in big changes, or maybe no change. So, I can classify in 3 ways:

Here x is the absolute change in X over the future period:

  1. Label BigChange when x > A%. Label Flat when x < A%.
  2. Label BigChange when x > A%. Label Flat when x < B%, where B < A.
  3. Label BigChange when x > A%. Label Flat when x < B%. Label SmallChange when B% < x < A%.

Which of these approaches is likely to give me the best predictive accuracy for BigChange?
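The three labelling schemes can be written out as plain functions, which makes the practical difference concrete: scheme 2 has to drop the samples between B% and A% (returned as None here), while scheme 3 keeps them as their own class. This is a hedged sketch: the function names are hypothetical, thresholds are fractions (0.05 for 5%), and how an x exactly equal to a threshold is labelled is an assumption the question leaves open.

```python
from typing import List, Optional, Tuple


def label_two_class(x: float, a: float) -> str:
    """Scheme 1: BigChange above A, Flat otherwise (x == A is treated as Flat)."""
    return "BigChange" if x > a else "Flat"


def label_with_gap(x: float, a: float, b: float) -> Optional[str]:
    """Scheme 2: BigChange above A, Flat below B; samples in [B, A] get no label."""
    if x > a:
        return "BigChange"
    if x < b:
        return "Flat"
    return None


def label_three_class(x: float, a: float, b: float) -> str:
    """Scheme 3: like scheme 2, but samples in [B, A] become SmallChange."""
    if x > a:
        return "BigChange"
    if x < b:
        return "Flat"
    return "SmallChange"


def labelled_samples(changes: List[float], a: float, b: float) -> List[Tuple[float, str]]:
    """Apply scheme 2 and keep only the samples that received a label."""
    pairs = [(x, label_with_gap(x, a, b)) for x in changes]
    return [(x, label) for x, label in pairs if label is not None]


if __name__ == "__main__":
    changes = [0.08, 0.03, 0.005]
    print([label_two_class(x, 0.05) for x in changes])
    print(labelled_samples(changes, 0.05, 0.01))
    print([label_three_class(x, 0.05, 0.01) for x in changes])
```

Scheme 2 therefore trains on fewer samples than schemes 1 and 3; which one predicts BigChange best is something you would have to measure on held-out data.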



#StackBounty: #python #python-3.x #docker #docker-compose Python multiprocessing crashes docker container

Bounty: 100

I have a simple Python multiprocessing script that works like a charm when I run it in a console:

# mp.py
import multiprocessing as mp


def do_smth():
    print('something')


if __name__ == '__main__':
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=do_smth, args=tuple())
    p.start()
    p.join()

Result:

> $ python3 mp.py
something

Then I created a simple Docker container with this Dockerfile:

FROM python:3.6

ADD . /app
WORKDIR /app

And docker-compose.yml:

version: '3.6'

services:
  bug:
    build:
      context: .
    environment:
      - PYTHONUNBUFFERED=1
    command: su -c "python3.6 forever.py"

Where forever.py is:

from time import sleep

if __name__ == '__main__':
    i = 0
    while True:
        sleep(1.0)
        i += 1
        print(f'hello {i:3}')

Now I run forever.py with docker compose:

> $ docker-compose build && docker-compose up 
...
some output
...
Attaching to mpbug_bug_1
bug_1  | hello   1
bug_1  | hello   2
bug_1  | hello   3
bug_1  | hello   4

Up to this point everything is good and understandable. But when I try to run mp.py inside the Docker container, the container crashes without any message:

> $ docker exec -it mpbug_bug_1 /bin/bash
root@09779ec47f9d:/app# python mp.py 
something
root@09779ec47f9d:/app# % 

Gist with the code can be found here: https://gist.github.com/ilalex/83649bf21ef50cb74a2df5db01686f18

Can you explain why the Docker container crashes, and how to run this without crashing it?

Thank you in advance!



#StackBounty: #python #interview-questions #selenium Automating A Workflow Using Selenium – Best Way To Fetch Elements & Make Code …

Bounty: 50

I’m a newbie to Selenium and I’m trying to get better at it by taking up interview assignments. This YouTube link describes the workflow that needs to be automated.

The following points are worth taking into account.

  1. For your purpose, use your own gmail id/password but make it readable from a file in your project.

  2. Use the fashion photo pasted below as input.

I implemented the following solution and I’d like to know how to improve it. I’m interested in any changes that make the script more robust, as well as any new Selenium or Python concepts or tools that might make life easier, and existing ones that I could apply a whole lot better.

import time,os
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium import webdriver

try:
    # STEP 1 - LOAD PAGE
    options = webdriver.ChromeOptions()
    options.add_argument("--start-maximized")
    chrome_browser = webdriver.Chrome(chrome_options=options)
    chrome_browser.get("https://huew.co")
    WebDriverWait(chrome_browser,10).until(EC.title_contains("Huew | Your Catalog of Shoppable Fashion Inspirations"))

    # STEP 2 - CLICK ON "YOU"
    you_button = chrome_browser.find_element_by_xpath('//div[@class="desktop-menu-icon-text" and text()="YOU"]')
    time.sleep(2)
    you_button.click()

    # STEP 3 - CLICK ON "LOGIN USING GOOGLE"
    google_login_button = chrome_browser.find_element_by_xpath('//img[@class="social-login-button" and @ng-click="login(\'google\')"]')
    google_login_button.click()

    # STEP 4 - READ USERNAME AND PASSWORD FOR THE GOOGLE ACCOUNT
    with open("credentials.txt", "r") as f:
        lines = f.readlines()
        for line in lines:
            username, password = line.split(":")

    # STEP 5 - SWITCH TO THE NEW WINDOW
    window_handles = chrome_browser.window_handles
    parent_handle = chrome_browser.current_window_handle
    for window_handle in window_handles:
        if window_handle != parent_handle:
            chrome_browser.switch_to.window(window_handle)
            break

    # STEP 6 - TYPE IN USERNAME AND PASSWORD
    input_field = WebDriverWait(chrome_browser, 5).until(EC.visibility_of_element_located((By.ID, "identifierId")))
    input_field.send_keys(username)
    next_button = chrome_browser.find_element_by_xpath('//span[text()="Next"]')
    next_button.click()

    input_field = WebDriverWait(chrome_browser, 5).until(EC.visibility_of_element_located((By.NAME, "password")))
    input_field.send_keys(password)
    next_button = chrome_browser.find_element_by_xpath('//span[text()="Next"]')
    next_button.click()
    time.sleep(5)

    # STEP 7 - SWITCH BACK TO THE OLD WINDOW AND CLICK "DISCOVER"
    chrome_browser.switch_to.window(parent_handle)
    discover_button = chrome_browser.find_element_by_xpath('//div[@class="desktop-menu-icon-text" and text()="DISCOVER"]')
    discover_button.click()

    # STEP 8 - UPLOAD THE PHOTO AND CLICK ON "SUBMIT"
    file_upload_panel = WebDriverWait(chrome_browser, 60).until(lambda chrome_browser: chrome_browser.find_element_by_xpath('//input[@type="file"]'))
    file_upload_panel.send_keys(os.path.join(os.getcwd(), "fashion_pic.jpg"))
    submit_button = WebDriverWait(chrome_browser, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button.desktop-photo-submit.ng-scope")))
    chrome_browser.execute_script('''document.querySelector('button.desktop-photo-submit.ng-scope').click()''')

    # STEP 9 - WAIT FOR THE "SAVE" BUTTON TO APPEAR AND CLICK ON IT
    save_button = WebDriverWait(chrome_browser, 60).until(EC.element_to_be_clickable((By.XPATH, '//button[text()="SAVE"]')))
    save_button.click()
    time.sleep(5)

    # STEP 10 - CLICK ON "HERE" LINK
    here_link = chrome_browser.find_element_by_xpath('//a[text()="here"]')
    here_link.click()

except ValueError:
    print("Please ensure that the input file has the username and password saved in the format specified in readme.txt")
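STEP 4 loops over every line but keeps only the last one, and `line.split(":")` leaves the trailing newline on the password (which `send_keys` may type as an Enter key press) and fails if the password contains a colon. A hedged sketch of a stricter reader, assuming credentials.txt holds a single `username:password` line; `read_credentials` is a hypothetical helper name. It raises ValueError, so the existing `except ValueError` handler still reports a malformed file.

```python
from pathlib import Path
from typing import Tuple


def read_credentials(path: str = "credentials.txt") -> Tuple[str, str]:
    """Return (username, password) from the first non-blank line of `path`.

    The line must look like 'username:password'. Only the first ':' is used as
    the separator, so a password may itself contain ':'. splitlines() drops the
    line ending, so no stray newline reaches send_keys.
    """
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        username, sep, password = line.partition(":")
        if not sep or not username or not password:
            raise ValueError(f"expected 'username:password' in {path!r}")
        return username, password
    raise ValueError(f"no credentials found in {path!r}")
```

In STEP 4, `username, password = read_credentials()` would then replace the `with open(...)` loop.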



#StackBounty: #clustering #python Interpreting Jenks Natural breaks results

Bounty: 50

I don’t come from a strong math background at all, so even basic concepts confuse me; sorry if this sounds too “basic”.

I was going through a bioinformatics book about finding (nucleotide) positions within a genome.

I found some positions, which I placed in an array, and wanted to find any natural “clusters” that might predict the presence of certain structures/features within the genome.

Enough googling led me to the “Jenks Natural Breaks” algorithm, which I used since my data is one-dimensional.

I found an implementation here and got the resulting graph from my array:

x = [316, 697, 797, 3355, 36146, 240727, 329885, 338998, 412388, 480358, 
539407, 562037, 745340, 917334, 1000512, 1024954, 1030754, 1038839]

[Figure: Jenks natural breaks result]

My book tells me to expect a cluster around positions 316, 697 and 797. Does this graph support that claim strongly? I assume it does, since the first break is at the very bottom, but I would like someone to explain it to a layman if possible. Thanks.
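To make it clearer what the graph represents, here is a minimal sketch of the criterion Jenks natural breaks optimizes, written as Fisher’s exact dynamic program: sort the values, then choose k contiguous groups so that the total within-group sum of squared deviations from each group’s mean is as small as possible. It is not the implementation linked above, and `jenks_breaks` is a hypothetical name.

```python
from typing import List, Sequence


def jenks_breaks(values: Sequence[float], n_classes: int) -> List[List[float]]:
    """Split the sorted values into `n_classes` contiguous groups that minimise
    the total within-group sum of squared deviations (Fisher-Jenks criterion)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= len(values)")

    # Prefix sums of x and x**2 give each slice's squared deviation in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations of data[i:j] from its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: smallest total SSD when data[:j] is split into k groups.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # the last group is data[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    start[k][j] = i

    # Walk back through the recorded start indices to recover the groups.
    groups, j = [], n
    for k in range(n_classes, 0, -1):
        i = start[k][j]
        groups.append(data[i:j])
        j = i
    return groups[::-1]


if __name__ == "__main__":
    x = [316, 697, 797, 3355, 36146, 240727, 329885, 338998, 412388, 480358,
         539407, 562037, 745340, 917334, 1000512, 1024954, 1030754, 1038839]
    for k in (2, 3, 4, 5):
        print(k, [(g[0], g[-1]) for g in jenks_breaks(x, k)])
```

Running it for several k and checking whether 316, 697 and 797 stay together in the first group is one way to judge how strongly the breaks support the book’s claim. Keep in mind that the algorithm always returns exactly k groups, whether or not the data are genuinely clustered.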



#StackBounty: #python #json #python-3.x #csv #parsing How to read and map CSV's multi line header rows using python

Bounty: 50

I have a CSV file downloaded from a database (it is already in CSV), and now I have to parse it into a JSON schema. Don’t worry, this link is just a GitHub gist.


The problem I am facing is that the file has multi-line headers (check the CSV file here).

If you look at the file:

  1. The 1st line of the CSV has the 1st line of headers, and the next line has all the values for those headers.

  2. The 3rd line of the CSV has the 2nd line of headers, and the next line has all the values for those headers.

  3. The 5th line of the CSV has the 3rd line of headers, and the next line has all the values for those headers.

You can also notice a pattern here:

  • the 1st line of headers has no tab
  • the 2nd line of headers has only one tab
  • the 3rd line of headers has two tabs

This goes for all the records.

The 1st problem is these multi-line headers, and the 2nd problem is how to parse the file into nested JSON like the schema I have.
One of the solutions I tried is Create nested JSON from CSV, which is when I noticed the 1st problem with my CSV.

I’ve been on this for 2 days and still haven’t found any kind of solution.

My code looks like this, where I am only trying to parse the initial fields of the schema:

import csv
import json


def csvParse(csvfile):
    # Open the CSV (newline='' is what the csv module expects)
    with open(csvfile, 'r', newline='') as f:
        # Change each fieldname to the appropriate field name.
        # Note: DictReader maps names to columns by position, so a repeated
        # name ("Customer Name", "Currency") keeps only the last column's value.
        reader = csv.DictReader(f, fieldnames=(
            "Order Ref", "Order Status", "Affiliate", "Source", "Agent",
            "Customer Name", "Customer Name", "Email Address", "Telephone", "Mobile",
            "Address 1", "Address 2", "City", "County/State", "Postal Code", "Country",
            "Voucher Code", " Voucher Amount", "Order Date", "Item ID", "Type",
            "Supplier Code", "Supplier Name", "Booking Ref", "Supplier Price", "Currency",
            "Selling Price", "Currency", "Depart", "Arrive", "Origin", "Destination",
            "Carrier", "Flight No", "Class", "Pax Type", "Title", "Firstname",
            "Surname", "DOB", "Gender", "FOID Type"))

        customer = []
        data = []
        # data frame names in a list
        for row in reader:
            frame = {"orderRef": row["Order Ref"],
                     "orderStatus": row["Order Status"],
                     "affiliate": row["Affiliate"],
                     "source": row["Source"],
                     "customers": []}

            data.append(frame)
    return data

JSON schema:

{
  orderRef: number,
  orderStatus: string,
  affiliate: string,
  source: string,
  agent: string,
  customer: {
    name: string,
    email: string,
    telephone: string,
    mobile: string,
    address: {
      address1: string,
      address2: string,
      city: string,
      country: string,
      postcode: string,
      country: stringdob
    },
  },
  voucherCode: string,
  voucherAmount: number,
  orderDate: date,
  items:[
    {
      itemId: number,
      type: string,
      supplierCode: string,
      supplierName: string,
      bookingReference: string,
      supplierPrice: float,
      supplierPriceCurrency: string,
      sellingPrice: float,
      sellingPriceCurrency: string,
      legs: [
        {
          departureDate: datetime,
          arrivalDate: datetime, // can be null if not available
          origin: string,
          destination: string,
          carrier: string,
          flightNumber: string,
          class: string
        }
      ],
      passengers: [
        {
          passengerType: string,
          title: string,
          firstName: string,
          surName: string,
          dob: string,
          gender: string,
          foidType: string
        }
      ]
    }
  ]

}
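A minimal sketch of one way to handle the header pattern described in the question, under assumptions the question does not confirm: the file is comma-separated, every header line is immediately followed by exactly one value line, and a header’s nesting depth equals its number of leading tab characters. `parse_multiline_header_csv` is a hypothetical name, and mapping the resulting keys onto the camelCase schema fields (e.g. "Order Ref" to orderRef) is left out.

```python
import csv
import json
from typing import Any, Dict, List


def parse_multiline_header_csv(text: str) -> List[Dict[str, Any]]:
    """Turn alternating header/value lines into nested dicts.

    Assumes each header line is followed by exactly one value line and that a
    header's nesting depth equals its number of leading tab characters.
    Depth-0 pairs start a new record; a deeper pair is appended to the
    `children` of the most recent node one level up.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    records: List[Dict[str, Any]] = []
    stack: List[Dict[str, Any]] = []  # stack[d] = latest node at depth d
    for header_line, value_line in zip(lines[0::2], lines[1::2]):
        depth = len(header_line) - len(header_line.lstrip("\t"))
        header = next(csv.reader([header_line.lstrip("\t")]))
        values = next(csv.reader([value_line.lstrip("\t")]))
        node: Dict[str, Any] = dict(zip(header, values))
        node["children"] = []
        if depth == 0:
            records.append(node)
        else:
            # Raises IndexError if a nesting level is skipped.
            stack[depth - 1]["children"].append(node)
        del stack[depth:]
        stack.append(node)
    return records


if __name__ == "__main__":
    sample = ("Order Ref,Source\n1001,web\n\tItem ID,Type\n\t1,Flight\n"
              "\t\tOrigin,Destination\n\t\tLHR,JFK\n")
    print(json.dumps(parse_multiline_header_csv(sample), indent=2))
```

For a real file, pass the file’s contents (for example `Path("orders.csv").read_text()`, with a hypothetical file name) and then rename keys and group children into `items`, `legs` and `passengers` as the schema requires.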



#StackBounty: #python #beginner #pandas Customer segmentation using RFM analysis

Bounty: 50

Currently, my code works perfectly well, but I would like to make it cleaner for other users by moving the duplicate and similar lines of code into functions or for loops. Because I am still new to Python, I haven’t got the hang of functions and for loops yet. My data frame rfm includes 5 columns:

  • Max Date (latest transaction)
  • Id (unique identifier)
  • Recency (today’s date minus Latest Transaction Date)
  • Frequency (total # of transactions per Id since its subscription)
  • Monetary (total amount of $ spent by Id since its subscription)

I separate the main data frame into 3 different data frames because the sort order differs for each cumulative sum column. The Frequency and Monetary data frames have identical calculations:

import pandas as pd

rfm_recency = rfm[['Max_Date', 'Id', 'Member_id', 'Recency']].copy()
rfm_recency = rfm_recency.sort_values(['Recency'], ascending=True)

rfm_frequency = rfm[['Id', 'Member_id', 'Frequency']].copy()
rfm_frequency = rfm_frequency.sort_values(['Frequency'], ascending=False)
rfm_frequency['cum_sum'] = rfm_frequency['Frequency'].cumsum()
rfm_frequency['cum_sum_perc'] = rfm_frequency['cum_sum'] / rfm_frequency['Frequency'].sum()

rfm_monetary = rfm[['Id', 'Member_id', 'Monetary']].copy()
rfm_monetary = rfm_monetary.sort_values(['Monetary'], ascending=False)
rfm_monetary['cum_sum'] = rfm_monetary['Monetary'].cumsum()
rfm_monetary['cum_sum_perc'] = rfm_monetary['cum_sum'] / rfm_monetary['Monetary'].sum()

def scorefm(x):
    """Function for separating data into 5 bins for Frequency & Monetary df """
    if x <= 0.20:
        return 5
    elif x <= 0.40:
        return 4
    elif x <= 0.60:
        return 3
    elif x <= 0.80:
        return 2
    else:
        return 1


# Divide the Recency df into equal quantiles
rfm_recency['r_score'] = 5 - pd.qcut(rfm_recency['Recency'], q=5, labels=False)

# Create scores from cum_sum_perc for Frequency and Monetary
rfm_frequency['f_score'] = rfm_frequency['cum_sum_perc'].apply(scorefm)
rfm_monetary['m_score'] = rfm_monetary['cum_sum_perc'].apply(scorefm)

# Resorting data frames by ID to merge
rfm_recency = rfm_recency.sort_values('Id')
rfm_frequency = rfm_frequency.sort_values('Id')
rfm_monetary = rfm_monetary.sort_values('Id')

# Merging data frames together
result = rfm_recency.copy()  # DataFrame.copy() copies every column; it does not select columns
result = result.join(rfm_frequency[['Frequency', 'f_score']])
result = result.join(rfm_monetary[['Monetary', 'm_score']])

# Create an FM and RFM score based on the individual R, F, M scores.
result['FM'] = (result['f_score'] + result['m_score']) / 2
result['RFM_Score'] = result['r_score'] * 10 + result['FM']
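One possible refactor of the duplicated Frequency/Monetary steps into functions and a loop, as a sketch rather than a drop-in replacement: it assumes `rfm` has the columns the code above uses (`Max_Date`, `Id`, `Member_id`, `Recency`, `Frequency`, `Monetary`) with a unique index, and `score_cumulative`, `cumulative_score` and `rfm_scores` are hypothetical names. The scoring rules are unchanged: R from quintiles with `pd.qcut`, F and M from 20% steps of the cumulative share, as in `scorefm`.

```python
import pandas as pd


def score_cumulative(share: float) -> int:
    """Same bins as scorefm: <=20% -> 5, <=40% -> 4, <=60% -> 3, <=80% -> 2, else 1."""
    for score, upper in zip((5, 4, 3, 2), (0.20, 0.40, 0.60, 0.80)):
        if share <= upper:
            return score
    return 1


def cumulative_score(rfm: pd.DataFrame, column: str, score_name: str) -> pd.DataFrame:
    """Sort `column` descending, score its cumulative share, return [column, score_name]."""
    part = rfm[[column]].sort_values(column, ascending=False).copy()
    share = part[column].cumsum() / part[column].sum()
    part[score_name] = share.apply(score_cumulative)
    return part


def rfm_scores(rfm: pd.DataFrame) -> pd.DataFrame:
    result = rfm[['Max_Date', 'Id', 'Member_id', 'Recency']].copy()
    # Recency: lower is better, so invert the quintile index (0..4 -> 5..1).
    result['r_score'] = 5 - pd.qcut(result['Recency'], q=5, labels=False)
    # Frequency and Monetary share one code path; join aligns rows on the index.
    for column, score_name in (('Frequency', 'f_score'), ('Monetary', 'm_score')):
        result = result.join(cumulative_score(rfm, column, score_name))
    result['FM'] = (result['f_score'] + result['m_score']) / 2
    result['RFM_Score'] = result['r_score'] * 10 + result['FM']
    return result.sort_values('Id')
```

The final `sort_values('Id')` reproduces the original row order; the intermediate sorts by Id in the original are not needed for `join`, which matches rows by index rather than position.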



#StackBounty: #python #scipy #distribution #beta #binomial-theorem Finding alpha and beta of beta-binomial distribution with scipy.opti…

Bounty: 50

A distribution is beta-binomial if p, the probability of success in a binomial distribution, has a beta distribution with shape parameters α > 0 and β > 0. The shape parameters therefore determine the distribution of the probability of success.
I want to find the values for α and β that best describe my data from the perspective of a beta-binomial distribution. My dataset consists of the number of hits (H), the number of at-bats (AB) and the conversion (H / AB) of a lot of baseball players. I estimate the PDF with the help of JulienD’s answer to Beta Binomial Function in Python:

from scipy.special import beta, comb  # comb lives in scipy.special; scipy.misc.comb is deprecated/removed

pdf = comb(n, k) * beta(k + a, n - k + b) / beta(a, b)

Next, I write a log-likelihood function to minimize:

def loglike_betabinom(params, *args):
   """
   Negative log likelihood function for betabinomial distribution
   :param params: list for parameters to be fitted.
   :param args:  2-element array containing the sample data.
   :return: negative log-likelihood to be minimized.
   """

   a, b = params[0], params[1]
   k = args[0] # the conversion rate
   n = args[1] # the number of at-bats (AB)

   pdf = comb(n, k) * beta(k + a, n - k + b) / beta(a, b)

   return -1 * np.log(pdf).sum()   

Then I minimize loglike_betabinom:

from scipy.optimize import minimize
init_params = [1, 10]
res = minimize(loglike_betabinom, x0=init_params, args=(players['H'] / players['AB'], players['AB']), bounds=bounds, method='L-BFGS-B', options={'disp': True, 'maxiter': 250})
print(res.x)

The result is [-6.04544138 2.03984464], which implies that α is negative, which is not possible. I based my script on the following R snippet, which gives [101.359, 287.318].

 ll <- function(alpha, beta) { 
    x <- career_filtered$H
    total <- career_filtered$AB
    -sum(VGAM::dbetabinom.ab(x, total, alpha, beta, log=TRUE))
 }

 m <- mle(ll, start = list(alpha = 1, beta = 10), 
 method = "L-BFGS-B", lower = c(0.0001, 0.1))

 ab <- coef(m)

Can someone tell me what I am doing wrong? Help is much appreciated!!
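For comparison, here is a sketch of the same fit under two assumptions worth checking against the R snippet: k is the number of hits H (the R code uses `x <- career_filtered$H`, not H / AB), and the log-probability is computed directly with `gammaln`/`betaln`, because `comb` and `beta` can overflow or underflow for large at-bat counts. The bounds mirror the R call’s `lower = c(0.0001, 0.1)`; `fit_betabinom` and the synthetic data are hypothetical. SciPy 1.4 and later also provide `scipy.stats.betabinom.logpmf` for the same log-probability.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, gammaln


def betabinom_logpmf(k, n, a, b):
    """log P(K = k) for a beta-binomial with n trials and shape parameters a, b."""
    log_comb = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    return log_comb + betaln(k + a, n - k + b) - betaln(a, b)


def neg_loglike(params, k, n):
    """Negative log-likelihood of hit counts k out of n at-bats."""
    a, b = params
    return -np.sum(betabinom_logpmf(k, n, a, b))


def fit_betabinom(hits, at_bats, x0=(1.0, 10.0)):
    """Return the (alpha, beta) maximising the likelihood, bounded like the R call."""
    res = minimize(neg_loglike, x0=np.asarray(x0, dtype=float),
                   args=(np.asarray(hits, dtype=float), np.asarray(at_bats, dtype=float)),
                   method='L-BFGS-B', bounds=[(1e-4, None), (0.1, None)])
    return res.x


if __name__ == '__main__':
    # Synthetic stand-in for players['H'] and players['AB'].
    rng = np.random.default_rng(0)
    true_p = rng.beta(100, 300, size=2000)
    at_bats = rng.integers(200, 600, size=2000)
    hits = rng.binomial(at_bats, true_p)
    print(fit_betabinom(hits, at_bats))
```

With the real data, the call would be `fit_betabinom(players['H'], players['AB'])`.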

