#StackBounty: #python #komodo How to configure 4 spaces per tab in komodo for python?

Bounty: 50

I am using komodo-edit (Komodo Edit, version 8.5.0, build 13638, platform macosx) to write python code. Whenever I press the ‘tab’ key I want the editor to insert 4 spaces. The preference looks like this:

[screenshot: the Komodo indentation preferences]

To me this configuration looks fine. But when I press ‘tab’ in the editor, it looks like two 4-space wide tabs are inserted! What is wrong? How can I configure komodo-edit so that it inserts 4 spaces whenever I press the ‘tab’ key?

Note: I am using this on a Mac. Maybe it's a Mac issue?


Get this bounty!!!

#StackBounty: #r #regression #time-series #multiple-regression #python Multiple time-series symbolic regression

Bounty: 50

I have a few columns of technical data as a time series, let's say column a, column b and column c. I want to find out the impact of both a and b on c.

If I search for these keywords I find

  • pandas corr function that computes (several) correlation coefficients on (only) two columns.
  • models like ARMA or ARIMA, where the first “A” stands for autoregressive, i.e. a regression of a column on time-lagged values of itself.

So, what I am looking for is a kind of symbolic regression (or similar) that can regress several time-series columns on another one.
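As a concrete (non-symbolic) starting point, the joint linear impact of a and b on c can be estimated with ordinary least squares, and lagged copies of a and b can be appended as extra columns in the same way. The data below is synthetic and the coefficients are made up purely for illustration:

```python
import numpy as np

# Synthetic stand-ins for columns a, b and c (coefficients are made up).
rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
b = rng.normal(size=n)
c = 2.0 * a - 0.5 * b + rng.normal(scale=0.01, size=n)

# Design matrix [a, b, intercept]; time-lagged copies of a and b could be
# appended as further columns to capture delayed effects.
X = np.column_stack([a, b, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, c, rcond=None)
print(coef)  # approximately [2.0, -0.5, 0.0]
```

For true symbolic regression (searching over formula structures rather than fitting fixed coefficients), packages such as gplearn are often suggested.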



#StackBounty: #python #mocking #py.test #magicmock Mocking an imported function with pytest

Bounty: 50

I would like to test an email-sending method I wrote. In the file format_email.py I import send_email.

 from cars.lib.email import send_email

 class CarEmails(object):

    def __init__(self, email_client, config):
        self.email_client = email_client
        self.config = config

    def send_cars_email(self, recipients, input_payload):

After formatting the email content in send_cars_email() I send the email using the method I imported earlier.

 response_code = send_email(data, self.email_client)

In my test file, test_car_emails.py:

@pytest.mark.parametrize("test_input,expected_output", test_data)
def test_email_payload_formatting(test_input, expected_output):
    emails = CarEmails(email_client=MagicMock(), config=config())
    emails.send_email = MagicMock()
    emails.send_cars_email(*test_input)
    emails.send_email.assert_called_with(*expected_output)

When I run the test, it fails with an assertion that the mock was not called. I believe the issue is where I am mocking the send_email function.

Where should I be mocking this function?
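The usual answer is to patch the name in the namespace where it is looked up, i.e. format_email.send_email, rather than attaching a mock to the instance: the method body resolves send_email as a module-level name, so emails.send_email = MagicMock() is never consulted. Since the cars.lib package is not available here, the sketch below builds a tiny stand-in module to demonstrate the idea:

```python
import sys
import types
from unittest.mock import MagicMock, patch

# Stand-in for format_email.py (cars.lib is not available here, so a
# minimal module is built in-memory to demonstrate the idea).
source = """
def send_email(data, email_client):
    return 200

class CarEmails:
    def __init__(self, email_client):
        self.email_client = email_client

    def send_cars_email(self, payload):
        return send_email(payload, self.email_client)
"""
mod = types.ModuleType("format_email")
exec(source, mod.__dict__)
sys.modules["format_email"] = mod

# Patch the name in the namespace where it is *looked up*:
# 'format_email.send_email', not 'cars.lib.email.send_email'.
with patch("format_email.send_email") as fake_send:
    emails = mod.CarEmails(email_client=MagicMock())
    emails.send_cars_email({"to": "someone@example.com"})
    fake_send.assert_called_once_with({"to": "someone@example.com"}, emails.email_client)
```

In the real test this becomes @mock.patch('format_email.send_email') (or mocker.patch with pytest-mock), and the assertion is made against that patched mock instead of emails.send_email.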



#StackBounty: #python #django #amazon-s3 #bower Django bower – use aws s3 as storage?

Bounty: 50

I'm using django-bower (http://django-bower.readthedocs.io/en/latest/installation.html) to manage my jQuery plugins.

My settings are currently set up so that all media and static files go to S3, while bower components currently go to root/components. I'm not sure how I could send the bower components to AWS S3 instead.

functions.py

class StaticStorage(S3BotoStorage):
    location = settings.STATICFILES_LOCATION

class MediaStorage(S3BotoStorage):
    location = settings.MEDIAFILES_LOCATION

settings.py

AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME

STATICFILES_LOCATION = 'static'
STATICFILES_STORAGE = 'itapp.functions.StaticStorage'
STATIC_URL = "https://%s/%s/" % (AWS_S3_CUSTOM_DOMAIN, STATICFILES_LOCATION)

MEDIAFILES_LOCATION = 'media'
MEDIA_URL = "https://%s/%s/" % (AWS_S3_CUSTOM_DOMAIN, MEDIAFILES_LOCATION)
DEFAULT_FILE_STORAGE = 'itapp.functions.MediaStorage'

DOCUMENT_ROOT = '/documents/'

STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
    'djangobower.finders.BowerFinder',
)

BOWER_COMPONENTS_ROOT = os.path.join(BASE_DIR, 'components')

BOWER_INSTALLED_APPS = (
    'jquery',
    'jquery-ui',
    'chosen',
)

IMPORTANT INFO FROM THE TOPIC STARTER:

Bower is installing the files to the Docker image, when I want them to go into an S3 bucket. All my static files are loaded from the bucket, so when it tries to grab a file installed by bower it cannot, because they're on the Docker image instead.
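One commonly suggested approach (a sketch, assuming the settings shown above fit your deploy flow): since djangobower.finders.BowerFinder is already in STATICFILES_FINDERS, collectstatic will pick up everything under BOWER_COMPONENTS_ROOT and upload it through STATICFILES_STORAGE, i.e. into the bucket's 'static' prefix:

```shell
# Install components locally (into BOWER_COMPONENTS_ROOT), then let
# collectstatic push them through the S3-backed STATICFILES_STORAGE.
python manage.py bower install
python manage.py collectstatic --noinput
```

Running these during the image build (or at deploy time) means the {% static %} URLs for jquery etc. resolve to the bucket rather than to files on the Docker image.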



#StackBounty: #python #selenium #firefox #proxy #headless Using a http proxy with headless firefox in Selenium webdriver in Python

Bounty: 50

I’m using Firefox headless like this:

from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium import webdriver
import os
import sys

# Set the MOZ_HEADLESS environment variable, which causes Firefox to
# start in headless mode.
os.environ['MOZ_HEADLESS'] = '1'

# Select your Firefox binary.
binary = FirefoxBinary('/usr/bin/firefox', log_file=sys.stdout)

# Start selenium with the configured binary.
driver = webdriver.Firefox(firefox_binary=binary)

But now I want to add an HTTP proxy that requires a username/password. After searching around, I tried the following:

from selenium.webdriver.common.proxy import Proxy, ProxyType

myProxy = "xx.xx.xx.xx:80"

proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': myProxy,
    'ftpProxy': myProxy,
    'sslProxy': myProxy,
    'noProxy': '' # set this value as desired
    })

driver = webdriver.Firefox(firefox_binary=binary, proxy=proxy)

I also tried

profile = webdriver.FirefoxProfile() 
profile.set_preference("network.proxy.type", 1)
profile.set_preference("network.proxy.http", "xx.xx.xx.xx")
profile.set_preference("network.proxy.http_port", 80)
profile.update_preferences() 
driver=webdriver.Firefox(firefox_binary=binary,firefox_profile=profile)

Finally, I tried adding “socksUsername” and “socksPassword” with the creds to the proxy, more out of desperation than any real hope.

Needless to say, none of these worked; testing shows requests still use my usual IP, not the proxy.

Also system-wide proxy is not an option in this case.

Where should the HTTP proxy credentials live? How can I use a proxy with headless Firefox?

Testing

driver.get("https://www.ipinfo.io")
driver.find_element_by_xpath('//h4/following-sibling::p').text
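One detail worth checking first: the profile-based attempt sets only network.proxy.http/http_port, but the test URL above is https://, and Firefox routes HTTPS traffic through the separate network.proxy.ssl preferences, so those requests go out directly. Below is a sketch of the full pref set (host and port are placeholders). Note that Firefox has no preference for proxy credentials, so an authenticating proxy usually needs something like a local forwarding proxy in front of it:

```python
def firefox_proxy_prefs(host, port):
    # Manual proxy configuration covering both HTTP and HTTPS traffic.
    return {
        "network.proxy.type": 1,        # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,      # used for https:// URLs
        "network.proxy.ssl_port": port,
    }

prefs = firefox_proxy_prefs("xx.xx.xx.xx", 80)
# profile = webdriver.FirefoxProfile()
# for name, value in prefs.items():
#     profile.set_preference(name, value)
```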



#StackBounty: #python #django #postgresql #django-queryset #trigram Django max similarity (TrigramSimilarity) from ManyToManyField

Bounty: 150

I have to implement a search function which will be fault tolerant.
Currently, I have the following situation:

Models:

class Tag(models.Model):
    name = models.CharField(max_length=255)

class Illustration(models.Model):
    name = models.CharField(max_length=255)
    tags = models.ManyToManyField(Tag)

Query:

queryset.annotate(similarity=TrigramSimilarity('name', fulltext) + TrigramSimilarity('tags__name', fulltext))

Example data:

Illustrations:

ID |  Name  |        Tags       |
---|--------|-------------------|
 1 | "Dog"  | "Animal", "Brown" |
 2 | "Cat"  | "Animals"         |

Illustration has Tags:

ID_Illustration | ID_Tag |
----------------|--------|
       1        |    1   |
       1        |    2   |
       2        |    3   |

Tags:

ID_Tag |   Name   |
-------|----------|
   1   |  Animal  |
   2   |  Brown   |
   3   |  Animals |

When I run the query with "Animal", the similarity for "Dog" should be higher than for "Cat", as it is a perfect match.
Unfortunately, both tags are considered together somehow.
Currently, it looks like it’s concatenating the tags in a single string and then checks for similarity:

TrigramSimilarity("Animal Brown", "Animal") => X

But I would like to adjust it in a way that I will get the highest similarity between an Illustration instance name and its tags:

Max([
    TrigramSimilarity('Name', "Animal"), 
    TrigramSimilarity("Tag_1", "Animal"), 
    TrigramSimilarity("Tag_2", "Animal"),
]) => X

Edit1: I'm trying to query all Illustration objects where either the name or one of the tags has a similarity greater than X.

Edit2: Additional example:

fulltext = 'Animal'

TrigramSimilarity('Animal Brown', fulltext) => x
TrigramSimilarity('Animals', fulltext) => y

Where x < y

But what I want is actually

TrigramSimilarity(Max(['Animal', 'Brown']), fulltext) => x (similarity to 'Animal')
TrigramSimilarity('Animals', fulltext) => y

Where x > y
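The desired max-over-tags ranking can be illustrated in plain Python with a simplified pg_trgm-style similarity. This only models the ranking, not the ORM fix itself; in the ORM, the usual suggestion is something along the lines of annotating with Greatest(TrigramSimilarity('name', q), Max(TrigramSimilarity('tags__name', q))) using django.db.models.functions.Greatest:

```python
def trigrams(s):
    # pg_trgm-style padding: two spaces in front, one behind, lowercased.
    s = "  " + s.lower() + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    # Jaccard similarity of trigram sets, a simplified stand-in for pg_trgm.
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

query = "Animal"
dog_tags = ["Animal", "Brown"]   # tags of the "Dog" illustration
cat_tags = ["Animals"]           # tags of the "Cat" illustration

# Taking the max over each row's tags ranks the exact match first:
dog_best = max(similarity(t, query) for t in dog_tags)
cat_best = max(similarity(t, query) for t in cat_tags)
print(dog_best, cat_best)  # 1.0 for "Dog", less than 1.0 for "Cat"
```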



#StackBounty: #python #python-3.x #datetime What can I do to get local time using datetime.now() and datetime.today()?

Bounty: 200

datetime.now() and datetime.today() return time in UTC on my computer even though the documentation says they should return local time.

Here’s the script I ran:

#!/usr/bin/python

import time
import datetime

if __name__ == "__main__":
   print(datetime.datetime.now())
   print(datetime.datetime.today())
   print(datetime.datetime.fromtimestamp(time.time()))

and here’s the output:

2017-11-29 22:47:35.339914
2017-11-29 22:47:35.340399
2017-11-29 22:47:35.340399

The output of running date right after it is:

Wed, Nov 29, 2017  3:47:43 PM

Why is my installation returning time in UTC?
What can I do to get those functions to return local time?

PS We are in MST, which is UTC-7.

PS 2 I realize there are methods to convert a UTC time to local time, such as those explained in Convert a python UTC datetime to a local datetime using only python standard library?. However, I am trying to understand the cause of the fundamental problem and not looking for a method to patch the problem in my own code.


In response to comment by @jwodder:

The output of executing

print(time.altzone)
print(time.timezone)
print(time.tzname)

is:

-3600
0
('Ame', 'ric')
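That tzname output is the giveaway: ('Ame', 'ric') looks like an IANA zone name such as America/Denver chopped into fragments by a POSIX-style parse of the TZ environment variable, which some C runtimes (notably on Windows) apply. The experiment below shows what a well-formed POSIX TZ string produces on a Unix system (tzset is Unix-only); the practical fix is usually to unset TZ, or to set it to a value your platform's C runtime actually understands:

```python
import os
import time

# A POSIX-format TZ string is "std offset dst", e.g. MST7MDT for
# US Mountain time (UTC-7 standard, with DST).
os.environ["TZ"] = "MST7MDT"
if hasattr(time, "tzset"):   # tzset() only exists on Unix
    time.tzset()
    print(time.tzname)       # ('MST', 'MDT')
    print(time.timezone)     # 25200, i.e. 7 hours west of UTC
```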



#StackBounty: #python #performance #beginner #numpy #markov-chain Solve the phase state between two haplotype blocks using markov trans…

Bounty: 50

I have spent more than a year with Python, but I come from a biology background. I have enough understanding of for-loops and nested for-loops to work out a solution to the problem I had, but I suppose there are better, more comprehensive ways of solving it.

Below is the code I wrote to solve the haplotype phase state between two haplotype blocks.

Here is a short snippet of my data:

chr  pos       ms01e_PI  ms01e_PG_al  ms02g_PI  ms02g_PG_al  ms03g_PI  ms03g_PG_al  ms04h_PI  ms04h_PG_al  ms05h_PI  ms05h_PG_al  ms06h_PI  ms06h_PG_al
2    15881764  4         C|T          6         C|T          7         T|T          7         T|T          7         C|T          7         C|T
2    15881767  4         C|C          6         T|C          7         C|C          7         C|C          7         T|C          7         C|C
2    15881989  4         C|C          6         A|C          7         C|C          7         C|C          7         A|T          7         A|C
2    15882091  4         G|T          6         G|T          7         T|A          7         A|A          7         A|T          7         A|C
2    15882451  4         C|T          4         T|C          7         T|T          7         T|T          7         C|T          7         C|A
2    15882454  4         C|T          4         T|C          7         T|T          7         T|T          7         C|T          7         C|T
2    15882493  4         C|T          4         T|C          7         T|T          7         T|T          7         C|T          7         C|T
2    15882505  4         A|T          4         T|A          7         T|T          7         T|T          7         A|C          7         A|T

Problem:

In the given data I want to solve the phase state of the haplotype for the sample ms02g. The suffix PI represents the phased block (i.e. all data within that block is phased).

For sample ms02g, C-T-A-G is phased (PI level 6), and so is T-C-C-T. Similarly, the data within PI level 4 is also phased. But since the haplotype is broken into two blocks, we don't know which phase from level 6 goes with which phase of level 4. Phase to be solved - sample ms02g:

ms02g_PI  ms02g_PG_al
6         C|T
6         T|C
6         A|C
6         G|T
×——————————×—————> Break Point
4         T|C
4         T|C
4         T|C
4         T|A

Since all the other samples are completely phased, I can use Markov-chain transition probabilities to solve the phase state. To the human eye/mind you can clearly see that the left part of ms02g at level 6 (C-T-A-G) is more likely to go with the right side of the level-4 block (C-C-C-A).

Calculation:

Step 01:

I prepare the required haplotype configurations:

  • The top haplotype-PI is block01 and bottom is block-02.
  • The left phased haplotype within each block is Hap-A and the right is Hap-B.

So, there are 2 haplotype configurations:

  • Block01-HapA with Block02-HapA, so B01-HapB with B02-HapB
  • Block01-HapA with Block02-HapB, so B01-HapB with B02-HapA

I count the number of transitions from each nucleotide of PI-6 to each nucleotide of PI-4 for each haplotype configuration.
Possible transitions from PI-6 to PI-4:

  Possible
  transitions    ms02g_PG_al
    │       ┌┬┬┬ C|T
    │       ││││ T|C
    │       ││││ A|C
    │       ││││ G|T
    └─────> ││││  ×—————> Break Point
            │││└ T|C
            ││└─ T|C
            │└── T|C
            └─── T|A
  • and multiply the transition probabilities from the first nucleotide in PI-6 to all nucleotides of PI-4. Then similarly multiply the transition probabilities from the 2nd nucleotide of PI-6 to all nucleotides in PI-4.

  • then I sum the transition probabilities from one level to another.


  Possible
  transitions    ms02g_PG_al
    │       ┌┬┬┬ C|T 
    │       ││││ T|C | Block 1
    │       ││││ A|C |
    │       ││││ G|T /
    └─────> ││││  ×—————> Break Point
            │││└ T|C 
            ││└─ T|C |
            │└── T|C | Block 2
            └─── T|A /
                 ↓ ↓
             Hap A Hap B

Calculating likelihood:

  Block-1-Hap-A with Block-2-Hap-B    vs.    Block-1-Hap-A with Block-2-Hap-A

  C|C × C|C × C|C × A|C = 0.58612            T|C × T|C × T|C × T|C = 0.000244
+ C|T × C|T × C|T × A|T = 0.3164           + T|T × T|T × T|T × T|T = 0.00391
+ C|A × C|A × C|A × A|A = 0.482            + T|A × T|A × T|A × T|A = 0.0007716
+ C|G × C|G × C|G × A|G = 0.3164           + T|G × T|G × T|G × T|G = 0.00391
                          ———————                                    —————————
                    Avg = 0.42523                              Avg = 0.002207

        Likelihood Ratio = 0.42523 / 0.002207 = 192.67
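The sum-of-products scheme above (for each site n of block 1, multiply the transition probabilities to every site of block 2, then sum over n) can be written as a small function. The transition table below uses made-up uniform probabilities, just to check the shape of the computation:

```python
def config_score(block1_hap, block2_hap, trans_prob):
    # Sum over block-1 sites of the product of transition probabilities
    # to every block-2 site, mirroring the calculation above.
    total = 0.0
    for a in block1_hap:
        p = 1.0
        for b in block2_hap:
            p *= trans_prob[a + "_" + b]
        total += p
    return total

# Toy transition table with made-up uniform probabilities:
uniform = {x + "_" + y: 0.5 for x in "ACGT" for y in "ACGT"}
score = config_score("CT", "TCC", uniform)
print(score)  # 2 sites * 0.5**3 = 0.25
```

The likelihood ratio of two configurations is then just the ratio of their scores, computed from the real (non-uniform) transition table.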

I have the following working code in Python 3.
I think the transition probabilities could mostly be computed with numpy, or with a Markov-chain package, but I was not able to work that out the way I wanted, and it took me some time to get this far.

Any suggestions with explanations are welcome. Thanks.

#!/home/bin/python
from __future__ import division

import collections
from itertools import product
from itertools import islice
import itertools
import csv
from pprint import pprint


''' helper: conditional probability pX_Y / pX (returns 0 on division by zero) '''
def sum_Probs(pX_Y, pX):
    try:
        return float(pX_Y / pX)
    except ZeroDivisionError:
        return 0


with open("HaploBlock_Updated.txt", 'w') as f:

    ''' Write header (part 01): This is just the front part of the header. The rear part of the header is
    updated dynamically depending upon the number of samples '''
    f.write('\t'.join(['chr', 'pos', 'ms02g_PI', 'ms02g_PG_al', '\n']))

    with open('HaploBlock_for_test-forHMMtoy02.txt') as Phased_Data:

        ''' Note: Create a dictionary to store required data. The Dict should contain information about
         two adjacent haplotype blocks that needs extending. In this example I want to extend the
         haplotypes for sample ms02g which has two blocks 6 and 4. So, I read the PI and PG value
         for this sample. Also, data should store with some unique keys. Some keys are: chr, pos,
         sample (PI, PG within sample), possible haplotypes ... etc .. '''


        Phased_Dict = csv.DictReader(Phased_Data, delimiter='\t')
        grouped = itertools.groupby(Phased_Dict, key=lambda x: x['ms02g_PI'])


        ''' Function to read the data as blocks (based on PI values of ms02g) at a time'''
        def accumulate(data):
            acc = collections.OrderedDict()
            for d in data:
                for k, v in d.items():
                    acc.setdefault(k, []).append(v)
            return acc


        ''' Store data as keys,values '''
        grouped_data = collections.OrderedDict()
        for k, g in grouped:
            grouped_data[k] = accumulate(g)


        ''' Clear memory '''
        del Phased_Dict
        del grouped

        ''' Iterate over two Hap Blocks at once. This is done to obtain all possible Haplotype
        configurations between two blocks. The (keys,values) for first block is represented as
        k1,v2 and for the later block as k2,v2. '''

        # Empty list for storing haplotypes from each block
        hap_Block1_A = [];
        hap_Block1_B = []
        hap_Block2_A = [];
        hap_Block2_B = []

        # Create empty list for possible haplotype configurations from above block
        hapB1A_hapB2A = []
        hapB1B_hapB2B = []

        ''' list of all available samples (index, value) '''
        sample_list = [('ms01e_PI', 'ms01e_PG_al'), ('ms02g_PI', 'ms02g_PG_al'),
                       ('ms03g_PI', 'ms03g_PG_al'), ('ms04h_PI', 'ms04h_PG_al'),
                       ('ms05h_PI', 'ms05h_PG_al'), ('ms06h_PI', 'ms06h_PG_al')]


        ''' read data from two consecutive blocks at a time '''
        for (k1, v1), (k2, v2) in zip(grouped_data.items(), islice(grouped_data.items(), 1, None)):
            ''' skip if one of the keys has no values'''
            if k1 == '.' or k2 == '.':
                continue


            ''' iterate over the first Haplotype Block, i.e the k1 block.
            The nucleotides in the left of the phased SNPs are called Block01-haplotype-A,
            and similarly on the right as Block01-haplotype-B. '''
            hap_Block1_A = [x.split('|')[0] for x in v1['ms02g_PG_al']] # the left haplotype of Block01
            hap_Block1_B = [x.split('|')[1] for x in v1['ms02g_PG_al']]


            ''' iterate over the second Haplotype Block,
            i.e the k2 block '''
            hap_Block2_A = [x.split('|')[0] for x in v2['ms02g_PG_al']]
            hap_Block2_B = [x.split('|')[1] for x in v2['ms02g_PG_al']]



            ''' Now, we have to start to solve the possible haplotype configuration.
            Possible haplotype Configurations will be, Either :
            1) Block01-haplotype-A phased with Block02-haplotype-A,
                creating -> hapB1A-hapB2A, hapB1B-hapB2B
            Or,
            2) Block01-haplotype-A phased with Block02-haplotype-B
                creating -> hapB1A-hapB2B, hapB1B-hapB2A '''

            ''' First possible configuration '''
            hapB1A_hapB2A = [hap_Block1_A, hap_Block2_A]
            hapB1B_hapB2B = [hap_Block1_B, hap_Block2_B]

            ''' Second Possible Configuration '''
            hapB1A_hapB2B = [hap_Block1_A, hap_Block2_B]
            hapB1B_hapB2A = [hap_Block1_B, hap_Block2_A]


            print('\nStarting MarkovChains')
            ''' Now, start prepping the first-order Markov transition matrix
             for the observed haplotypes between two blocks.'''


            ''' To store the sum-values of the products of the transition probabilities.
            These sums are accumulated as each product-of-transitions is returned by the nested for-loop;
            from "for m in range(....)" '''

            Sum_Of_Product_of_Transition_probabilities_hapB1A_hapB2A = \
                sumOf_PT_hapB1A_B2A = 0
            Sum_Of_Product_of_Transition_probabilities_hapB1B_hapB2B = \
                sumOf_PT_hapB1B_B2B = 0

            Sum_Of_Product_of_Transition_probabilities_hapB1A_hapB2B = \
                sumOf_PT_hapB1A_B2B = 0
            Sum_Of_Product_of_Transition_probabilities_hapB1B_hapB2A = \
                sumOf_PT_hapB1B_B2A = 0


            for n in range(len(v1['ms02g_PI'])):  # n-ranges from block01

                ''' Set the probabilities of each nucleotides at Zero. They are updated for each level
                 of "n" after reading the number of each nucleotides at that position. '''
                pA = 0; pT = 0; pG = 0; pC = 0
                # nucleotide_prob = [0., 0., 0., 0.] # or store as numpy matrix

                # Also storing these values as Dict
                # probably can be improved
                nucleotide_prob_dict = {'A': 0, 'T': 0, 'G': 0, 'C': 0}
                print('prob as Dict: ', nucleotide_prob_dict)


                ''' for storing the product-values of the transition probabilities.
                These are updated for each level of "n" paired with each level of "m". '''
                product_of_transition_Probs_hapB1AB2A = POTP_hapB1AB2A = 1
                product_of_transition_Probs_hapB1BB2B = POTP_hapB1BB2B = 1

                product_of_transition_Probs_hapB1AB2B = POTP_hapB1AB2B = 1
                product_of_transition_Probs_hapB1BB2A = POTP_hapB1BB2A = 1



                ''' Now, we read each level of "m" to compute the transition from each level of "n"
                to each level of "m". '''
                for m in range(len(v2['ms02g_PI'])): # m-ranges from block02
                    # transition for each level of 0-to-n level of V1 to 0-to-m level of V2
                    ''' set transition probabilities at Zero for each possible transition '''
                    pA_A = 0; pA_T = 0; pA_G = 0; pA_C = 0
                    pT_A = 0; pT_T = 0; pT_G = 0; pT_C = 0
                    pG_A = 0; pG_T = 0; pG_G = 0; pG_C = 0
                    pC_A = 0; pC_T = 0; pC_G = 0; pC_C = 0

                    ''' Also, creating an empty dictionary to store transition probabilities
                    - probably upgrade using numpy '''
                    transition_prob_dict = {'A_A' : 0, 'A_T' : 0, 'A_G' : 0, 'A_C' : 0,
                    'T_A' : 0, 'T_T' : 0, 'T_G' : 0, 'T_C' : 0,
                    'G_A' : 0, 'G_T' : 0, 'G_G' : 0, 'G_C' : 0,
                    'C_A' : 0, 'C_T' : 0, 'C_G' : 0, 'C_C' : 0}


                    ''' Now, loop through each sample to compute initial probs and transition probs '''
                    for x, y in sample_list:
                        print('sample x and y:', x,y)


                        ''' Update nucleotide probability for this site
                        - only calculated from v1 and only once for each parse/iteration '''
                        if m == 0:
                            pA += (v1[y][n].count('A'))
                            pT += (v1[y][n].count('T'))
                            pG += (v1[y][n].count('G'))
                            pC += (v1[y][n].count('C'))

                            nucleotide_prob_dict['A'] = pA
                            nucleotide_prob_dict['T'] = pT
                            nucleotide_prob_dict['G'] = pG
                            nucleotide_prob_dict['C'] = pC

                        nucleotide_prob = [pA, pT, pG, pC]


                        ''' Now, update transition matrix '''
                        nucl_B1 = (v1[y][n]).split('|')  # nucleotides at Block01
                        nucl_B2 = (v2[y][m]).split('|')  # nucleotides at Block02


                        ''' create possible haplotype configurations between "n" and "m".
                         If the index (PI value) are same we create zip, if index (PI value) are
                         different we create product. '''

                        HapConfig = []  # create empty list

                        if v1[x][n] == v2[x][m]:
                            ziped_nuclB1B2 = [(x + '_' + y) for (x,y) in zip(nucl_B1, nucl_B2)]
                            HapConfig = ziped_nuclB1B2

                            ''' Move this counting function else where '''
                            pA_A += (HapConfig.count('A_A'))  # (v1[y][0].count('A'))/8
                            pA_T += (HapConfig.count('A_T'))
                            pA_G += (HapConfig.count('A_G'))
                            pA_C += (HapConfig.count('A_C'))

                            pT_A += (HapConfig.count('T_A'))  # (v1[y][0].count('A'))/8
                            pT_T += (HapConfig.count('T_T'))
                            pT_G += (HapConfig.count('T_G'))
                            pT_C += (HapConfig.count('T_C'))

                            pG_A += (HapConfig.count('G_A'))  # (v1[y][0].count('A'))/8
                            pG_T += (HapConfig.count('G_T'))
                            pG_G += (HapConfig.count('G_G'))
                            pG_C += (HapConfig.count('G_C'))

                            pC_A += (HapConfig.count('C_A'))
                            pC_T += (HapConfig.count('C_T'))
                            pC_G += (HapConfig.count('C_G'))
                            pC_C += (HapConfig.count('C_C'))


                        if v1[x][n] != v2[x][m]:
                            prod_nuclB1B2 = [(x + '_' + y) for (x,y) in product(nucl_B1, nucl_B2)]
                            HapConfig = prod_nuclB1B2
                            print('prod NuclB1B2: ', prod_nuclB1B2)

                            pA_A += (HapConfig.count('A_A'))/2
                            pA_T += (HapConfig.count('A_T'))/2
                            pA_G += (HapConfig.count('A_G'))/2
                            pA_C += (HapConfig.count('A_C'))/2

                            pT_A += (HapConfig.count('T_A'))/2
                            pT_T += (HapConfig.count('T_T'))/2
                            pT_G += (HapConfig.count('T_G'))/2
                            pT_C += (HapConfig.count('T_C'))/2

                            pG_A += (HapConfig.count('G_A'))/2
                            pG_T += (HapConfig.count('G_T'))/2
                            pG_G += (HapConfig.count('G_G'))/2
                            pG_C += (HapConfig.count('G_C'))/2

                            pC_A += (HapConfig.count('C_A'))/2
                            pC_T += (HapConfig.count('C_T'))/2
                            pC_G += (HapConfig.count('C_G'))/2
                            pC_C += (HapConfig.count('C_C'))/2


                    ''' Now, compute nucleotide and transition probabilities for each nucleotide
                    from each 0-n to 0-m at each sample. This updates the transition matrix in
                    each loop. **Note: At the end this transition probabilities should sum to 1 '''

                    ''' Storing nucleotide probabilities '''
                    nucleotide_prob = [pA, pT, pG, pC]

                    ''' Storing transition probability as dict'''
                    transition_prob_dict['A_A'] = sum_Probs(pA_A,pA)
                    transition_prob_dict['A_T'] = sum_Probs(pA_T,pA)
                    transition_prob_dict['A_G'] = sum_Probs(pA_G,pA)
                    transition_prob_dict['A_C'] = sum_Probs(pA_C,pA)

                    transition_prob_dict['T_A'] = sum_Probs(pT_A,pT)
                    transition_prob_dict['T_T'] = sum_Probs(pT_T,pT)
                    transition_prob_dict['T_G'] = sum_Probs(pT_G,pT)
                    transition_prob_dict['T_C'] = sum_Probs(pT_C,pT)

                    transition_prob_dict['G_A'] = sum_Probs(pG_A,pG)
                    transition_prob_dict['G_T'] = sum_Probs(pG_T,pG)
                    transition_prob_dict['G_G'] = sum_Probs(pG_G,pG)
                    transition_prob_dict['G_C'] = sum_Probs(pG_C,pG)

                    transition_prob_dict['C_A'] = sum_Probs(pC_A,pC)
                    transition_prob_dict['C_T'] = sum_Probs(pC_T,pC)
                    transition_prob_dict['C_G'] = sum_Probs(pC_G,pC)
                    transition_prob_dict['C_C'] = sum_Probs(pC_C,pC)


                    '''Now, we start solving the haplotype configurations after we have the
                      required data (nucleotide and transition probabilities).
                    Calculate joint probability for each haplotype configuration.
                    Step 01: We calculate the product of the transition from one level
                    of "n" to several levels of "m". '''

                    ''' finding possible configuration between "n" and "m". '''
                    hapB1A_hapB2A_transition = hapB1A_hapB2A[0][n] + '_' + hapB1A_hapB2A[1][m]
                    hapB1B_hapB2B_transition = hapB1B_hapB2B[0][n] + '_' + hapB1B_hapB2B[1][m]

                    hapB1A_hapB2B_transition = hapB1A_hapB2B[0][n] + '_' + hapB1A_hapB2B[1][m]
                    hapB1B_hapB2A_transition = hapB1B_hapB2A[0][n] + '_' + hapB1B_hapB2A[1][m]


                    ''' computing the products of transition probabilities on the for loop '''
                    POTP_hapB1AB2A *= transition_prob_dict[hapB1A_hapB2A_transition]
                    POTP_hapB1BB2B *= transition_prob_dict[hapB1B_hapB2B_transition]

                    POTP_hapB1AB2B *= transition_prob_dict[hapB1A_hapB2B_transition]
                    POTP_hapB1BB2A *= transition_prob_dict[hapB1B_hapB2A_transition]


                ''' Step 02: sum of the product of the transition probabilities '''
                sumOf_PT_hapB1A_B2A += POTP_hapB1AB2A
                sumOf_PT_hapB1B_B2B += POTP_hapB1BB2B

                sumOf_PT_hapB1A_B2B += POTP_hapB1AB2B
                sumOf_PT_hapB1B_B2A += POTP_hapB1BB2A


            ''' Step 03: Now, computing the likelihood of each haplotype configuration '''
            print('computing likelihood:')

            likelyhood_hapB1A_with_B2A_Vs_B2B = LH_hapB1AwB2AvsB2B = \
                sumOf_PT_hapB1A_B2A / sumOf_PT_hapB1A_B2B
            likelyhood_hapB1B_with_B2B_Vs_B2A = LH_hapB1BwB2BvsB2A = \
                sumOf_PT_hapB1B_B2B / sumOf_PT_hapB1B_B2A

            likelyhood_hapB1A_with_B2B_Vs_B2A = LH_hapB1AwB2BvsB2A = \
                sumOf_PT_hapB1A_B2B / sumOf_PT_hapB1A_B2A
            likelyhood_hapB1B_with_B2A_Vs_B2B = LH_hapB1BwB2AvsB2B = \
                sumOf_PT_hapB1B_B2A / sumOf_PT_hapB1B_B2B


            print('\nlikelihood values: B1A with B2A vs. B2B, B1B with B2B vs. B2A')
            print(LH_hapB1AwB2AvsB2B, LH_hapB1BwB2BvsB2A)

            print('\nlikelihood values: B1A with B2B vs. B2A, B1B with B2A vs. B2B')
            print(LH_hapB1AwB2BvsB2A, LH_hapB1BwB2AvsB2B)


            ''' Update the phase state of the block based on evidence '''
            if (LH_hapB1AwB2AvsB2B + LH_hapB1BwB2BvsB2A) > \
                4*(LH_hapB1AwB2BvsB2A + LH_hapB1BwB2AvsB2B):
                print('Block-1_A is phased with Block-2_A')

                for x in range(len(v1['ms02g_PI'])):
                    PI_value = v1['ms02g_PI'][x]
                    # Note: we transfer the PI value from the earlier block to the next block

                    f.write('\t'.join([v1['chr'][x], v1['pos'][x], v1['ms02g_PI'][x],
                                       hapB1A_hapB2A[0][x] + '|' + hapB1B_hapB2B[0][x], '\n']))

                for x in range(len(v2['ms02g_PI'])):
                    f.write('\t'.join([v2['chr'][x], v2['pos'][x], PI_value,
                                       hapB1A_hapB2A[1][x] + '|' + hapB1B_hapB2B[1][x], '\n']))

            elif (LH_hapB1AwB2AvsB2B + LH_hapB1BwB2BvsB2A) < \
                4 * (LH_hapB1AwB2BvsB2A + LH_hapB1BwB2AvsB2B):
                print('Block-1_A is phased with Block-2_B')

                for x in range(len(v1['ms02g_PI'])):
                    PI_value = v1['ms02g_PI'][x]
                    # Note: we transfer the PI value from the earlier block to the next block...
                        # but the phase positions are reversed

                    f.write('\t'.join([v1['chr'][x], v1['pos'][x], v1['ms02g_PI'][x],
                                       hapB1A_hapB2A[0][x] + '|' + hapB1B_hapB2B[0][x], '\n']))

                for x in range(len(v2['ms02g_PI'])):
                    f.write('\t'.join([v2['chr'][x], v2['pos'][x], PI_value,
                                       hapB1B_hapB2B[1][x] + '|' + hapB1A_hapB2A[1][x], '\n']))


            else:
                print('cannot solve the phase state with confidence - sorry ')
                for x in range(len(v1['ms02g_PI'])):
                    f.write('t'.join([v1['chr'][x], v1['pos'][x], v1['ms02g_PI'][x],
                                   hapB1A_hapB2A[0][x] + '|' + hapB1B_hapB2B[0][x], 'n']))

                for x in range(len(v2['ms02g_PI'])):
                    f.write('t'.join([v2['chr'][x], v2['pos'][x], v2['ms02g_PI'][x],
                                       hapB1A_hapB2A[1][x] + '|' + hapB1B_hapB2B[1][x], 'n']))


Get this bounty!!!

#StackBounty: #python #apache-spark #pyspark #spark-dataframe PySpark casting IntegerTypes to ByteType for optimization

Bounty: 50

I’m reading a large amount of data from parquet files into dataframes. I noticed that many of the columns only contain the values 1, 0, or -1, and could therefore be converted from Int to Byte types to save memory.

I wrote a function to do just that and return a new dataframe with the values cast to bytes. However, when looking at the dataframe’s memory in the UI, I see it saved as just a transformation from the original dataframe and not as a new dataframe itself, so it takes the same amount of memory.

I’m rather new to Spark and may not fully understand the internals, so how would I go about initially setting those columns to be of ByteType?


Get this bounty!!!

#StackBounty: #python #sockets #networking #udp Receiving UDP packets in Python and interpret them

Bounty: 50

I’m quite new to network programming, and I would like to make a little script for a friend of mine.

It should receive the UDP telemetry packets generated by Codemasters’ F1 2017 game.

I searched the forum here

http://forums.codemasters.com/discussion/46726/d-box-and-udp-telemetry-information/p4

and the protocol used in the packets is known.

I managed to read all the float variables correctly, but I got stuck while processing the byte values.

// Packet size – 1289 bytes
struct UDPPacket
{
    float m_time;
    float m_lapTime;
    float m_lapDistance;
    float m_totalDistance;
    float m_x; // World space position
    float m_y; // World space position
    float m_z; // World space position
    float m_speed; // Speed of car in MPH
    float m_xv; // Velocity in world space
    float m_yv; // Velocity in world space
    float m_zv; // Velocity in world space
    float m_xr; // World space right direction
    float m_yr; // World space right direction
    float m_zr; // World space right direction
    float m_xd; // World space forward direction
    float m_yd; // World space forward direction
    float m_zd; // World space forward direction
    float m_susp_pos[4]; // Note: All wheel arrays have the order:
    float m_susp_vel[4]; // RL, RR, FL, FR
    float m_wheel_speed[4];
    float m_throttle;
    float m_steer;
    float m_brake;
    float m_clutch;
    float m_gear;
    float m_gforce_lat;
    float m_gforce_lon;
    float m_lap;
    float m_engineRate;
    float m_sli_pro_native_support; // SLI Pro support
    float m_car_position; // car race position
    float m_kers_level; // kers energy left
    float m_kers_max_level; // kers maximum energy
    float m_drs; // 0 = off, 1 = on
    float m_traction_control; // 0 (off) - 2 (high)
    float m_anti_lock_brakes; // 0 (off) - 1 (on)
    float m_fuel_in_tank; // current fuel mass
    float m_fuel_capacity; // fuel capacity
    float m_in_pits; // 0 = none, 1 = pitting, 2 = in pit area
    float m_sector; // 0 = sector1, 1 = sector2, 2 = sector3
    float m_sector1_time; // time of sector1 (or 0)
    float m_sector2_time; // time of sector2 (or 0)
    float m_brakes_temp[4]; // brakes temperature (centigrade)
    float m_tyres_pressure[4]; // tyres pressure PSI
    float m_team_info; // team ID
    float m_total_laps; // total number of laps in this race
    float m_track_size; // track size meters
    float m_last_lap_time; // last lap time
    float m_max_rpm; // cars max RPM, at which point the rev limiter will kick in
    float m_idle_rpm; // cars idle RPM
    float m_max_gears; // maximum number of gears
    float m_sessionType; // 0 = unknown, 1 = practice, 2 = qualifying, 3 = race
    float m_drsAllowed; // 0 = not allowed, 1 = allowed, -1 = invalid / unknown
    float m_track_number; // -1 for unknown, 0-21 for tracks
    float m_vehicleFIAFlags; // -1 = invalid/unknown, 0 = none, 1 = green, 2 = blue, 3 = yellow, 4 = red
    float m_era; // era, 2017 (modern) or 1980 (classic)
    float m_engine_temperature; // engine temperature (centigrade)
    float m_gforce_vert; // vertical g-force component
    float m_ang_vel_x; // angular velocity x-component
    float m_ang_vel_y; // angular velocity y-component
    float m_ang_vel_z; // angular velocity z-component
    byte  m_tyres_temperature[4]; // tyres temperature (centigrade)
    byte  m_tyres_wear[4]; // tyre wear percentage
    byte  m_tyre_compound; // compound of tyre – 0 = ultra soft, 1 = super soft, 2 = soft, 3 = medium, 4 = hard, 5 = inter, 6 = wet
    byte  m_front_brake_bias; // front brake bias (percentage)
    byte  m_fuel_mix; // fuel mix - 0 = lean, 1 = standard, 2 = rich, 3 = max
    byte  m_currentLapInvalid; // current lap invalid - 0 = valid, 1 = invalid
    byte  m_tyres_damage[4]; // tyre damage (percentage)
    byte  m_front_left_wing_damage; // front left wing damage (percentage)
    byte  m_front_right_wing_damage; // front right wing damage (percentage)
    byte  m_rear_wing_damage; // rear wing damage (percentage)
    byte  m_engine_damage; // engine damage (percentage)
    byte  m_gear_box_damage; // gear box damage (percentage)
    byte  m_exhaust_damage; // exhaust damage (percentage)
    byte  m_pit_limiter_status; // pit limiter status – 0 = off, 1 = on
    byte  m_pit_speed_limit; // pit speed limit in mph
    float m_session_time_left; // NEW: time left in session in seconds
    byte  m_rev_lights_percent; // NEW: rev lights indicator (percentage)
    byte  m_is_spectating; // NEW: whether the player is spectating
    byte  m_spectator_car_index; // NEW: index of the car being spectated

    // Car data
    byte  m_num_cars; // number of cars in data
    byte  m_player_car_index; // index of player's car in the array
    CarUDPData  m_car_data[20]; // data for all cars on track

    float m_yaw; // NEW (v1.8)
    float m_pitch; // NEW (v1.8)
    float m_roll; // NEW (v1.8)
    float m_x_local_velocity; // NEW (v1.8) Velocity in local space
    float m_y_local_velocity; // NEW (v1.8) Velocity in local space
    float m_z_local_velocity; // NEW (v1.8) Velocity in local space
    float m_susp_acceleration[4]; // NEW (v1.8) RL, RR, FL, FR
    float m_ang_acc_x; // NEW (v1.8) angular acceleration x-component
    float m_ang_acc_y; // NEW (v1.8) angular acceleration y-component
    float m_ang_acc_z; // NEW (v1.8) angular acceleration z-component
};

struct CarUDPData
{
    float m_worldPosition[3]; // world co-ordinates of vehicle
    float m_lastLapTime;
    float m_currentLapTime;
    float m_bestLapTime;
    float m_sector1Time;
    float m_sector2Time;
    float m_lapDistance;
    byte  m_driverId;
    byte  m_teamId;
    byte  m_carPosition; // UPDATED: track positions of vehicle
    byte  m_currentLapNum;
    byte  m_tyreCompound; // compound of tyre – 0 = ultra soft, 1 = super soft, 2 = soft, 3 = medium, 4 = hard, 5 = inter, 6 = wet
    byte  m_inPits; // 0 = none, 1 = pitting, 2 = in pit area
    byte  m_sector; // 0 = sector1, 1 = sector2, 2 = sector3
    byte  m_currentLapInvalid; // current lap invalid - 0 = valid, 1 = invalid
    byte  m_penalties; // NEW: accumulated time penalties in seconds to be added
};

I would also like to know how to access the CarUDPData struct. I know how to access it in C, but as I said, I have very little knowledge of network programming.

Thanks indeed.
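One answer sketch: if the layout above is packed little-endian (which is consistent with the stated 1289-byte size: 76 four-byte floats, 24 byte fields, one more float, 5 more bytes, 20 cars of 9 floats + 9 bytes, then 13 trailing floats), the whole packet can be decoded with one `struct.unpack` call. The field indices below are derived by hand from the struct above, so they should be double-checked against the spec.

```python
import struct

# '<' = little-endian with no padding; the total matches the documented
# 1289-byte packet size, which supports the packed-layout assumption.
MAIN_FMT = '<76f24Bf3B2B' + '9f9B' * 20 + '13f'
assert struct.calcsize(MAIN_FMT) == 1289

CAR_FIELDS = 18   # 9 floats + 9 bytes per CarUDPData entry
CARS_START = 106  # index of the first per-car field in the unpacked tuple


def parse_packet(data):
    """Decode one 1289-byte UDPPacket into a dict of selected fields."""
    v = struct.unpack(MAIN_FMT, data)
    packet = {
        'time': v[0],
        'speed': v[7],                  # MPH
        'tyres_temperature': v[76:80],  # first byte-typed fields after the floats
        'fuel_mix': v[86],
        'num_cars': v[104],
        'player_car_index': v[105],
        'cars': [],
    }
    # m_car_data[20]: each slice of 18 values is one CarUDPData entry
    for i in range(20):
        c = v[CARS_START + i * CAR_FIELDS : CARS_START + (i + 1) * CAR_FIELDS]
        packet['cars'].append({
            'world_position': c[0:3],
            'last_lap_time': c[3],
            'driver_id': c[9],
            'car_position': c[11],
        })
    return packet
```

Receiving is then the usual UDP loop: `sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`, `sock.bind(('', 20777))` (20777 is the game's default telemetry port, worth verifying in the game settings), and `parse_packet(sock.recvfrom(2048)[0])` for each datagram.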


Get this bounty!!!