#StackBounty: #python #pandas #numpy How to check a sequence is strictly monotonically or there is one turning point where both sides a…

Bounty: 50

Input

l1=[1,3,5,6,7]
l2=[1,2,2,3,4]
l3=[5,4,3,2,1]
l4=[5,5,3,2,1]
l5=[1,2,3,4.1,3,2]
l6=[3,2,1,0.4,1,2,3]
l7=[1,2,10,4,8,9,2]
l8=[1,2,3,4,4,3,2,1]
l9=[-0.05701686,  0.57707936, -0.34602634, -0.02599778]
l10=[ 0.13556905,  0.45859   , -0.34602634, -0.09178798,  0.03044908]
l11=[-0.38643975, -0.09178798,  0.57707936, -0.05701686,  0.00649252]

Notice: The values in the sequence are floats.

Expected

  • Write a function find_targeted_seq that returns whether a sequence is strictly monotonic, or has exactly one turning point with both sides strictly monotonic. For example, l1, l3, l5 and l6 are expected. One possible approach is sketched below.
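
For reference, a minimal sketch of one way to check this with numpy (the function name find_targeted_seq comes from the question; the use of np.diff and sign changes is my own assumption, not the asker's attempt):

import numpy as np

def find_targeted_seq(seq):
    """True if seq is strictly monotonic, or has exactly one turning point
    with both sides strictly monotonic."""
    d = np.diff(np.asarray(seq, dtype=float))
    if d.size == 0 or np.any(d == 0):   # equal neighbours break strictness
        return False
    signs = np.sign(d)
    turns = np.count_nonzero(signs[1:] != signs[:-1])  # direction changes
    return turns <= 1

# uses the lists defined above; l1, l3, l5, l6 give True, the rest False
print([find_targeted_seq(l) for l in (l1, l2, l3, l4, l5, l6, l7, l8)])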

Try


Get this bounty!!!

#StackBounty: #python #arrays #numpy #indexing Replace data of an array by two values of a second array

Bounty: 500

I have two numpy arrays, "Elements" and "nodes". My aim is to gather some data from these arrays.
I need to replace the last two columns of "Elements" with the two coordinates contained
in the "nodes" array. The two arrays are very large, so I have to automate this.

This post refers to an older one: Replace data of an array by 2 values of a second array

with the difference that the arrays are very large (Elements: (3342558,5) and nodes: (581589,4)) and the previous solution does not work.

An example :

import numpy as np

Elements = np.array([[1.,11.,14.],[2.,12.,13.]])

nodes = np.array([[11.,0.,0.],[12.,1.,1.],[13.,2.,2.],[14.,3.,3.]])

results = np.array([[1., 0., 0., 3., 3.],
[2., 1., 1., 2., 2.]])

The previous solution proposed by hpaulj:

e = Elements[:,1:].ravel().astype(int)
n=nodes[:,0].astype(int)

I, J = np.where(e==n[:,None])

results = np.zeros((e.shape[0],2),nodes.dtype)
results[J] = nodes[I,1:]
results = results.reshape(2,4)

But with huge arrays, this script does not work:
DeprecationWarning: elementwise comparison failed; this will raise an error in the future
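
For very large arrays, here is a sketch of a lookup-based alternative that avoids the all-pairs comparison (my own suggestion, assuming the node IDs in the first column of nodes are unique):

import numpy as np

Elements = np.array([[1., 11., 14.], [2., 12., 13.]])
nodes = np.array([[11., 0., 0.], [12., 1., 1.], [13., 2., 2.], [14., 3., 3.]])

node_ids = nodes[:, 0]
order = np.argsort(node_ids)                      # sort the node IDs once
pos = np.searchsorted(node_ids[order], Elements[:, 1:])
idx = order[pos]                                  # row in nodes for each node ID
coords = nodes[idx, 1:]                           # shape (n_elements, 2, 2)
results = np.hstack([Elements[:, :1], coords.reshape(len(Elements), -1)])
print(results)
# [[1. 0. 0. 3. 3.]
#  [2. 1. 1. 2. 2.]]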


Get this bounty!!!

#StackBounty: #python #numpy #math #pca #voxel Why my PCA is not invariant to rotation and axis swap?

Bounty: 50

I have a voxel array (np.array) of size 3x3x3, filled with some values; this setup is essential for me. I want a rotation-invariant representation of it. For this case, I decided to try a PCA representation, which is believed to be invariant to orthogonal transformations.

For simplicity, I took an axes swap, but in case I’m mistaken this could also be np.rot90.

I have interpreted my 3D voxel as a set of 27 weighted 3D cube-point vectors, which I incorrectly called "basis" (that is, a set of 3D points in space, represented by vectors obtained from the cube points and scaled by the voxel values).

import numpy as np

voxel1 = np.random.normal(size=(3,3,3))
voxel2 =  np.transpose(voxel1, (1,0,2)) #np.rot90(voxel1) #


basis = []
for i in range(3):
    for j in range(3):
        for k in range(3):
            basis.append([i+1, j+1, k+1]) # avoid 0
basis = np.array(basis)


voxel1 = voxel1.reshape((27,1))
voxel2 = voxel2.reshape((27,1))

voxel1 = voxel1*basis # weighted basis vectors
voxel2 = voxel2*basis
print(voxel1.shape)
(27, 3)

Then I applied PCA to those 27 three-dimensional vectors:

def pca(x):
    center = np.mean(x, 0)
    x = x - center

    cov = np.cov(x.T) / x.shape[0]

    e_values, e_vectors = np.linalg.eig(cov)

    order = np.argsort(e_values)

    v = e_vectors[:, order].transpose()

    return x.dot(v)

vp1 = pca(voxel1)
vp2 = pca(voxel2)

But the results in vp1 and vp2 are different. Perhaps I have made a mistake (though I believe this is the right formula), and the proper code must be

x.dot(v.T)

But in this case the results are very strange. The upper and bottom blocks of the transformed data are the same up to the sign:

>>> np.abs(np.abs(vp1)-np.abs(vp2)) > 0.01
array([[False, False, False],
       [False, False, False],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True, False,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True, False,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False]])

What am I doing wrong?

What I want to do is to find some invariant representation of my weighted voxel, something like positioning according to the axes of inertia or principal axes. I would really appreciate it if someone could help me.
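
As a sanity check, here is a small sketch (reusing the weighted (27, 3) arrays voxel1 and voxel2 built in the first snippet) showing that the covariance spectra of the two point sets coincide, which suggests the discrepancy lies in the sign and ordering of the eigenvectors rather than in the spectrum:

def cov_eigvals(points):
    x = points - points.mean(axis=0)
    return np.sort(np.linalg.eigvalsh(np.cov(x.T)))

# voxel1 and voxel2 here are the weighted point sets of shape (27, 3)
print(np.allclose(cov_eigvals(voxel1), cov_eigvals(voxel2)))   # True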

UPD: Found a question similar to mine, but the code is unavailable.

EDIT2: Found the code InertiaRotate and managed to monkey-do the following:

import numpy as np

# https://github.com/smparker/orient-molecule/blob/master/orient.py

voxel1 = np.random.normal(size=(3,3,3))
voxel2 =  np.transpose(voxel1, (1,0,2))

voxel1 = voxel1.reshape((27,))
voxel2 = voxel2.reshape((27,))


basis = []
for i in range(3):
    for j in range(3):
        for k in range(3):
            basis.append([i+1, j+1, k+1]) # avoid 0
basis = np.array(basis)
basis = basis - np.mean(basis, axis=0)



def rotate_func(data, mass):

    #mass = [ masses[n.lower()] for n in geom.names ]

    inertial_tensor = -np.einsum("ax,a,ay->xy", data, mass, data)
    # negate sign to reverse the sorting of the tensor
    eig, axes = np.linalg.eigh(-inertial_tensor)
    axes = axes.T

    # adjust sign of axes so the third moment is positive in the X and Y axes
    testcoords = np.dot(data, axes.T) # a little wasteful, but fine for now
    thirdmoment = np.einsum("ax,a->x", testcoords**3, mass)

    for i in range(2):
        if thirdmoment[i] < 1.0e-6:
            axes[i,:] *= -1.0

    # rotation matrix must have determinant of 1
    if np.linalg.det(axes) < 0.0:
        axes[2,:] *= -1.0

    return axes

axes1 = rotate_func(basis, voxel1)
v1 = np.dot(basis, axes1.T)
axes2 = rotate_func(basis, voxel2)
v2 = np.dot(basis, axes2.T)


print(v1)
print(v2)

It seems to use the basis (coordinates) and the mass separately. The results are quite similar to my problem above: some parts of the transformed data match up to the sign; I believe those are some sides of the cube.

print(np.abs(np.abs(v1)-np.abs(v2)) > 0.01)

[[False False False]
 [False False False]
 [False False False]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [False False False]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [False False False]
 [False False False]
 [False False False]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [False False False]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [False False False]
 [False False False]
 [False False False]]

Looking for some explanation. This code is designed for molecules, and must work…

UPD: Tried to choose 3 vectors as a new basis from those 24 – the one with the biggest norm, the one with the smallest, and their cross product. I combined them into a matrix V, then used the formula V^(-1)*X to transform the coordinates, and got the same problem – the resulting sets of vectors are not equal for the rotated voxels.


Get this bounty!!!

#StackBounty: #python #pandas #numpy #docx Get data frame in shape of table in word document

Bounty: 50

I am reading an Excel file, extracting a specific df and putting it in a Word document. The issues I face are:

  1. The DF loses its shape once added to a paragraph and becomes totally useless.

Complete code is written below.

#importing required libraries
import pandas as pd
import numpy as np
eod = pd.read_excel('df.xlsx')
import datetime
import docx 
from datetime import date
legal = docx.Document('legal.docx')

#Calculating No. days from SCN
eod['SCN Days'] = (pd.Timestamp('now').floor('d') - eod['SCN Date']).dt.days

#Generation list of EFE for Final Showcause Notice to be issued today
FSCN_today = eod.where(eod['SCN Days']>20)
#Dropping Null from generated list
FSCN_today = FSCN_today.dropna(how ="all")
FSCN_today = FSCN_today[['Exporter Name','EFE','DESTINATION','VALUE']]

#Getting Unique Values in the list generated
s_values = FSCN_today['Exporter Name'].unique()

#Iterating through List
for c in s_values:
    df1 = FSCN_today[FSCN_today['Exporter Name'] == c]
    legal.paragraphs[7].text = c
    legal.paragraphs[8].text = df1.iloc[10:1]
    legal.paragraphs[15].text = str(df1)
    notice_name = str(c)+ ".docx"
    legal.save(notice_name)

#Update Date & Status of FSCN Issued today
eod['FSCN Date'] = np.where((eod['Status']=="SCN ISSUED") & (eod['SCN Days']>20),date.today(),eod['FSCN Date'])
eod['Status'] = np.where((eod['Status']=="SCN ISSUED") & (eod['SCN Days']>20),"FSCN ISSUED",eod['Status'])

#In progress
name = "EOD "+ str(date.today())+ ".xlsx"
#eod.to_excel(name,index =False)  

The following line has an error:

legal.paragraphs[15].text = str(df1)
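
One way to keep the DataFrame's shape is to write it into a real Word table instead of paragraph text. Below is a minimal sketch using python-docx's add_table (note that this appends the table at the end of the document; anchoring it at a specific paragraph such as paragraphs[15] needs extra work):

def add_df_as_table(document, df):
    """Append a DataFrame to a python-docx Document as a table."""
    table = document.add_table(rows=1, cols=len(df.columns))
    for j, col in enumerate(df.columns):          # header row
        table.rows[0].cells[j].text = str(col)
    for _, row in df.iterrows():                  # one table row per DF row
        cells = table.add_row().cells
        for j, value in enumerate(row):
            cells[j].text = str(value)
    return table

# e.g. inside the loop: add_df_as_table(legal, df1)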


Get this bounty!!!

#StackBounty: #keras #tensorflow #numpy Load numpy data from directory to keras image generator

Bounty: 100

I have two folders of hyperspectral data with five channels, converted to numpy arrays. Each folder represents the respective label.

Example :

dataset  
----good_data  
      ----good_image_01.npy  
      ----good_image_02.npy  
----bad_data  
    ----bad_image_01.npy  
    ----bad_image_02.npy

Previously, with PNG images, I used the Keras image generator to feed the data into the input pipeline.

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

But I am not sure how to achieve the same with a numpy dataset. Can anyone help?
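
For what it is worth, here is a sketch of one way to build an equivalent pipeline with tf.data and np.load (the directory layout above is assumed; img_height, img_width and batch_size are the same names as in the original snippet; tf.numpy_function lets np.load run inside the pipeline):

import pathlib
import numpy as np
import tensorflow as tf

data_dir = pathlib.Path('dataset')
class_names = sorted(p.name for p in data_dir.iterdir() if p.is_dir())

paths, labels = [], []
for label, name in enumerate(class_names):
    for f in (data_dir / name).glob('*.npy'):
        paths.append(str(f))
        labels.append(label)

def load_npy(path, label):
    image = tf.numpy_function(
        lambda p: np.load(p.decode()).astype(np.float32), [path], tf.float32)
    image.set_shape((img_height, img_width, 5))   # five channels
    return image, label

train_ds = (tf.data.Dataset.from_tensor_slices((paths, labels))
            .shuffle(len(paths), seed=123)
            .map(load_npy, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(batch_size))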


Get this bounty!!!

#StackBounty: #neural-networks #python #rnn #backpropagation #numpy Basic RNN sequence classifier diagram?

Bounty: 50

I’d like to build an RNN in numpy from scratch to really get comfortable with backpropagation through time (BPTT). In the diagram and LaTeX below, I show two neurons, each with a non-linearity N(i,j) and a softmax/hidden state layer H(i,j).

The first neuron will receive x1, which will be sent to the non-linearities N1 and N2 (see equations 8 and 9 on the left below); then, the N1 and N2 outputs will be sent to a softmax layer (see equations 6 and 7).

In the following step, x2 will be sent to the second neuron; likewise, the h(1,1) and h(1,2) hidden state outputs will be sent as additional inputs to the second neuron. The non-linearities will act upon these inputs (see equations 4 and 5) and then be delivered to the final softmax layer, h(2,1) and h(2,2) (see equations 2 and 3).

Lastly, argmax is applied to these hidden states and a predicted y value is returned (which is to say, the sequence label).

RNN sequence classifier
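
For concreteness, here is a minimal numpy sketch of the forward pass as described above (tanh is an assumed non-linearity and the weight shapes are placeholders; this is my reading of the diagram, not the asker's equations):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W_x = rng.normal(size=(2, 1))   # input -> non-linearities N1, N2
W_h = rng.normal(size=(2, 2))   # previous hidden state -> non-linearities
x1, x2 = 0.5, -1.0              # a length-2 input sequence

n1 = np.tanh(W_x * x1).ravel()                              # N(1,1), N(1,2)
h1 = softmax(n1)                                            # h(1,1), h(1,2)
n2 = np.tanh(W_x * x2 + (W_h @ h1).reshape(2, 1)).ravel()   # N(2,1), N(2,2)
h2 = softmax(n2)                                            # h(2,1), h(2,2)
y_pred = np.argmax(h2)                                      # predicted label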

Because I want to implement the above from scratch, I will need to derive the gradients here. But before I move on to that step, I would like to know for certain that:
(A) the diagram represents a valid RNN that can label sequences, and (B) the equations on the left accurately depict what the diagram details.

To answer this question, either confirm A and B (or, if necessary, please provide guidance on what needs altering to achieve the stated effect).

Edit: I took a stab at the gradients; however, I’m not sure of their accuracy.

Gradients


Get this bounty!!!

#StackBounty: #python #numpy #pandas #matplotlib #jupyter Jupyter notebook style help + code suggestions for pandas

Bounty: 50

I wanted to open source some code to scrape and analyze publicly-filed stock buys and sells from U.S. senators. I’m not familiar with code style for Jupyter notebooks or pandas in general. Would it be possible for you to review my short notebook? The original can be found here.

Ideally I would get the scraping code reviewed as well, but for the sake of brevity I wanted to keep it to just the pandas and Jupyter notebook-related changes. Within scope are things like how the Jupyter notebook should be structured, general Python code style, and pandas conventions + optimizations.

I know that this is a larger ask, so I am also open to just high-level suggestions.

I have included the contents of the Jupyter notebook below. (I thought about removing the # In[ ]: comments, but realized they indicate where each Jupyter cell begins.) Thank you in advance!

# # Senator Filings Analysis

# ***

# ## Imports

# In[ ]:

from collections import defaultdict
import datetime as dt
from functools import lru_cache
import json
from os import path
import pickle

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import yfinance as yf


# ## Introduction
# 
# In this notebook, we explore stock orders that were publicly filed by U.S. senators. The filings are scraped from https://efdsearch.senate.gov/search/. We calculate the returns of each senator by mimicking their buys and sells.

# ***

# ## Loading data
# 
# The `senators.pickle` file is scraped using the script in the root of the repository.

# In[ ]:

with open('senators.pickle', 'rb') as f:
    raw_senators_tx = pickle.load(f)


# ## Data cleaning

# ### Filling in missing tickers

# In this section, we fill in as many of the missing ticker symbols as we can.

# In[ ]:

def tokenize(asset_name):
    """ Convert an asset name into useful tokens. """
    token_string = (asset_name
        .replace('(', '')
        .replace(')', '')
        .replace('-', ' ')
        .replace('.', ''))
    return token_string.split(' ')

def token_is_ticker(token, token_blacklist):
    return len(token) <= 4 and token.upper() not in token_blacklist

# These generic words do not help us determine the ticker
with open('blacklist.json', 'r') as f:
    blacklist = set(json.load(f))

missing_tickers = set(raw_senators_tx[
    (raw_senators_tx['ticker'] == '--')
    | (raw_senators_tx['ticker'] == '')
]['asset_name'])

ticker_map = {}
unmapped_tickers = set()
for m in missing_tickers:
    tokens = tokenize(m)
    if token_is_ticker(tokens[0], blacklist):
        ticker_map[m] = tokens[0].upper()
    elif token_is_ticker(tokens[-1], blacklist):
        ticker_map[m] = tokens[-1].upper()
    else:
        unmapped_tickers.add(m)


# As a second pass, we assign tickers to asset names that have any of the specified keywords.

# In[ ]:

phrase_to_ticker = {
    'FOX': 'FOX',
    'AMAZON': 'AMZN',
    'AARON': 'AAN',
    'ALTRIA': 'MO',
    'APPLE': 'AAPL',
    'CHEVRON': 'CVX',
    'DUPONT': 'DD',
    'ALPHABET': 'GOOGL',
    'GOOG': 'GOOGL',
    'GENERAL ELECTRIC': 'GE',
    'JOHNSON': 'JNJ',
    'NEWELL': 'NWL',
    'OWENS': 'OMI',
    'PFIZER': 'PFE',
    'TYSON': 'TSN',
    'UNDER ARMOUR': 'UAA',
    'VERIZON': 'VZ',
    'WALT': 'DIS'
}

for m in unmapped_tickers:
    for t in phrase_to_ticker:
        if t in m.upper():
            ticker_map[m] = phrase_to_ticker[t]

tx_with_tickers = raw_senators_tx.copy()
for a, t in ticker_map.items():
    tx_with_tickers.loc[tx_with_tickers['asset_name'] == a, 'ticker'] = t


# ### Filtering rows and columns

# We filter out useless rows and missing symbols, and then add some useful columns for the final dataset.

# In[ ]:

filtered_tx = tx_with_tickers[tx_with_tickers['ticker'] != '--']
filtered_tx = filtered_tx.assign(
    ticker=filtered_tx['ticker'].map(
        lambda s: s.replace('--', '').replace('\n', '')))

filtered_tx = filtered_tx[filtered_tx['order_type'] != 'Exchange']


# In[ ]:

def parse_tx_amount(amt):
    """ Get the lower bound for the transaction amount. """
    return int(amt.replace('Over $50,000,000', '50000000')
               .split(' - ')[0]
               .replace(',', '')
               .replace('$', ''))

senators_tx = filtered_tx.assign(
    tx_estimate=filtered_tx['tx_amount'].map(parse_tx_amount))
senators_tx = senators_tx.assign(
    full_name=senators_tx['first_name']
        .str
        .cat(senators_tx['last_name'], sep=' ')
)
useful_cols = [
    'file_date',
    'tx_date',
    'full_name',
    'order_type',
    'ticker',
    'tx_estimate'
]
senators_tx = senators_tx[useful_cols]
senators_tx = senators_tx.assign(
    tx_date=senators_tx['tx_date'].map(
        lambda v: dt.datetime.strptime(v, '%m/%d/%Y')))
senators_tx = senators_tx.assign(
    file_date=senators_tx['file_date'].map(
        lambda v: dt.datetime.strptime(v, '%m/%d/%Y')))
senators_tx


# ## Returns calculation

# These cells help us download the market data for the specified tickers. We store the market data in files so we don't need to repeatedly download the same information.

# In[ ]:

def download_for_ticker(ticker, check_cache=True):
    """ Download a file of stock prices for this ticker to disk. """
    if check_cache and path.exists('stocks/{0}.pickle'.format(ticker)):
        return
    d = yf.Ticker(ticker)
    with open('stocks/{0}.pickle'.format(ticker), 'wb') as f:
        pickle.dump({
            'price': d.history(period='max').reset_index()
        }, f)

def load_for_ticker(ticker):
    """ Load the file of stock prices for this ticker. """
    with open('stocks/{0}.pickle'.format(ticker), 'rb') as f:
        dump = pickle.load(f)
    raw = dump['price']
    return (raw[['Date', 'Close']]
            .rename(columns={'Date': 'date', 'Close': 'price'}))

def _price_for_date(df, date):
    """ Helper function for `ticker_at_date`. """
    df = df[df['date'] >= date].sort_values(by='date')
    return df['price'].iloc[0]

@lru_cache(maxsize=128)
def ticker_at_date(ticker, date):
    """
    Price of a ticker at a given date. Raise an IndexError if there is no
    such price.
    """
    try:
        data = load_for_ticker(ticker)
        # Sell at the next opportunity possible
        return _price_for_date(data, date)
    except Exception:
        # If any exception occurs, refresh the cache
        download_for_ticker(ticker, check_cache=False)
        data = load_for_ticker(ticker)
        return _price_for_date(data, date)


# In[ ]:

all_tickers = set(senators_tx['ticker'])
for i, t in enumerate(all_tickers):
    if i % 100 == 0:
        print('Working on ticker {0}'.format(i))
    try:
        download_for_ticker(t)
    except Exception as e:
        print('Ticker {0} failed with exception: {1}'.format(t, e))


# ### Mimicking buy + sell orders
# 
# We calculate a given senator's return by calculating the return between each buy or sell order, and then solving for the cumulative return. We convert that to a CAGR given the time period the senator was investing.
# 
# We keep track of how many units of each stock a senator is holding. If we ever see a filing that indicates the senator sold more than we estimated they are holding, we just sell all of the units we have on record. (We do not allow the senator to go short.)

# In[ ]:

buckets = [
    (1000, 15000),
    (15000, 50000),
    (50000, 100000),
    (100000, 250000),
    (250000, 500000),
    (500000, 1000000),
    (1000000, 5000000),
    (5000000, 25000000),
    (25000000, 50000000),
    (50000000, float('inf'))
]

def same_bucket(dollar_value_a, dollar_value_b):
    """
    If the dollar value of the stock units is roughly the same, sell all
    units.
    """
    for v1, v2 in buckets:
        if dollar_value_a >= v1 and dollar_value_a < v2:
            return dollar_value_b >= v1 and dollar_value_b < v2
    return False

def portfolio_value(stocks, date):
    """
    Value of a portfolio if each ticker has the specified number of units.
    """
    v = 0
    for s, units in stocks.items():
        if units == 0:
            continue
        try:
            v += ticker_at_date(s, date) * units
        except IndexError as e:
            # Swallow missing ticker data exception
            pass
    return v

def calculate_return(before_values,
                     after_values,
                     begin_date,
                     end_date,
                     tx_dates):
    """
    Calculate cumulative return and CAGR given the senator's portfolio
    value over time.
    """
    before_values.pop(0)
    after_values.pop(-1)
    # We calculate the total return by calculating the return
    # between each transaction, and solving for the cumulative
    # return.
    growth = np.array(before_values) / np.array(after_values)
    portfolio_return = np.prod(growth[~np.isnan(growth)])
    years = (end_date - begin_date).days / 365
    if years == 0:
        cagr = 0
    else:
        cagr = portfolio_return**(1 / years)
    # DataFrame of cumulative return
    tx_dates.pop(0)
    tx_dates = np.array(tx_dates)
    tx_dates = tx_dates[~np.isnan(growth)]
    cumulative_growth = np.cumprod(growth[~np.isnan(growth)])
    growth_df = pd.DataFrame({
        'date': tx_dates,
        'cumulative_growth': cumulative_growth
    })
    return {
        'portfolio_return': portfolio_return,
        'annual_cagr': cagr,
        'growth': growth_df
    }

def return_for_senator(rows, date_col='tx_date'):
    """
    Simulate a senator's buy and sell orders, and calculate the
    return.
    """
    stocks = defaultdict(int)
    # Value of portfolio at various timepoints to calculate return
    portfolio_value_before_tx = []
    portfolio_value_after_tx = []
    tx_dates = []
    rows = rows.sort_values(by=date_col)
    for _, row in rows.iterrows():
        date = row[date_col]
        if date_col == 'file_date':
            # We can't execute the trade the same day
            date += dt.timedelta(days=1)
        try:
            stock_price = ticker_at_date(row['ticker'], date)
        except IndexError as e:
            # Skip the row if we're missing ticker data
            continue
        value_before_tx = portfolio_value(stocks, date)
        if 'Purchase' in row['order_type']:
            tx_amt = row['tx_estimate']
            n_units = tx_amt / ticker_at_date(row['ticker'], date)
            stocks[row['ticker']] += n_units
        elif 'Sale' in row['order_type']:
            current_value = stock_price * stocks[row['ticker']]
            if ('Full' in row['order_type'] or
                    same_bucket(row['tx_estimate'], current_value)):
                stocks[row['ticker']] = 0
            else:
                new_n_units = (stocks[row['ticker']] -
                    row['tx_estimate'] / stock_price)
                stocks[row['ticker']] = max(0, new_n_units)
        portfolio_value_before_tx.append(value_before_tx)
        portfolio_value_after_tx.append(portfolio_value(stocks, date))
        tx_dates.append(date)
    return calculate_return(
        portfolio_value_before_tx,
        portfolio_value_after_tx,
        begin_date=min(rows[date_col]),
        end_date=max(rows[date_col]),
        tx_dates=tx_dates
    )


# In[ ]:

senator_returns = []
senator_tx_growth = {}
senator_file_growth = {}
senator_names = set(senators_tx['full_name'])


# The following cell took my laptop about three hours to run.

# In[ ]:

failed_senators = {}
print('{} senators total'.format(len(senator_names)))
for n in senator_names:
    print('Starting {}'.format(n))
    if n in senator_tx_growth:
        # Don't re-calculate for a given senator
        continue
    try:
        tx_return = return_for_senator(
            senators_tx[senators_tx['full_name'] == n],
            date_col='tx_date')
        file_return = return_for_senator(
            senators_tx[senators_tx['full_name'] == n],
            date_col='file_date')
        senator_returns.append({
            'full_name': n,
            'tx_total_return': tx_return['portfolio_return'],
            'tx_cagr': tx_return['annual_cagr'],
            'file_total_return': file_return['portfolio_return'],
            'file_cagr': file_return['annual_cagr']
        })
        senator_tx_growth[n] = tx_return['growth']
        senator_file_growth[n] = file_return['growth']
    except Exception as e:
        print('Failed senator {0} with exception {1}'.format(n, e))
        failed_senators[n] = e


# We look at the results to see the senators that outperformed the market.

# In[ ]:

def plot_senator_growth(growth):
    """ Plot the senator's portfolio growth against the S&P 500. """
    plt.plot_date(growth['date'], growth['cumulative_growth'], '-')
    download_for_ticker('SPY')
    spy = load_for_ticker('SPY')
    spy = spy[(spy['date'] >= min(growth['date']))
              & (spy['date'] <= max(growth['date']))]
    spy_prices = spy['price']
    spy_growth = np.cumprod(np.diff(spy_prices) / spy_prices[1:] + 1)
    dates = spy['date'].iloc[1:]
    plt.plot_date(dates, spy_growth, '-')
    plt.show()
    print('Earliest date: {}'.format(min(growth['date'])))
    print('Latest date: {}'.format(max(growth['date'])))
    print('Market return: {}'.format(
        spy_prices.iloc[-1] / spy_prices.iloc[0]))
    senator_growth = growth['cumulative_growth']
    print('Senator return: {}'.format(
        senator_growth.iloc[-1] / senator_growth.iloc[0]))


# In[ ]:

returns = pd.DataFrame(senator_returns)
returns = returns[(returns['tx_total_return'] > returns['tx_cagr'])
                  & (returns['tx_cagr'] > 0)]
returns.sort_values('tx_cagr')


# In[ ]:

plot_senator_growth(senator_tx_growth['Angus S King, Jr.'])


# ## About this notebook
# 
# Author: Neel Somani, Software Engineer
# 
# Email: neel@berkeley.edu
# 
# Website: https://www.ocf.berkeley.edu/~neel/
# 
# Updated On: 2020-05-10


Get this bounty!!!

#StackBounty: #python #numpy #opencv Dealing with lot of images and multiplications

Bounty: 50

With some basic knowledge of Python and referring to a lot of sources, I have written the code below. But it takes half an hour to execute. How can I reduce the time? I have read about vectorization, but I do not understand how exactly I can use it here.

In this, I have to read 2D skeleton images (size 1980×1080) and depth images (size 512×424). I have to map these images of different sizes onto each other and form a 3D skeleton; the depth_to_xyz_and_rgb(uu, vv, dep) function does that.
The skeleton image is generated by the OpenPose library from Facebook. The skeleton image mainly has 25 key-points, and those 25 key-points are joined together to form a 2D skeleton.
The main task is to calculate gait parameters, which can be found if we know the 3D coordinate of each joint.
Initially I was generating a 3D point cloud to check whether the code generates it properly; as it did, and I did not need the point cloud file for further processing, I have removed that part of the code.

Now I am mainly saving the 2D skeleton locations of the 25 key-points and the 3D values of the 25 key-points. I have added a sample of the generated Excel file in the link below.

import math
import sys
from PIL import Image
import numpy as np
import scipy.io as sio
import os



scalingFactor = 5000.0
fx_d=365.3768
fy_d=365.3768
cx_d=253.6238
cy_d=211.5918
fx_rgb=1054.8082
fy_rgb=1054.8082
cx_rgb=965.6725
cy_rgb=552.0879



RR = np.array([
    [0.99991, -0.013167,-0.0020807],
    [0.013164,0.99991,-0.0011972],
    [-0.0020963,0.0011697,1]
])
TT = np.array([ 0.052428,0.0006748,0.000098668 ])
extrinsics=np.array([[.99991,-0.013167,-0.0020807,0.052428],[0.013164,0.99991,-0.0011972,0.0006748],[-0.0020963,0.0011697,1,0.000098668],[0,0,0,1]])


path = r'G:\SENDA\Proband_172\GAIT\RG1_color\mat'
file_lists = os.listdir(path)
path3 = r'G:\SENDA\Proband_123\GAIT\RG1_Depth\selected'
included_extensions = ['bmp']
file_lists3 = [fn for fn in os.listdir(path3)
              if any(fn.endswith(ext) for ext in included_extensions)]

path4 = r'G:\SENDA\Proband_123\GAIT\RG1_results\skeleton'
file_lists4 = os.listdir(path4) 


path6 = r'G:\SENDA\Proband_123\GAIT\RG1_results'
file_lists6 = os.listdir(path6)


def init_maxvalue():
    for center in centers3:
        if center==(0,0):
            continue
        min_xy.append(sys.float_info.max)
        min_vex.append((0,0))
        p_vex.append((0,0,0))

def depth_rgb_registration(rgb,depth):

#    f=open("RecordAll.xls",'w') # Taking second argument i.e the depth image name
    init_maxvalue()
    rgb = Image.open(rgb)
    depth = Image.open(depth).convert('L') # convert image to monochrome 
    if rgb.mode != "RGB":
            raise Exception("Color image is not in RGB format")


    for v in range(depth.size[0]):
        for u in range(depth.size[1]):
            try:
                (p,x,y)=depth_to_xyz_and_rgb(v,u,depth) # this gives p = [pcx, pcy,pcz] 
                #aligned(:,:,0) = p
            except:
                continue

            if (x > rgb.size[0]-1 or y > rgb.size[1]-1 or x < 1 or y < 1 or np.isnan(x) or np.isnan(y)):
                continue
            x = round(x)
            y = round(y)
            color=rgb.getpixel((x,y))
            #print(color)
            min_distance((x,y),p)

            if color==(0,0,0):
                p[0]=0
                p[1]=0
                p[2]=0
                continue
            #if.write("%f .%f %f n"%(p[0],p[1],p[2]))
            points.append(" %f %f %f %d %d %d 0n"%(p[0],p[1],p[2],255,0,0))
        i=0
        x=[]
        y=[]
        z=[]

    for val in min_vex:
        f.write(str(val)+' '+str(p_vex[i])+'');

        points.append(" %f %f %f %d %d %d 0n"%(p_vex[i][0],p_vex[i][1],p_vex[i][2],0,255,0))
        x.append(p_vex[i][0])
        y.append(p_vex[i][1])
        z.append(p_vex[i][2])
        i=i+1
    else:
        f.write("n")
#    f.close()

def min_distance(val,p):
    i=0
    for center in centers3:
        if center==(0,0):
            continue
        temp=math.sqrt(math.pow(center[0]-val[0],2)+math.pow(center[1]-val[1],2))
        if temp<min_xy[i]:
            min_xy[i]=temp
            min_vex[i]=val
            p_vex[i]=p
        i=i+1

def depth_to_xyz_and_rgb(uu , vv,dep):

    # get z value in meters
    pcz =dep.getpixel((uu, vv))
    if pcz==60:
        return

    pcx = (uu - cx_d) * pcz / fx_d
    pcy = (vv - cy_d) * pcz / fy_d

    # apply extrinsic calibration
    P3D = np.array( [pcx , pcy , pcz] )
    P3Dp = np.dot(RR , P3D) - TT

    # rgb indexes that P3D should match
    uup = P3Dp[0] * fx_rgb / P3Dp[2] + cx_rgb
    vvp = P3Dp[1] * fy_rgb / P3Dp[2] + cy_rgb

    # return a point in point cloud and its corresponding color indices
    return P3D , uup , vvp

if __name__ == '__main__':


    f=open("Proband_123_RG1.xls",'w')    
    for idx, list1 in enumerate(file_lists4):
    # Iterate through items of list2
        for i in range(len(file_lists3)):

            if list1.split('.')[0] == file_lists3[i].split('.')[0]:


                rgb = os.path.join(path4, list1)
                depth = os.path.join(path3,sorted(file_lists3)[i])
                m = sorted(file_lists)[idx]    
                mat2 = sio.loadmat(os.path.join(path,m))
                abc= list1.split('.')[0]
                centers2 = mat2['b2']
                centers3 =np.array(centers2).tolist()
                min_xy=[]
                min_vex=[]
                p_vex=[]
                image_list = []
                image_list2 = []
                points = []
                depth_rgb_registration(rgb,depth)
    f.close()

The part of the code below takes the most time to execute because it applies the rotation and translation to almost every pixel of the depth image. In my data the RGB images have a lot of zeros, so if this process could be reversed it might reduce the time, I think; I just got that idea and am working on it. A vectorized sketch is added after the function below.
Link for algorithm

def depth_to_xyz_and_rgb(uu , vv,dep):

    # get z value in meters
    pcz =dep.getpixel((uu, vv))
    if pcz==60:
        return

    pcx = (uu - cx_d) * pcz / fx_d
    pcy = (vv - cy_d) * pcz / fy_d

    # apply extrinsic calibration
    P3D = np.array( [pcx , pcy , pcz] )
    P3Dp = np.dot(RR , P3D) - TT

    # rgb indexes that P3D should match
    uup = P3Dp[0] * fx_rgb / P3Dp[2] + cx_rgb
    vvp = P3Dp[1] * fy_rgb / P3Dp[2] + cy_rgb

    # return a point in point cloud and its corresponding color indices
    return P3D , uup , vvp
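
For illustration, here is a sketch of how this per-pixel function could be vectorized over a whole depth image with numpy (it assumes the depth image is available as a 2D array dep of shape (height, width) and reuses the calibration constants defined above; this is untested against the real data):

import numpy as np

def depth_to_xyz_and_rgb_vectorized(dep):
    """Map every depth pixel to a 3D point and its RGB pixel coordinates."""
    h, w = dep.shape
    uu, vv = np.meshgrid(np.arange(w), np.arange(h))    # x and y pixel grids
    pcz = dep.astype(np.float64)

    pcx = (uu - cx_d) * pcz / fx_d
    pcy = (vv - cy_d) * pcz / fy_d
    P3D = np.stack([pcx, pcy, pcz], axis=-1)            # shape (h, w, 3)

    # apply the extrinsic calibration to all points at once
    P3Dp = P3D @ RR.T - TT

    uup = P3Dp[..., 0] * fx_rgb / P3Dp[..., 2] + cx_rgb
    vvp = P3Dp[..., 1] * fy_rgb / P3Dp[..., 2] + cy_rgb

    valid = pcz != 60                                   # same skip rule as the loop
    return P3D, uup, vvp, valid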


Get this bounty!!!
