#StackBounty: #python #performance Find the 4 parameters of a function that give the best cumulative sum

Bounty: 50

I have a DataFrame data with several columns; each column holds on average 10 years of daily data, i.e. around 4000 rows. I have this function:

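import numpy as np
import pandas as pd

def fct(data, name_a, name_b, param_a, param_b, param_c, param_d):
    # rows where both columns have a value; .copy() avoids SettingWithCopyWarning below
    df = data.loc[data[name_a].notnull() & data[name_b].notnull(), [name_a, name_b]].copy()
    # note: df_corr is built here but not used afterwards
    df_corr = pd.DataFrame()
    df_corr[name_a] = df[name_a]
    df_corr[name_b] = df[name_b].shift(param_a)
    # daily percentage change of name_a
    df['pct'] = df[name_a].pct_change()
    # change of name_b over param_b rows, lagged by 2 + param_a rows
    df['name_b_delta'] = df[name_b].shift(2 + param_a) - df[name_b].shift(2 + param_a + param_b)
    # +1, 0 or -1 depending on whether the delta is above, equal to or below param_c
    df['is_ok'] = df['name_b_delta'].apply(lambda x: np.sign(x - param_c))
    if param_d:
        df['results'] = df['is_ok'] * (-1) * df['pct']
    else:
        df['results'] = df['is_ok'] * df['pct']
    df['cumul_results'] = df['results'].cumsum()
    return df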
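A minimal sketch of a leaner variant of fct, assuming only the last value of cumul_results is needed (that is all the nested loops read) and assuming the if/else reading of the param_d branch. fct_final is a hypothetical name: it skips the unused df_corr and the intermediate columns, replaces the row-by-row apply with one np.sign call over the whole column, and treats param_d as a sign flip.

```python
import numpy as np
import pandas as pd


def fct_final(data: pd.DataFrame, name_a: str, name_b: str,
              param_a: int, param_b: int, param_c: float, param_d: bool) -> float:
    """Hypothetical helper: returns only fct(...)['cumul_results'].iloc[-1]."""
    # Same row filter as fct: keep rows where both columns are present.
    mask = data[name_a].notnull() & data[name_b].notnull()
    a = data.loc[mask, name_a]
    b = data.loc[mask, name_b]
    pct = a.pct_change()
    delta = b.shift(2 + param_a) - b.shift(2 + param_a + param_b)
    # np.sign runs on the whole Series at once instead of calling a lambda per row.
    results = np.sign(delta - param_c) * pct
    if param_d:
        # param_d=True only negates every result.
        results = -results
    # Last value of the running sum, as in df['cumul_results'].iloc[-1].
    return results.cumsum().iloc[-1]
```

Because param_d=True only negates the results, fct_final(..., True) equals -fct_final(..., False), so a search could compute one of the two and derive the other.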

Currently, to optimize cumul_results for different pairs of name_a and name_b, I search for the best parameters with these nested loops:

name_a_list = ['a_dozen_of_elements_here']
name_b_list = ['25_elements_here']

df_line = []
df_index = []

for name_a in name_a_list:
    for name_b in name_b_list:
        for param_a in range(0, 15):
            for param_b in range(1, 15):
                for param_c in range(-2, 3, 1):
                    for param_d in [True, False]:
                        df = fct(data, name_a, name_b, param_a, param_b, param_c / 1000, param_d)
                        df_line.append([df['cumul_results'].iloc[-1], param_a, param_b, param_c, param_d])
                        # remember which pair each row belongs to
                        df_index.append((name_a, name_b))

results = pd.DataFrame(index=pd.MultiIndex.from_tuples(df_index, names=['name_a', 'name_b']),
                       data=df_line,
                       columns=['final result', 'param_a', 'param_b', 'param_c', 'param_d'])
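The nested loops above redo the row filter and pct_change for every parameter combination, even though they depend only on the pair. A minimal sketch of one way to restructure the same grid, assuming the if/else reading of param_d and the param_c / 1000 scaling from the loop; grid_search is a hypothetical helper name. It does the per-pair work once, computes the delta once per (param_a, param_b), evaluates all param_c thresholds together with NumPy broadcasting, and derives the param_d=True total by negation instead of a second evaluation.

```python
import itertools

import numpy as np
import pandas as pd


def grid_search(data, name_a_list, name_b_list,
                a_values=range(0, 15), b_values=range(1, 15), c_values=range(-2, 3)):
    """Hypothetical helper: evaluates the same grid as the nested loops around fct."""
    # Thresholds for every param_c at once (param_c / 1000, as in the original loop).
    c = np.array(list(c_values))
    thresholds = c / 1000
    rows, index = [], []
    for name_a, name_b in itertools.product(name_a_list, name_b_list):
        # Per-pair work, done once instead of once per parameter combination.
        mask = data[name_a].notnull() & data[name_b].notnull()
        a = data.loc[mask, name_a].reset_index(drop=True)
        b = data.loc[mask, name_b].reset_index(drop=True)
        pct = a.pct_change().to_numpy()
        for param_a, param_b in itertools.product(a_values, b_values):
            delta = (b.shift(2 + param_a) - b.shift(2 + param_a + param_b)).to_numpy()
            # Shape (n_rows, n_thresholds): one column of results per param_c value.
            results = np.sign(delta[:, None] - thresholds[None, :]) * pct[:, None]
            # Same as cumsum().iloc[-1]: NaN if the last row is NaN, else the NaN-skipping sum.
            final = np.where(np.isnan(results[-1]), np.nan, np.nansum(results, axis=0))
            for param_c, value in zip(c, final):
                # param_d=True only flips the sign of every result, hence of the total.
                for param_d in (True, False):
                    rows.append([-value if param_d else value,
                                 param_a, param_b, int(param_c), param_d])
                    index.append((name_a, name_b))
    return pd.DataFrame(
        rows,
        index=pd.MultiIndex.from_tuples(index, names=['name_a', 'name_b']),
        columns=['final result', 'param_a', 'param_b', 'param_c', 'param_d'],
    )
```

With the default ranges this still produces 15 × 14 × 5 × 2 = 2100 rows per pair, in the same order as the nested loops, but the expensive Series work runs 15 × 14 = 210 times per pair instead of 2100.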

That is more than 840 000 calls to fct, and it takes around 4 hours. Is there a way to improve the performance of this code and reduce the run time? Thanks!
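Before restructuring, it can help to measure which step of fct dominates. A minimal sketch using the standard timeit module to compare the row-by-row apply from fct against a single np.sign call; the random delta column of 4000 rows and the param_c value are synthetic assumptions standing in for the real data.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in (assumption): one delta column with ~4000 daily rows.
rng = np.random.default_rng(0)
delta = pd.Series(rng.normal(size=4000))
param_c = 0.001


def with_apply():
    # One Python-level lambda call per row, as in fct.
    return delta.apply(lambda x: np.sign(x - param_c))


def vectorized():
    # A single NumPy ufunc call over the whole column.
    return np.sign(delta - param_c)


# Make sure both versions agree before comparing their speed.
pd.testing.assert_series_equal(with_apply(), vectorized())

for func in (with_apply, vectorized):
    seconds = timeit.timeit(func, number=100)
    print(f"{func.__name__}: {seconds / 100 * 1e3:.3f} ms per call")
```

The vectorized call is usually much faster than apply with a lambda, but the actual ratio depends on the machine and pandas version, so it is worth running on the real columns.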
