#StackBounty: #python #beginner #pandas Customer segmentation using RFM analysis

Bounty: 50

Currently, my code works perfectly well, but I would like to make it cleaner for other users by moving the duplicated and similar lines of code into functions or for loops. Because I am still new to Python, I have not yet gotten the hang of functions and for loops. My data frame rfm includes 5 columns:

  • Max Date (latest transaction)
  • Id (unique identifier)
  • Recency (today’s date minus Latest Transaction Date)
  • Frequency (total # of transactions per Id since its subscription)
  • Monetary (total amount of $ spent by Id since its subscription)

I separate the main data frame into 3 different data frames because the sort order differs for each cumulative sum column. The Frequency and Monetary data frames have identical calculations:

import pandas as pd

rfm_recency = rfm[['Max_Date', 'Id', 'Member_id', 'Recency']].copy()
rfm_recency = rfm_recency.sort_values(['Recency'], ascending=True)

rfm_frequency = rfm[['Id', 'Member_id', 'Frequency']].copy()
rfm_frequency = rfm_frequency.sort_values(['Frequency'], ascending=False)
rfm_frequency['cum_sum'] = rfm_frequency['Frequency'].cumsum()
rfm_frequency['cum_sum_perc'] = rfm_frequency['cum_sum'] / rfm_frequency['Frequency'].sum()

rfm_monetary = rfm[['Id', 'Member_id', 'Monetary']].copy()
rfm_monetary = rfm_monetary.sort_values(['Monetary'], ascending=False)
rfm_monetary['cum_sum'] = rfm_monetary['Monetary'].cumsum()
rfm_monetary['cum_sum_perc'] = rfm_monetary['cum_sum'] / rfm_monetary['Monetary'].sum()
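
Since the Frequency and Monetary blocks differ only in the column name, one possible way to remove the duplication is a small helper that takes the column as a parameter. This is a minimal sketch, not a definitive refactoring: the name `cumulative_share` and the sample values are hypothetical, and only the column names come from the code above.

```python
import pandas as pd


def cumulative_share(rfm: pd.DataFrame, column: str) -> pd.DataFrame:
    """Sort by `column` (largest first) and add running-total columns.

    Hypothetical helper: it repeats the same four steps used above for
    Frequency and Monetary, with the column name passed in.
    """
    out = rfm[['Id', 'Member_id', column]].copy()
    out = out.sort_values(column, ascending=False)
    out['cum_sum'] = out[column].cumsum()
    out['cum_sum_perc'] = out['cum_sum'] / out[column].sum()
    return out


# Made-up sample data, only to show the call pattern.
rfm = pd.DataFrame({
    'Id': [1, 2, 3, 4],
    'Member_id': ['a', 'b', 'c', 'd'],
    'Frequency': [1, 4, 2, 3],
    'Monetary': [10.0, 40.0, 20.0, 30.0],
})

rfm_frequency = cumulative_share(rfm, 'Frequency')
rfm_monetary = cumulative_share(rfm, 'Monetary')
print(rfm_frequency)
```

Each call replaces one of the four-line blocks, so adding another metric later would need only one more call.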

def scorefm(x):
    """Score a cumulative share for Frequency & Monetary: 5 for <= 0.20, down to 1 for > 0.80."""
    if x <= 0.20:
        return 5
    elif x <= 0.40:
        return 4
    elif x <= 0.60:
        return 3
    elif x <= 0.80:
        return 2
    else:
        return 1


# Split Recency into 5 quantile bins; the lowest Recency (most recent) scores 5
rfm_recency['r_score'] = 5 - pd.qcut(rfm_recency['Recency'], q=5, labels=False)

# Create scores from cum_sum_perc for Frequency and Monetary
rfm_frequency['f_score'] = rfm_frequency['cum_sum_perc'].apply(scorefm)
rfm_monetary['m_score'] = rfm_monetary['cum_sum_perc'].apply(scorefm)
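
The if/elif ladder in scorefm could probably also be written as a single vectorized call with `pd.cut`, which assigns each value to a right-closed interval. The sketch below assumes the same thresholds as scorefm; the names `BINS`, `LABELS` and `score_fm` are hypothetical, and the input values are made up.

```python
import pandas as pd

# Same thresholds as scorefm: (-inf, 0.20] -> 5, (0.20, 0.40] -> 4, ..., (0.80, inf) -> 1
BINS = [float('-inf'), 0.20, 0.40, 0.60, 0.80, float('inf')]
LABELS = [5, 4, 3, 2, 1]


def score_fm(cum_share: pd.Series) -> pd.Series:
    """Hypothetical vectorized counterpart of scorefm for a whole column."""
    # pd.cut uses right-closed intervals by default, matching the `<=` tests.
    return pd.cut(cum_share, bins=BINS, labels=LABELS).astype(int)


shares = pd.Series([0.05, 0.20, 0.35, 0.60, 0.75, 1.00])
scores = score_fm(shares)
print(scores.tolist())  # [5, 5, 4, 3, 2, 1]
```

With a helper like this, `rfm_frequency['f_score'] = score_fm(rfm_frequency['cum_sum_perc'])` would replace the `.apply(scorefm)` call, assuming the column has no missing values.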

# Resorting data frames by ID to merge
rfm_recency = rfm_recency.sort_values('Id')
rfm_frequency = rfm_frequency.sort_values('Id')
rfm_monetary = rfm_monetary.sort_values('Id')

# Merging data frames together
# copy() does not take a column list, so this keeps every rfm_recency column
result = rfm_recency.copy()
result = result.join(rfm_frequency[['Frequency', 'f_score']])
result = result.join(rfm_monetary[['Monetary', 'm_score']])
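
A note on the merge step: `DataFrame.join` aligns rows on index labels by default, not on row position, so the re-sorting by Id may not be strictly required for the join to line up, provided all three data frames keep the original index of rfm (which `.copy()` and `sort_values` do). A small sketch with made-up values illustrates the alignment:

```python
import pandas as pd

# Made-up frames sharing index labels 0, 1, 2 but stored in different row order.
left = pd.DataFrame({'Id': [10, 20, 30], 'r_score': [5, 3, 1]}, index=[0, 1, 2])
right = pd.DataFrame({'f_score': [1, 2, 3]}, index=[2, 0, 1])

# join matches rows by index label, so the differing order does not matter.
joined = left.join(right)
print(joined)
```

The result keeps the row order of the left frame, with each f_score matched to the row carrying the same index label.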

# Create an FM and RFM score based on the individual R, F, M scores.
result['FM'] = (result['f_score'] + result['m_score']) / 2
result['RFM_Score'] = result['r_score'] * 10 + result['FM']
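
As a worked example of the final formula: with r_score = 5, f_score = 4 and m_score = 3, FM is (4 + 3) / 2 = 3.5 and RFM_Score is 5 * 10 + 3.5 = 53.5. The two lines could also be wrapped in a function, as in the sketch below; the name `add_rfm_score` and the sample scores are hypothetical.

```python
import pandas as pd


def add_rfm_score(result: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wrapper around the FM and RFM_Score formulas above."""
    out = result.copy()
    out['FM'] = (out['f_score'] + out['m_score']) / 2
    out['RFM_Score'] = out['r_score'] * 10 + out['FM']
    return out


# Made-up scores for two customers.
sample = pd.DataFrame({'r_score': [5, 1], 'f_score': [4, 2], 'm_score': [3, 2]})
scored = add_rfm_score(sample)
print(scored['RFM_Score'].tolist())  # [53.5, 12.0]
```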

