#StackBounty: #python #pandas #dataframe #numpy If value of a column is less then 3 then replace value in another column with value fro…

Bounty: 50

I have a large dataframe (100.000 rows) with many columns. These are the relevant columns for my question:

id   herd        birth     H_BY  HYcount      death       H_DY   HYcount2
1    1345   2005-01-09    134505       1  2010-01-09    134510       1
2    1345   2010-03-05    134510       2  2015-01-09    134515       2
3    1345   2010-05-10    134510       2  2015-01-09    134515       2
4    1345   2011-06-01    134511       1  2016-01-09    134516       1
5    1345   2012-09-01    134512       1  2017-01-09    134517       2
6    1345   2015-09-13    134515       4  2017-01-09    134517       2
7    1346   2015-10-01    134615       4  2019-01-09    134619       1
8    1346   2015-10-27    134615       4  2020-01-09    134620       2
9    1346   2015-11-10    134615       4  2020-01-09    134620       2
10   1346   2016-12-10    134616       1  2021-01-09    134621       1

I am creating Herd-year fixed effects.
I have already combined the herd and birth/death columns into herd+birth year and herd+death year into separate columns and counted how many time each fixed effect appears in the dataframe. As can be seen above.

However, now I want to check my whole dataframe for HYcount and HYcount2 that are less then 3. So I don’t want any that are either 1 or 2 in a HY group.

I would like to run though the dataframe and combine those HY groups that are 1 or 2 per group into other groups. Below or above.

EDIT

I also want to only combine HY groups WITHIN EACH HERD!

So I don’t want to add a herd member of one herd to another herd with the Herd-year variable.

Here is what I’ve tried with the birth year fixed effect.

#Sort the df by the relevant value
df= df.sort_values(by=['H_BY'])


df.loc[
    (df['HYcount'] < 3),
    'H_BY'] = df['H_BY'].shift(-1)

#Count the values again 
df['HC1_c'] = df.groupby('H_BY')['H_BY'].transform('count')

But this is a very feeble attempt. I have to run through this many many times to rid my dataframe of all values that are less then 3 and it does not work with record number 1. And I want to repeat this process over at least 4 other columns.

EDIT

And of course this code does not do anything about combining within one herd.

Any tips and trick or ideas how I can do this more efficiently?


Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.