python - Counting dates in a range set by pandas dataframe -
i have pandas dataframe contains 2 date columns, start date , end date defines range. i'd able collect total count dates across rows in dataframe, defined these columns.
for example, table looks like:
index start_date end date 0 '2015-01-01' '2015-01-17' 1 '2015-01-03' '2015-01-12'
and result per date aggregate, like:
date count '2015-01-01' 1 '2015-01-02' 1 '2015-01-03' 2
and on.
my current approach works extremely slow on big dataframe i'm looping across rows, calculating range , looping through this. i'm hoping find better approach.
currently i'm doing :
date = pd.date_range (min (df.start_date), max (df.end_date)) df2 = pd.dataframe (index =date) df2 ['count'] = 0 index, row in df.iterrows (): dates = pd.date_range (row ['start_date'], row ['end_date']) date in dates: df2.loc['date']['count'] += 1
after stacking relevant columns suggested @sam, use value_counts
.
df[['start_date', 'end date']].stack().value_counts()
edit:
given want count dates between start , end dates:
start_dates = pd.to_datetime(df.start_date) end_dates = pd.to_datetime(df.end_date) >>> pd.series(dt.date() group in [pd.date_range(start, end) start, end in zip(start_dates, end_dates)] dt in group).value_counts() out[178]: 2015-01-07 2 2015-01-06 2 2015-01-12 2 2015-01-05 2 2015-01-04 2 2015-01-10 2 2015-01-03 2 2015-01-09 2 2015-01-08 2 2015-01-11 2 2015-01-16 1 2015-01-17 1 2015-01-14 1 2015-01-15 1 2015-01-02 1 2015-01-01 1 2015-01-13 1 dtype: int64
Comments
Post a Comment