python - Counting dates in a range set by pandas dataframe -


i have pandas dataframe contains 2 date columns, start date , end date defines range. i'd able collect total count dates across rows in dataframe, defined these columns.

for example, table looks like:

index        start_date         end date      0         '2015-01-01'    '2015-01-17'      1         '2015-01-03'    '2015-01-12' 

and result per date aggregate, like:

    date           count '2015-01-01'     1 '2015-01-02'     1 '2015-01-03'     2 

and on.

my current approach works extremely slow on big dataframe i'm looping across rows, calculating range , looping through this. i'm hoping find better approach.

currently i'm doing :

date = pd.date_range (min (df.start_date), max (df.end_date)) df2 = pd.dataframe (index =date) df2 ['count'] = 0  index, row in df.iterrows ():     dates = pd.date_range (row ['start_date'], row ['end_date'])     date in dates:         df2.loc['date']['count'] += 1 

after stacking relevant columns suggested @sam, use value_counts.

df[['start_date', 'end date']].stack().value_counts() 

edit:

given want count dates between start , end dates:

start_dates = pd.to_datetime(df.start_date) end_dates = pd.to_datetime(df.end_date)  >>> pd.series(dt.date() group in                [pd.date_range(start, end) start, end in zip(start_dates, end_dates)]                 dt in group).value_counts() out[178]:  2015-01-07    2 2015-01-06    2 2015-01-12    2 2015-01-05    2 2015-01-04    2 2015-01-10    2 2015-01-03    2 2015-01-09    2 2015-01-08    2 2015-01-11    2 2015-01-16    1 2015-01-17    1 2015-01-14    1 2015-01-15    1 2015-01-02    1 2015-01-01    1 2015-01-13    1 dtype: int64 

Comments

Popular posts from this blog

java - Static nested class instance -

c# - Bluetooth LE CanUpdate Characteristic property -

JavaScript - Replace variable from string in all occurrences -