R - ddply: colSums and a count of a column together
I'm new to R and have pieced together the syntax below based on other helpful suggestions here on Stack Overflow. I'm trying to sum a column called combined_hours and get a count of a column called doc_line_num, grouped by the column doc_num.
So for each doc_num, I want to calculate the sum of combined_hours and show the count of doc_line_num.
The syntax below works fine for the sum of combined_hours, but how do I incorporate the count logic for doc_line_num into it?
Thank you.
library(plyr)
train2 <- ddply(train, c("weeknum", "doc_num", "doc_line_num", "short_date",
                         "cust_code", "op_code", "job_tp_code"),
                function(x) colSums(x[c("combined_hours")]))

Sample data:

weeknum doc_num doc_line_num short_date cust_code op_code job_tp_code combined_hours
     40  227555            1 2015-10-02    dotsug   ndona          pu      0.0269448
     40  227555            3 2015-10-02    dotsug   ndona          pu      0.4183320
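For anyone who wants to try the answers below, here is a sketch that rebuilds the two sample rows as an R data frame. The column types (integer IDs, a Date for short_date, character codes) are assumptions, since the printed sample doesn't show them.

```r
# Rebuild the two sample rows from the question as a data frame.
# Names and values come from the printed sample; the column types
# (integer, Date, character, numeric) are assumptions.
train <- data.frame(
  weeknum        = c(40L, 40L),
  doc_num        = c(227555L, 227555L),
  doc_line_num   = c(1L, 3L),
  short_date     = as.Date(c("2015-10-02", "2015-10-02")),
  cust_code      = c("dotsug", "dotsug"),
  op_code        = c("ndona", "ndona"),
  job_tp_code    = c("pu", "pu"),
  combined_hours = c(0.0269448, 0.4183320),
  stringsAsFactors = FALSE
)

# Inspect the column types that were chosen.
str(train)
```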
Using old-school plyr, you should be able to do:

library(plyr)
ddply(train, .variables = "doc_num", summarize,
      n_doc_line_num = length(unique(doc_line_num)),
      sum_comb_hours = sum(combined_hours))
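ddply works by splitting the data frame by .variables, applying the summary to each piece, and combining the results. If plyr isn't available, a rough base-R sketch of that same split-apply-combine pattern might look like this. The third row (doc_num 227556) is hypothetical, added only so there is more than one group.

```r
# Two sample rows from the question plus one hypothetical row
# (doc_num 227556) so that there are two groups.
train <- data.frame(
  doc_num        = c(227555L, 227555L, 227556L),
  doc_line_num   = c(1L, 3L, 1L),
  combined_hours = c(0.0269448, 0.4183320, 0.5)
)

# Split: one data frame per doc_num.
# Apply: summarise each piece to a single row.
pieces <- lapply(split(train, train$doc_num), function(g) {
  data.frame(
    doc_num        = g$doc_num[1],
    n_doc_line_num = length(unique(g$doc_line_num)),  # distinct lines
    sum_comb_hours = sum(g$combined_hours)
  )
})

# Combine: stack the one-row summaries back into a data frame.
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```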
The ddply function has been superseded by the newer dplyr package. Using dplyr, this could be written:

library(dplyr)
train %>%
  group_by(doc_num) %>%
  summarize(n_doc_line_num = n_distinct(doc_line_num),
            sum_comb_hours = sum(combined_hours))
I assumed that by "a count of a column called doc_line_num" you mean the count of distinct values.
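The distinction only matters if a doc_line_num can repeat within one doc_num. A minimal base-R illustration, using a hypothetical vector in which line 3 appears twice; in dplyr the two counts would correspond to n() and n_distinct(doc_line_num).

```r
# Hypothetical doc_line_num values for a single doc_num,
# with line 3 appearing twice.
doc_line_num <- c(1L, 3L, 3L)

# Counting every row (what dplyr's n() would give for this group).
n_rows <- length(doc_line_num)

# Counting distinct values (what n_distinct() / length(unique()) give).
n_distinct_lines <- length(unique(doc_line_num))

cat("rows:", n_rows, " distinct lines:", n_distinct_lines, "\n")
```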
If you share a larger bit of sample data (preferably with dput, e.g. dput(droplevels(head(train, 10)))), I'd be happy to test it and make sure things are good.
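A sketch of why dput is preferred: the printed structure(...) text can be pasted straight back into R to recreate the same data frame, including column types and factor levels. The cust_code factor and its unused "other" level here are hypothetical, included only to show what droplevels() removes.

```r
# Stand-in for train: two sample rows from the question, trimmed to a
# few columns. The factor level "other" is hypothetical and unused.
train <- data.frame(
  doc_num        = c(227555L, 227555L),
  doc_line_num   = c(1L, 3L),
  cust_code      = factor(c("dotsug", "dotsug"), levels = c("dotsug", "other")),
  combined_hours = c(0.0269448, 0.4183320)
)

# droplevels() removes the unused "other" level so the dput() output
# only carries levels that actually occur in the sample.
sample_rows <- droplevels(head(train, 10))
dput(sample_rows)

# Round trip: parse the printed dput() text back into a data frame,
# which is what a reader does when pasting it into their session.
dput_text <- paste(capture.output(dput(sample_rows)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))
```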
In both dplyr and plyr::ddply, summarize drops columns that aren't grouping variables. If you want the rest of the columns retained (and they have the same value for each value of doc_num), you can add them to the grouping to retain them. (By "the grouping" I mean dplyr::group_by or the .variables argument of plyr::ddply.)
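To illustrate keeping extra columns by grouping on them, here is a base-R sketch using aggregate() rather than dplyr or plyr (swapped in so it runs without extra packages). In dplyr the same idea would be group_by(doc_num, weeknum, cust_code). The data are the question's sample rows, trimmed to four columns.

```r
# Sample rows from the question, trimmed to four columns.
train <- data.frame(
  weeknum        = c(40L, 40L),
  doc_num        = c(227555L, 227555L),
  cust_code      = c("dotsug", "dotsug"),
  combined_hours = c(0.0269448, 0.4183320),
  stringsAsFactors = FALSE
)

# Grouping only by doc_num: weeknum and cust_code are dropped.
by_doc <- aggregate(combined_hours ~ doc_num, data = train, FUN = sum)

# Adding the per-document constant columns to the grouping keeps them.
# This still yields one row per doc_num only if each doc_num has a
# single weeknum and cust_code; otherwise the document is split.
by_doc_keep <- aggregate(combined_hours ~ doc_num + weeknum + cust_code,
                         data = train, FUN = sum)

print(by_doc)
print(by_doc_keep)
```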