R - ddply: colSums and count of a column together
I'm new to R and have pieced together the syntax below based on other helpful suggestions here on Stack Overflow. I'm trying to get the sum of a column called combined_hours and a count of a column called doc_line_num, grouped by the column doc_num.
So for each doc_num, calculate the sum of combined_hours and show the count of doc_line_num.
The syntax below works fine for the column sum of combined_hours, but how do I incorporate the count logic for doc_line_num into it?
Thank you.
    train2 <- ddply(train,
                    c("weeknum", "doc_num", "doc_line_num", "short_date",
                      "cust_code", "op_code", "job_tp_code"),
                    function(x) colSums(x[c("combined_hours")]))

    # sample data
    weeknum doc_num doc_line_num short_date cust_code op_code job_tp_code combined_hours
         40  227555            1 2015-10-02    dotsug   ndona          pu      0.0269448
         40  227555            3 2015-10-02    dotsug   ndona          pu      0.4183320
Using old-school plyr, you should be able to do:

    ddply(train, .variables = "doc_num", summarize,
          n_doc_line_num = length(unique(doc_line_num)),
          sum_comb_hours = sum(combined_hours))

The ddply function has been superseded by the newer dplyr package. Using dplyr, this would be written:

    library(dplyr)
    train %>%
      group_by(doc_num) %>%
      summarize(n_doc_line_num = n_distinct(doc_line_num),
                sum_comb_hours = sum(combined_hours))

I assumed that "a count of column called doc_line_num" means a count of its distinct values.
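If a plain row count is wanted instead of a count of distinct values, dplyr's n() is the usual alternative. The sketch below shows the two side by side. It assumes toy data modeled on the question's sample: the third and fourth rows are invented for illustration, and the column names n_rows and n_distinct_ln are hypothetical.

```r
library(dplyr)

# Toy data modeled on the question's sample.
# Rows 3 and 4 are invented so that the two counts differ.
train <- data.frame(
  doc_num        = c(227555, 227555, 227555, 227556),
  doc_line_num   = c(1, 3, 3, 1),
  combined_hours = c(0.0269448, 0.4183320, 0.25, 0.5)
)

counts <- train %>%
  group_by(doc_num) %>%
  summarize(
    n_rows         = n(),                       # every row in the group counts
    n_distinct_ln  = n_distinct(doc_line_num),  # a repeated line number counts once
    sum_comb_hours = sum(combined_hours)
  )

print(counts)
```

With this data the two counts differ only for the document whose line number 3 appears twice, which is a quick way to check which definition of "count" matches the intent.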
If you share a larger bit of sample data (preferably with dput, e.g. dput(droplevels(head(train, 10)))), I'd be happy to test it to make sure things work.
Both in dplyr and in plyr::ddply, summarize will drop columns that aren't grouping variables. If you want the rest of the columns retained (and they have the same value for each value of doc_num), you can add them to the grouping to retain them. (By "the grouping" I mean dplyr::group_by or the .variables argument of plyr::ddply.)
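As a rough illustration of that last point, the sketch below uses the two sample rows from the question (with a subset of the columns) and compares grouping by doc_num alone with grouping by doc_num plus the columns to keep. The object names by_doc and by_doc_keep are hypothetical, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# The two sample rows from the question, with a subset of the columns.
train <- data.frame(
  weeknum        = c(40, 40),
  doc_num        = c(227555, 227555),
  doc_line_num   = c(1, 3),
  short_date     = c("2015-10-02", "2015-10-02"),
  cust_code      = c("dotsug", "dotsug"),
  combined_hours = c(0.0269448, 0.4183320)
)

# Grouping by doc_num only: every other non-summarized column is dropped.
by_doc <- train %>%
  group_by(doc_num) %>%
  summarize(sum_comb_hours = sum(combined_hours))

# Adding the constant-per-document columns to the grouping retains them.
by_doc_keep <- train %>%
  group_by(doc_num, weeknum, short_date, cust_code) %>%
  summarize(n_doc_line_num = n_distinct(doc_line_num),
            sum_comb_hours = sum(combined_hours),
            .groups = "drop")  # assumes dplyr >= 1.0.0

print(names(by_doc))
print(by_doc_keep)
```

This only keeps one row per doc_num when the added columns really are constant within each document. If any of them varies, the result is split into one row per distinct combination.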