R - ddply: colSums and count of a column together


I'm new to R and have pieced together the syntax below based on other helpful suggestions here on Stack Overflow. I'm trying to get the sum of a column called combined_hours and a count of a column called doc_line_num, grouped by the column doc_num.

So, for each doc_num, calculate the sum of combined_hours and show a count of doc_line_num.

The syntax below works fine for the column sum of combined_hours, but how do I incorporate the count logic for doc_line_num into it?

Thank you.

    train2 <- ddply(train,
                    c("weeknum", "doc_num", "doc_line_num", "short_date",
                      "cust_code", "op_code", "job_tp_code"),
                    function(x) colSums(x[c("combined_hours")]))

    # sample data
    weeknum doc_num doc_line_num short_date cust_code op_code job_tp_code combined_hours
         40  227555            1 2015-10-02    dotsug   ndona          pu      0.0269448
         40  227555            3 2015-10-02    dotsug   ndona          pu      0.4183320

Using old-school plyr, you should be able to do:

    ddply(train, .variables = "doc_num", summarize,
          n_doc_line_num = length(unique(doc_line_num)),
          sum_comb_hours = sum(combined_hours))

The ddply function has since been superseded by the newer dplyr package. Using dplyr, this would be written:

    library(dplyr)
    train %>%
        group_by(doc_num) %>%
        summarize(n_doc_line_num = n_distinct(doc_line_num),
                  sum_comb_hours = sum(combined_hours))

I assumed that by "a count of a column called doc_line_num" you mean a count of distinct values.
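If a plain row count per doc_num is what is wanted instead, dplyr's n() gives that. The sketch below puts the two counts side by side. The toy data frame is an assumption for illustration: its first two rows come from the question's sample, and the other two are made up so that the two counts differ.

```r
library(dplyr)

# Toy data: the first two rows mirror the question's sample,
# the last two are invented for illustration only.
train <- data.frame(
  doc_num        = c(227555, 227555, 227555, 227556),
  doc_line_num   = c(1, 3, 3, 1),
  combined_hours = c(0.0269448, 0.4183320, 0.5, 1.0)
)

counts <- train %>%
  group_by(doc_num) %>%
  summarize(n_rows         = n(),                       # rows per doc_num
            n_doc_line_num = n_distinct(doc_line_num),  # distinct line numbers
            sum_comb_hours = sum(combined_hours))

print(counts)
```

In this made-up data, doc_num 227555 has three rows but only two distinct doc_line_num values, so the two counts disagree for that group.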

If you share a larger bit of sample data (preferably with dput, e.g. dput(droplevels(head(train, 10)))), I'd be happy to test it to make sure things are good.

In both dplyr and plyr::ddply, summarize will drop columns that aren't grouping variables. If you want the rest of the columns retained (and they have the same value for each value of doc_num), you can add them to the grouping to retain them. (By "the grouping" I mean dplyr::group_by or the .variables argument of plyr::ddply.)
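As a rough sketch of that last point, the dplyr version might look like the following. The data frame is again a toy one: only the first two rows come from the question's sample, and the third row (including the cust_code value "other") is made up. The .groups = "drop" argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy data: rows 1-2 follow the question's sample, row 3 is invented.
train <- data.frame(
  weeknum        = c(40, 40, 41),
  doc_num        = c(227555, 227555, 227556),
  doc_line_num   = c(1, 3, 1),
  cust_code      = c("dotsug", "dotsug", "other"),
  combined_hours = c(0.0269448, 0.4183320, 1.0)
)

# weeknum and cust_code are constant within each doc_num here, so adding
# them to the grouping keeps them in the output without splitting groups.
kept <- train %>%
  group_by(doc_num, weeknum, cust_code) %>%
  summarize(n_doc_line_num = n_distinct(doc_line_num),
            sum_comb_hours = sum(combined_hours),
            .groups = "drop")  # assumes dplyr >= 1.0.0

print(kept)
```

If one of the added columns actually varied within a doc_num, the same call would split that doc_num into several rows, so it is worth checking that assumption on the real data first.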

