data manipulation - How do I remove all participants in R that only occur with one single variable level and never with the second one?
I am analyzing an online community dataset in R. I'd appreciate any help, since I'm stuck on one problem. Here is an outline:
Dataset: the username of every user is available. Every row represents the activity of one user in one single online community. Example A: row 1 shows that user 'blue' is a 'member' of online community X and has contributed 1 post so far. Example B: row 5 shows that user 'blue' is an 'owner' of online community Y and has contributed 2 posts so far. See below!
Question: I want to remove all users from the dataset who are active only as a member or only as an owner (but never as both) in an online community. I want to remove them even if they are active as members in several online communities. In other words, I want to remove the users 'orange', 'purple', 'black' and 'white'. Important: the dataset contains more than 1 million rows, so I'm looking for an approach that takes this into account :) Thank you.
Example data (rows marked with * belong to the users that should be removed):

username  role    # of posts
blue      member  1
blue      member  0
red       owner   6
red       owner   1
blue      owner   2
red       member  1
blue      owner   3
blue      member  2
blue      owner   1
blue      owner   0
red       member  8
green     owner   1
red       owner   2
red       member  3
green     member  4
yellow    owner   5
green     member  3
green     owner   4
yellow    owner   5
yellow    member  6
yellow    owner   8
orange    owner   1   *
orange    owner   2   *
purple    member  3   *
purple    member  4   *
black     owner   4   *
white     member  4   *
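To make the problem reproducible, the example table can be rebuilt as an R data frame. This is a sketch: the column name posts (standing in for "# of posts") is an assumption, and the community column mentioned in the description is not shown in the table, so it is left out.

```r
# Rebuild the question's example table as a base R data frame.
# "posts" is a hypothetical column name for "# of posts";
# the online-community column is not shown in the question, so it is omitted.
users <- data.frame(
  username = c("blue", "blue", "red", "red", "blue", "red", "blue", "blue",
               "blue", "blue", "red", "green", "red", "red", "green", "yellow",
               "green", "green", "yellow", "yellow", "yellow",
               "orange", "orange", "purple", "purple", "black", "white"),
  role = c("member", "member", "owner", "owner", "owner", "member", "owner",
           "member", "owner", "owner", "member", "owner", "owner", "member",
           "member", "owner", "member", "owner", "owner", "member", "owner",
           "owner", "owner", "member", "member", "owner", "member"),
  posts = c(1, 0, 6, 1, 2, 1, 3, 2, 1, 0, 8, 1, 2, 3, 4, 5, 3, 4, 5, 6, 8,
            1, 2, 3, 4, 4, 4),
  stringsAsFactors = FALSE
)

# Inspect the column types and the first few values
str(users)
```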
Answer: I'm assuming the assertion in the comment is correct.
I'm using the data.table package, because I'm in a fanboy mood. Note that converting to a data.table may break any data.frame syntax you have afterwards, so if you're trying to plug this into other code, you'll want to use setDF(users2) afterwards to convert it back.
library(data.table)
setDT(users)  # convert the data.frame to a data.table by reference
users_to_remove <- users[, .N, .(username, role)][, .N, username][N == 1, username]
users2 <- users[!(username %in% users_to_remove)]
print(setdiff(users$username, users2$username))
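If converting to a data.table (and back with setDF) is inconvenient, a base R alternative is possible. The sketch below is not part of the original answer: it uses ave() to compute, for every row, how many distinct roles that row's username has, and keeps the rows where that count is above 1. The small users data frame is a hypothetical sample with the question's column layout. On more than a million rows the data.table version is likely faster, since ave() calls an R function once per username.

```r
# Hypothetical small sample in the shape of the question's table
users <- data.frame(
  username = c("red", "red", "orange", "orange", "purple", "white"),
  role     = c("owner", "member", "owner", "owner", "member", "member"),
  posts    = c(6, 1, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# For every row, count the distinct roles of that row's username.
# ave() splits the row indices by username, applies FUN to each group,
# and returns a vector aligned with the original rows.
n_roles <- ave(seq_len(nrow(users)), users$username,
               FUN = function(i) length(unique(users$role[i])))

# Keep only users seen in more than one role (member and owner)
users2 <- users[n_roles > 1, ]
print(unique(users2$username))
```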
The line that builds users_to_remove might be a little hard to follow, because it chains three operations:
- Count the number of observations for each combination of username/role.
- Discard the number of observations each combination has, and count the number of roles each username has.
- Restrict to usernames that have only 1 role, and return a vector of those usernames.
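The same three steps can be expressed in base R without data.table. In this sketch, which is an illustrative translation rather than the answer's own code, unique() stands in for counting per username/role combination and then discarding the counts, table() counts the roles per username, and the final subset keeps usernames with exactly one role. The users data frame is a small hypothetical sample.

```r
# Hypothetical small sample in the shape of the question's table
users <- data.frame(
  username = c("blue", "blue", "blue", "green", "green", "black", "purple", "purple"),
  role     = c("member", "owner", "owner", "owner", "member", "owner", "member", "member"),
  stringsAsFactors = FALSE
)

# Step 1 (with the counts already discarded): one row per username/role combination
combos <- unique(users[, c("username", "role")])

# Step 2: number of distinct roles per username
roles_per_user <- table(combos$username)

# Step 3: usernames with exactly one role, as a character vector
users_to_remove <- names(roles_per_user)[roles_per_user == 1]

# Drop every row belonging to those users
users2 <- users[!(users$username %in% users_to_remove), ]
print(users_to_remove)
```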