data manipulation - How do I remove all participants in R who occur with only one level of a variable and never with the other?


I am analyzing an online community dataset in R. I'd appreciate your help, since I am stuck on one problem. Here is an outline:

Dataset: the username of every user is available. Every row represents the activity of one user in one single online community. Example A: row 1 shows that user 'blue' is a 'member' of online community X and has contributed 1 post so far. Example B: row 5 shows that user 'blue' is an 'owner' of online community Y and has contributed 2 posts so far. See below!

Question: I want to remove all users from the dataset who are active only as either a member or an owner in the online communities. I want to remove them even if they are active as members in several online communities. In other words, I want to remove the users 'orange', 'purple', 'black' and 'white'. Important: the dataset contains more than 1 million rows, so I am looking for an approach that takes this into account :) Thank you.

    username  role    # of posts
    blue      member  1
    blue      member  0
    red       owner   6
    red       owner   1
    blue      owner   2
    red       member  1
    blue      owner   3
    blue      member  2
    blue      owner   1
    blue      owner   0
    red       member  8
    green     owner   1
    red       owner   2
    red       member  3
    green     member  4
    yellow    owner   5
    green     member  3
    green     owner   4
    yellow    owner   5
    yellow    member  6
    yellow    owner   8
    orange    owner   1
    orange    owner   2
    purple    member  3
    purple    member  4
    black     owner   4
    white     member  4
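For anyone who wants to try out the answers below, the example table can be typed into R as a plain data frame. This is a sketch: the column names `username`, `role` and `posts` are our own choice (the table labels the last column "# of posts"), and the real dataset presumably has further columns, such as the community, that the table does not show.

```r
# The example rows from the table above, in the same order.
# Column names are assumptions; the real data may have more columns.
users <- data.frame(
  username = c("blue", "blue", "red", "red", "blue", "red", "blue", "blue",
               "blue", "blue", "red", "green", "red", "red", "green", "yellow",
               "green", "green", "yellow", "yellow", "yellow", "orange", "orange",
               "purple", "purple", "black", "white"),
  role = c("member", "member", "owner", "owner", "owner", "member", "owner", "member",
           "owner", "owner", "member", "owner", "owner", "member", "member", "owner",
           "member", "owner", "owner", "member", "owner", "owner", "owner",
           "member", "member", "owner", "member"),
  posts = c(1, 0, 6, 1, 2, 1, 3, 2, 1, 0, 8, 1, 2, 3, 4, 5, 3, 4, 5, 6,
            8, 1, 2, 3, 4, 4, 4),
  stringsAsFactors = FALSE  # keep usernames and roles as character vectors
)

str(users)
```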

I'm assuming that the assertion in your comment is correct.

I'm using the data.table package, because I'm in a fanboy mood. Note that converting to a data table will break any data frame syntax you have afterwards, so if you're trying to plug this into other code, you'll want to use setDF(users2) afterwards to convert it back.

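    library('data.table')
    setDT(users)  # convert the data frame to a data.table in place
    # usernames that appear with exactly one role
    users_to_remove <- users[, .N, .(username, role)][, .N, username][N == 1, username]
    users2 <- users[!(username %in% users_to_remove)]
    print(setdiff(users$username, users2$username))  # the removed usernames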
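To check the data.table approach against the example rows, here is a self-contained run. It assumes the data.table package is installed and re-creates the sample table as `users` with the assumed column names `username`, `role` and `posts`; the final `setDF()` call converts the result back to a plain data frame, as suggested above.

```r
library(data.table)

# Sample rows from the question; column names are assumptions.
users <- data.frame(
  username = c("blue", "blue", "red", "red", "blue", "red", "blue", "blue",
               "blue", "blue", "red", "green", "red", "red", "green", "yellow",
               "green", "green", "yellow", "yellow", "yellow", "orange", "orange",
               "purple", "purple", "black", "white"),
  role = c("member", "member", "owner", "owner", "owner", "member", "owner", "member",
           "owner", "owner", "member", "owner", "owner", "member", "member", "owner",
           "member", "owner", "owner", "member", "owner", "owner", "owner",
           "member", "member", "owner", "member"),
  posts = c(1, 0, 6, 1, 2, 1, 3, 2, 1, 0, 8, 1, 2, 3, 4, 5, 3, 4, 5, 6,
            8, 1, 2, 3, 4, 4, 4),
  stringsAsFactors = FALSE
)

setDT(users)  # convert to a data.table by reference

# Rows per (username, role) -> roles per username -> usernames with one role
users_to_remove <- users[, .N, .(username, role)][, .N, username][N == 1, username]

users2 <- users[!(username %in% users_to_remove)]
print(setdiff(users$username, users2$username))

setDF(users2)  # back to a plain data.frame for downstream data frame code
```

With these rows it should print orange, purple, black and white, and `users2` keeps the 21 rows belonging to blue, red, green and yellow.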

The third statement (the one that defines users_to_remove) might be a little hard to follow, because it chains three operations:

  1. Count the number of observations for each combination of username and role.
  2. Discard the number of observations each combination has, and count the number of roles each username has.
  3. Restrict to usernames that have only one role, and return them as a vector of usernames.
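The same three steps can be spelled out in base R, which makes each intermediate result easy to inspect. This is a re-expression of the data.table chain using `aggregate()` and `table()`, not the answer's own code, and it again assumes the column names `username`, `role` and `posts`. On more than a million rows the data.table version is likely to be the faster of the two.

```r
# Sample rows from the question; column names are assumptions.
users <- data.frame(
  username = c("blue", "blue", "red", "red", "blue", "red", "blue", "blue",
               "blue", "blue", "red", "green", "red", "red", "green", "yellow",
               "green", "green", "yellow", "yellow", "yellow", "orange", "orange",
               "purple", "purple", "black", "white"),
  role = c("member", "member", "owner", "owner", "owner", "member", "owner", "member",
           "owner", "owner", "member", "owner", "owner", "member", "member", "owner",
           "member", "owner", "owner", "member", "owner", "owner", "owner",
           "member", "member", "owner", "member"),
  posts = c(1, 0, 6, 1, 2, 1, 3, 2, 1, 0, 8, 1, 2, 3, 4, 5, 3, 4, 5, 6,
            8, 1, 2, 3, 4, 4, 4),
  stringsAsFactors = FALSE
)

# Step 1: count the observations for each username/role combination.
step1 <- aggregate(posts ~ username + role, data = users, FUN = length)
names(step1)[3] <- "N"
print(step1)

# Step 2: ignore N and count how many roles each username has.
roles_per_user <- table(step1$username)
print(roles_per_user)

# Step 3: keep the usernames that have exactly one role.
users_to_remove <- names(roles_per_user)[roles_per_user == 1]

users2 <- users[!(users$username %in% users_to_remove), ]
print(users_to_remove)
```

Because `table()` sorts its names, `users_to_remove` comes out in alphabetical order (black, orange, purple, white) rather than in order of first appearance.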
