Dear all,
I'm puzzled by the following example inspired by a recent question on
R-help,
cc <- textConnection("user_id website time
20 google 0930
21 yahoo 0935
20 facebook 1000
25 facebook 1015
61 google 0940")
d <- read.table(cc, header = TRUE); close(cc)
table(d$user_id) # count the occurrences
# now I'd like to include these results in the original data.frame,
library(plyr)
ddply(d, .(website), transform, count = table(user_id)) # why two new columns?
I just can't understand how this is different from,
ddply(d, .(website), transform, count = sum(user_id))
Many thanks,
baptiste
plyr and table question
6 messages · Baptiste Auguie, Tom Short, Hadley Wickham
baptiste auguie-2 wrote:
ddply(d, .(website), transform, count = table(user_id)) # why two new columns?
Try this to see why:
as.data.frame(table(d$user_id))
This works more like you expect:
ddply(d, .(website), transform, count = unclass(table(user_id)))
- Tom
On Fri, Apr 3, 2009 at 4:43 AM, baptiste auguie <ba208 at exeter.ac.uk> wrote:
ddply(d, .(website), transform, count = table(user_id)) # why two new columns?
Because ddply expects a data frame as output from your aggregation function. When the output isn't a data frame, it calls as.data.frame, which in this case produces a data frame with two columns.
Hadley
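The two-column result can be reproduced outside ddply. The following is a minimal base-R sketch, assuming the same data as in the original post; the names `tab`, `df` and `counts` are illustrative only and do not come from the thread.

```r
# Rebuild the example data from the original post.
d <- read.table(text = "user_id website time
20 google 0930
21 yahoo 0935
20 facebook 1000
25 facebook 1015
61 google 0940", header = TRUE)

tab <- table(d$user_id)

# Converting a one-way table to a data frame gives one column for the
# levels and one for the counts, hence two columns.
df <- as.data.frame(tab)
names(df)

# The same conversion happens when a table is passed to data.frame().
ncol(data.frame(count = tab))

# Dropping the "table" class first leaves a plain vector of counts.
counts <- as.integer(unclass(tab))
counts
```

Here `names(df)` shows the levels column and the frequency column, while `counts` holds only the frequencies.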
That makes sense, so I can do something like,
count <- function(x){
  as.integer(unclass(table(x)))
}
count(d$user_id)
ddply(d, .(user_id), transform, count = count(user_id))
  user_id  website time count
1      20   google  930     2
2      20 facebook 1000     2
3      21    yahoo  935     1
4      25 facebook 1015     1
5      61   google  940     1
Have I missed a built-in function to obtain this result?
Thanks,
baptiste
_____________________________
Baptiste Auguié
School of Physics
University of Exeter
Stocker Road, Exeter, Devon, EX4 4QL, UK
Phone: +44 1392 264187
http://newton.ex.ac.uk/research/emag
On Fri, Apr 3, 2009 at 8:43 AM, baptiste auguie <ba208 at exeter.ac.uk> wrote:
Have I missed a built-in function to obtain this result?
ddply(d, .(user_id), transform, count = nrow) ?
Hadley
Of course! Thanks,
baptiste
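A note for anyone trying the last suggestion: as far as I can tell, transform() evaluates `count = nrow` as an expression, so the bare function name may not give a per-group count in current versions of R and plyr. A minimal sketch of one variant that should work is below, assuming plyr is installed and using length(user_id) in place of nrow; this substitution is my own assumption, not something proposed in the thread, and `res` is an illustrative name.

```r
library(plyr)  # assumption: the plyr package is installed

# Rebuild the example data from the original post.
d <- read.table(text = "user_id website time
20 google 0930
21 yahoo 0935
20 facebook 1000
25 facebook 1015
61 google 0940", header = TRUE)

# Within each user_id group, length(user_id) is the number of rows in
# that group; transform() recycles it to every row of the group.
res <- ddply(d, .(user_id), transform, count = length(user_id))
res
```

The result has the same columns as the table baptiste posted, with `count` giving the number of rows for each user_id.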