aggregate by factor
5 messages · David Winsemius, david hilton shanabrook, Dennis Murphy
On Jan 30, 2010, at 4:09 PM, david hilton shanabrook wrote:
I have a data frame with two columns, a factor and a numeric. I want to create a data frame with the factor, its frequency, and the median of the numeric column.
head(motifList)
  events       score
1  aeijm -0.25000000
2  begjm -0.25000000
3  afgjm -0.25000000
4  afhjm -0.25000000
5  aeijm -0.25000000
6  aehjm  0.08333333
To get the frequency table of events:
motifTable <- as.data.frame(table(motifList$events))
head(motifTable)
   Var1 Freq
1 aeijm  110
2 begjm   46
3 afgjm  337
4 afhjm  102
5 aehjm  190
6 adijm   18
Now get the score column back in.
motifTable2 <- merge(motifList, motifTable, by="events")
head(motifTable2)
  events     percent freq
1  adgjm  0.00000000  111
2  adgjm          NA  111
3  adgjm  0.13333333  111
4  adgjm  0.06666667  111
5  adgjm -0.16666667  111
6  adgjm          NA  111
Lastly, aggregate on the events column to get the median of the score:
motifTable3 <- aggregate.data.frame(motifTable2, by=list(motifTable2$events), FUN=median, na.rm=TRUE)
Error in median.default(X[[1L]], ...) : need numeric data
This gives the error because events is a factor. Can someone enlighten me to a more obvious approach?
I don't think grouping on a factor is the source of your error. You have NA's in your data and median will choke on those unless you specify na.rm=TRUE.
David Winsemius, MD Heritage Laboratories West Hartford, CT
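A minimal sketch of the two failure modes under discussion, using a hypothetical three-value vector rather than the poster's data: an NA in a numeric vector makes median() return NA (it does not raise an error), whereas passing a factor raises the "need numeric data" error quoted above.

```r
# Hypothetical values, chosen only to illustrate the behaviour.
scores <- c(-0.25, NA, 0.08333333)

# With an NA present, median() returns NA rather than failing.
na_result <- median(scores)

# na.rm = TRUE drops the NA before computing the median.
clean_result <- median(scores, na.rm = TRUE)

# A factor is not numeric, so median() stops with an error.
events <- factor(c("aeijm", "begjm", "aeijm"))
msg <- tryCatch(median(events), error = function(e) conditionMessage(e))

print(na_result)
print(clean_result)
print(msg)
```

So the error message in the original post points at a non-numeric column reaching median(), not at the NAs themselves.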
On Jan 30, 2010, at 4:46 PM, david hilton shanabrook wrote:
On 30 Jan 2010, at 4:20 PM, David Winsemius wrote:
I don't think grouping on a factor is the source of your error. You have NA's in your data and median will choke on those unless you specify na.rm=TRUE.
I thought the na.rm=TRUE in the aggregate function would do this (see above). I also tried it with
I missed that.
medianRmNa <- function(data) {
  median(data, na.rm=TRUE)
}
motifTable3 <- aggregate.data.frame(motifTable2, by=list(motifTable2$events), FUN=medianRmNa)
Error in median.default(data, na.rm = TRUE) : need numeric data
Apparently you cannot include the grouping variable in the first argument to aggregate:
motifTable3 <- aggregate(motifTable2[ , -1], by=list(motifTable2$events), FUN=median, na.rm=TRUE)
> motifTable3
  Group.1       score freq
1   aehjm  0.08333333    1
2   aeijm -0.25000000    2
3   afgjm -0.25000000    1
4   afhjm -0.25000000    1
5   begjm -0.25000000    1
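The whole pipeline can be sketched end to end on the six rows shown by head(motifList). This assumes those six rows are the entire data set and that events is stored as a factor; the names() call is the line the poster later says was left out of the script.

```r
# Six rows copied from head(motifList) in the original post.
motifList <- data.frame(
  events = factor(c("aeijm", "begjm", "afgjm", "afhjm", "aeijm", "aehjm")),
  score  = c(-0.25, -0.25, -0.25, -0.25, -0.25, 0.08333333)
)

# Frequency of each event; rename Var1/Freq so merge() can match on "events".
motifTable <- as.data.frame(table(motifList$events))
names(motifTable) <- c("events", "freq")

# Attach the frequency to every row of the original data.
motifTable2 <- merge(motifList, motifTable, by = "events")

# Drop the factor column (column 1) from the data being summarised, so that
# median() only ever sees the numeric columns score and freq.
motifTable3 <- aggregate(motifTable2[, -1],
                         by = list(events = motifTable2$events),
                         FUN = median, na.rm = TRUE)
print(motifTable3)
```

Naming the list element in by (here events) only replaces the default Group.1 column name; it is a cosmetic choice, not part of the fix.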
same error.
I did leave a line out of the above script,
names(motifTable) <- c("events", "freq")
which helps explain why the merge works
dhs
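As one possible answer to the original request for a more obvious approach, the merge step can be skipped by computing the count and the median per group separately and binding them together. This is a sketch under assumptions, not something proposed in the thread: the data are the six rows from head(motifList) plus one hypothetical row with an NA score, and the column names in result are illustrative.

```r
# Six rows from the post plus one hypothetical NA row (the last one).
motifList <- data.frame(
  events = factor(c("aeijm", "begjm", "afgjm", "afhjm", "aeijm", "aehjm", "aehjm")),
  score  = c(-0.25, -0.25, -0.25, -0.25, -0.25, 0.08333333, NA)
)

# Rows per event (rows with an NA score are still counted).
freq <- table(motifList$events)

# Median score per event, ignoring NAs; extra arguments are passed to median().
med <- tapply(motifList$score, motifList$events, median, na.rm = TRUE)

# Both results are ordered by the factor levels, so they line up row by row.
result <- data.frame(
  events = names(freq),
  freq   = as.vector(freq),
  median = as.vector(med)
)
print(result)
```

Because the grouping factor is passed only as the index and never as data, median() is applied to the numeric column alone.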