Help with recast() syntax
Inline below...

On Mon, 28 Nov 2011 21:32:21 -0800 (PST), Chris Conner
<connerpharmd at yahoo.com> wrote:
Dear Help-Rs,
I have data similar to the following:
DF <- structure(list(X = 1:22, RESULT = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("NEG", "POS"), class = "factor"), YR_MO =
c(201011L,
201012L, 201101L, 201102L, 201103L, 201104L, 201105L, 201106L,
201107L, 201108L, 201109L, 201011L, 201012L, 201101L, 201102L,
201103L, 201104L, 201105L, 201106L, 201107L, 201108L, 201109L
), TOT_TESTS = c(66L, 98L, 109L, 122L, 113L, 111L, 113L, 146L,
124L, 130L, 120L, 349L, 393L, 376L, 371L, 396L, 367L, 406L, 383L,
394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO", "TOT_TESTS"
), class = "data.frame", row.names = c(NA, -22L))
Currently there are 2 observations for each month (one for negative
and one for positive test results). What I need is to create a data set
that looks like the following, with positive and negative test results
in the same row, organized by month:
DF2<-structure(list(X = 1:11, RESULT = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "POS", class = "factor"),
    YR_MO = c(201011L, 201012L, 201101L, 201102L, 201103L, 201104L,
    201105L, 201106L, 201107L, 201108L, 201109L), POS_TESTS = c(66L,
    98L, 109L, 122L, 113L, 111L, 113L, 146L, 124L, 130L, 120L
    ), NEG_TESTS = c(349L, 393L, 376L, 371L, 396L, 367L, 406L,
    383L, 394L, 412L, 379L)), .Names = c("X", "RESULT", "YR_MO",
"POS_TESTS", "NEG_TESTS"), class = "data.frame", row.names = c(NA,
-11L))
Thanks for the sample data.
As this is something that I understand Hadley Wickham's Reshape package is
ideally suited for, I tried using the following reshape command:

ReshapeDF <- recast(DF, YR_MO~variable)

I get the following error message:

Using RESULT as id variables
Error: Casting formula contains variables not found in molten data: YR_MO
I don't think you need to melt the data first, so you don't need the
recast function.
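For context on that error message, here is a minimal sketch of what I believe is going on, using a cut-down stand-in for your DF (three months only; the names molten_default, molten and wide are just mine for illustration). When no id variables are given, melt() appears to treat only the factor column RESULT as an id, so YR_MO is melted into the "variable" column and is no longer available to the casting formula. Naming the id and measure variables explicitly avoids that.

```r
library(reshape2)

# Small stand-in for the posted DF (three months only, same columns)
DF <- data.frame(
  X = 1:6,
  RESULT = factor(rep(c("POS", "NEG"), each = 3)),
  YR_MO = rep(c(201011L, 201012L, 201101L), times = 2),
  TOT_TESTS = c(66L, 98L, 109L, 349L, 393L, 376L)
)

# Default melt: only the factor column is kept as an id variable, so
# YR_MO becomes a level of 'variable' instead of staying a column
molten_default <- melt(DF)
names(molten_default)
levels(molten_default$variable)

# Explicit id and measure variables keep YR_MO available to the formula
molten <- melt(DF, id.vars = c("YR_MO", "RESULT"), measure.vars = "TOT_TESTS")
wide <- dcast(molten, YR_MO ~ RESULT, value.var = "value")
wide
```

This is only to show where the error comes from; the dcast() call below skips the melt step entirely.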
# reshape2 is faster than reshape, but slightly syntactically different
library(reshape2)
# rename the RESULT levels
DF0 <- DF
levels( DF0$RESULT ) <- c( "NEG_TOTAL", "POS_TOTAL" )
# cast to data frame, use sum if more than one row for a given YR_MO
DF0 <- dcast( DF0, YR_MO~RESULT, sum, value.var="TOT_TESTS" )
# The rest of this is to make the data frame look like your result, which
# seems unnecessary to me, but perhaps there is a good reason for keeping X
# and RESULT
DF1 <- merge( DF[ DF$RESULT=="POS", c( "X", "RESULT", "YR_MO" ) ], DF0 )
DF2 <- DF1[,c("X", "RESULT", "YR_MO", "POS_TOTAL", "NEG_TOTAL" ) ]
I have a workaround that allows me to get to my desired endpoint. It
involves splitting the data.frame into two (by test result), then using
YR_MO as the by.x/by.y in a merge, but I think this task would be handled
more efficiently using reshape. Can anyone help me to see where I'm going
wrong? Thanks in advance!

[[alternative HTML version deleted]]
(Please remember that this is a plain text email list.)
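For comparison, the split-and-merge workaround described above might look roughly like the following in base R. This is my guess at it, not the original poster's actual code, and it again uses a cut-down stand-in for DF; the names pos, neg and merged are only for illustration.

```r
# Small stand-in for the posted DF (three months only, same columns)
DF <- data.frame(
  X = 1:6,
  RESULT = factor(rep(c("POS", "NEG"), each = 3)),
  YR_MO = rep(c(201011L, 201012L, 201101L), times = 2),
  TOT_TESTS = c(66L, 98L, 109L, 349L, 393L, 376L)
)

# Split by test result, keeping only the month and the count
pos <- DF[DF$RESULT == "POS", c("YR_MO", "TOT_TESTS")]
neg <- DF[DF$RESULT == "NEG", c("YR_MO", "TOT_TESTS")]

# Rename the count column so the two halves do not collide after merging
names(pos)[2] <- "POS_TESTS"
names(neg)[2] <- "NEG_TESTS"

# Join the halves on the month
merged <- merge(pos, neg, by.x = "YR_MO", by.y = "YR_MO")
merged
```

It gets to the same place, but every additional RESULT level would need its own subset and merge, which is what the cast approach avoids.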
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil_at_dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k