reshape2's dcast() Adds NAs to Data Frame
On Tue, 7 Aug 2012, R. Michael Weylandt wrote:
Can you provide a reproducible example? See, e.g.,
Michael,

I think the attached 'sample.txt' and 'sample.cast.txt' should do. There are no missing values in sample.txt, but there are in the reshaped data frame. The sequence of commands I used to generate these is:
library(reshape2)  # provides melt() and dcast()
sample <- read.table('sample.txt', header = TRUE, sep = ',')
sample$sampdate <- as.Date(as.character(sample$sampdate))
sample$ceneq1 <- as.logical(sample$ceneq1)
str(sample)
'data.frame':   715 obs. of  8 variables:
 $ site    : Factor w/ 5 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 1 ...
 $ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
 $ era     : Factor w/ 2 levels "Post","Pre": 1 1 1 1 1 1 1 1 1 1 ...
 $ param   : Factor w/ 54 levels "AgDis","AgTot",..: 2 4 5 7 10 13 21 ...
 $ quant   : num  1.30e-04 1.06e-01 2.31e+02 1.13e-02 5.00e-03 2.39e-02 ...
 $ ceneq1  : logi  TRUE FALSE FALSE FALSE TRUE FALSE ...
 $ floor   : num  0 0.106 231 0.0113 0 100 0 1.43 0 0.0239 ...
 $ ceiling : num  1.30e-04 1.06e-01 2.31e+02 1.13e-02 5.00e-03 2.39e-02 ...
sample.melt <- melt(sample, id.vars = c('site', 'sampdate', 'era', 'param', 'ceneq1', 'floor', 'ceiling'))
sample.cast <- dcast(sample.melt, site + sampdate + era + ceneq1 + floor + ceiling ~ param)
str(sample.cast)
'data.frame':   668 obs. of  60 variables:
 $ site    : Factor w/ 5 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 1 ...
 $ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
 $ era     : Factor w/ 2 levels "Post","Pre": 1 1 1 1 1 1 1 1 1 1 ...
 $ ceneq1  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ floor   : num  0.00132 0.0113 0.0239 0.0253 0.0348 0.106 0.293 4.11 ...
 $ ceiling : num  0.00132 0.0113 0.0239 0.0253 0.0348 0.106 0.293 4.11 ...
 $ AgDis   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ AgTot   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ AlDis   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ AlTot   : num  NA NA NA NA NA 0.106 NA NA NA NA ...
etc.
dput(sample, 'sample.txt')
dput(sample.cast, 'sample.cast.txt')
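A smaller sketch of the same pattern may make the NAs easier to see. It assumes a made-up data frame, here called `toy` (a hypothetical name, with invented values and not the attached data), holding three parameters measured at one site on one date. The first cast keeps the measurement-specific columns on the left of the formula, as in the commands above; the second keeps only the sample-level identifiers there.

```r
library(reshape2)

# Hypothetical stand-in for sample.txt: one site, one date, three parameters.
# All values are invented for illustration.
toy <- data.frame(
  site     = "D-1",
  sampdate = as.Date("2007-12-12"),
  param    = c("AgTot", "AlTot", "Alk"),
  quant    = c(0.00013, 0.106, 231),
  ceneq1   = c(TRUE, FALSE, FALSE),
  floor    = c(0, 0.106, 231),
  stringsAsFactors = TRUE
)

toy.melt <- melt(toy, id.vars = c("site", "sampdate", "param", "ceneq1", "floor"))

# ceneq1 and floor differ for every measurement, so each combination of
# left-hand-side values identifies a single measurement and gets its own row.
wide.a <- dcast(toy.melt, site + sampdate + ceneq1 + floor ~ param,
                value.var = "value")

# Only sample-level identifiers on the left: one row per site and date.
wide.b <- dcast(toy.melt, site + sampdate ~ param, value.var = "value")

print(wide.a)
print(wide.b)
```

In this toy case wide.a has three rows with a single non-missing parameter value in each, while wide.b has one row and no NAs. That suggests the NAs may come from ceneq1, floor and ceiling being on the left of the formula rather than from anything missing in the data, although I have not checked this against the full data set.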
The context for this is my learning how to use the NADA package to plot and analyze left-censored data. The full data set has 64 site and param levels. I don't know if I can use the base data frame, the reshaped (dcast) data frame or individual subsets (one for each parameter).

Rich

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sample.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120808/5cb020e3/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sample.cast.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120808/5cb020e3/attachment-0001.txt>