Message-ID: <alpine.LNX.2.00.1208081124160.28482@salmo.appl-ecosys.com>
Date: 2012-08-08T18:33:39Z
From: Rich Shepard
Subject: reshape2's dcast() Adds NAs to Data Frame
In-Reply-To: <CAAmySGOQ6EOy1d1hyq3hq1aJsG9N2i_E=Lg4ETDE+KBiGrPuMw@mail.gmail.com>

On Tue, 7 Aug 2012, R. Michael Weylandt wrote:

> Can you provide a reproducible example? See, e.g.,

Michael,

   I think the attached 'sample.txt' and 'sample.cast.txt' should do. There
are no missing values in sample.txt, but there are in the reshaped data
frame. The sequence of commands I used to generate these is:

> sample <- read.table('sample.txt', header = TRUE, sep = ',')
> sample$sampdate <- as.Date(as.character(sample$sampdate))
> sample$ceneq1 <- as.logical(sample$ceneq1)
> str(sample)
'data.frame':	715 obs. of  8 variables:
  $ site    : Factor w/ 5 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 1 ...
  $ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
  $ era     : Factor w/ 2 levels "Post","Pre": 1 1 1 1 1 1 1 1 1 1 ...
  $ param   : Factor w/ 54 levels "AgDis","AgTot",..: 2 4 5 7 10 13 21 ...
  $ quant   : num  1.30e-04 1.06e-01 2.31e+02 1.13e-02 5.00e-03 2.39e-02 ...
  $ ceneq1  : logi  TRUE FALSE FALSE FALSE TRUE FALSE ...
  $ floor   : num  0 0.106 231 0.0113 0 100 0 1.43 0 0.0239 ...
  $ ceiling : num  1.30e-04 1.06e-01 2.31e+02 1.13e-02 5.00e-03 2.39e-02 ...
> sample.melt <- melt(sample, id.vars = c('site', 'sampdate', 'era', 'param', 'ceneq1', 'floor', 'ceiling'))
> sample.cast <- dcast(sample.melt, site + sampdate + era + ceneq1 + floor + ceiling ~ param)
> str(sample.cast)
'data.frame':	668 obs. of  60 variables:
  $ site    : Factor w/ 5 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 1 ...
  $ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
  $ era     : Factor w/ 2 levels "Post","Pre": 1 1 1 1 1 1 1 1 1 1 ...
  $ ceneq1  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
  $ floor   : num  0.00132 0.0113 0.0239 0.0253 0.0348 0.106 0.293 4.11 ...
  $ ceiling : num  0.00132 0.0113 0.0239 0.0253 0.0348 0.106 0.293 4.11 ...
  $ AgDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ AgTot   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ AlDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ AlTot   : num  NA NA NA NA NA 0.106 NA NA NA NA ...
etc.

> dput(sample, 'sample.txt')
> dput(sample.cast, 'sample.cast.txt')
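
   The same pattern can be reproduced on a much smaller scale. The sketch
below uses a made-up three-row data frame (the 'toy' object and its values
are hypothetical, not taken from sample.txt) and assumes reshape2 is
installed. It illustrates one possible explanation, not confirmed against
the attached files: when per-measurement columns such as ceneq1, floor and
ceiling sit on the left-hand side of the dcast() formula, every distinct
combination of them becomes its own row, so each row has a value for only
one parameter and NA for the rest.

```r
library(reshape2)

# Hypothetical toy data: one site, one date, three parameters.
toy <- data.frame(
  site     = c("D-1", "D-1", "D-1"),
  sampdate = as.Date(c("2007-12-12", "2007-12-12", "2007-12-12")),
  param    = c("AgTot", "AlTot", "AsTot"),
  quant    = c(1.30e-04, 1.06e-01, 5.00e-03),
  ceneq1   = c(TRUE, FALSE, TRUE),
  floor    = c(0, 0.106, 0),
  ceiling  = c(1.30e-04, 1.06e-01, 5.00e-03)
)

# Same melt pattern as in the session above: quant is the only measured variable.
toy.melt <- melt(toy, id.vars = c("site", "sampdate", "param",
                                  "ceneq1", "floor", "ceiling"))

# Per-measurement columns on the left-hand side: one row per distinct
# combination, so each row is filled for a single parameter only.
wide.all <- dcast(toy.melt, site + sampdate + ceneq1 + floor + ceiling ~ param,
                  value.var = "value")

# Only sample-level columns on the left-hand side: one row per site and date.
wide.ids <- dcast(toy.melt, site + sampdate ~ param, value.var = "value")

nrow(wide.all)          # 3 rows
sum(is.na(wide.all))    # 6 NAs
nrow(wide.ids)          # 1 row
sum(is.na(wide.ids))    # 0 NAs
```

   The second call drops ceneq1, floor and ceiling from the result, so it is
only a way to see where the NAs come from, not necessarily the layout that
NADA needs.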

   The context for this is my learning how to use the NADA package to plot
and analyze left-censored data. The full data set has 64 site and param
levels. I don't know if I can use the base data frame, the reshaped (dcast)
data frame or individual subsets (one for each parameter).

Rich

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sample.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120808/5cb020e3/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sample.cast.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120808/5cb020e3/attachment-0001.txt>