Working with 5 subset streams from my source data frame, three of them
successfully call dcast(), but two fail:
jerritt.cast <- dcast(jerritt.melt, site + sampdate ~ param)
Aggregation function missing: defaulting to length
and
winters.cast <- dcast(winters.melt, site + sampdate ~ param)
Aggregation function missing: defaulting to length
Yet both data frames have the values in their .melt data frames:
summary(jerritt.melt)
      site          sampdate              param        variable
 JCM-1  :2178   Min.   :1978-03-28   pH     : 292   quant:7519
 JCM-20A:2149   1st Qu.:1996-05-24   As     : 286
 JC-E   : 476   Median :2000-05-31   SO4    : 271
 JC     : 400   Mean   :2001-02-04   TDS    : 271
 GD-1   : 395   3rd Qu.:2006-05-31   Cl     : 253
 JC-2   : 349   Max.   :2009-12-30   Zn     : 250
 (Other):1572                        (Other):5896
     value
 Min.   :    0.000
 1st Qu.:    0.005
 Median :    0.650
 Mean   :  317.588
 3rd Qu.:   27.000
 Max.   :20450.000
 NA's   : 2134.000
and
summary(winters.melt)
      site         sampdate              param       variable
 WC     :601   Min.   :1987-07-23   As     : 96   quant:1189
 WC-2   :327   1st Qu.:1994-06-15   TDS    : 79
 WC-1   :261   Median :1995-07-27   NO3-N  : 74
 BC-0.5 :  0   Mean   :1997-05-15   pH     : 72
 BC-1   :  0   3rd Qu.:1996-07-29   SO4    : 69
 BC-1.5 :  0   Max.   :2011-06-06   Cl     : 64
 (Other):  0                        (Other):735
     value
 Min.   :   0.00
 1st Qu.:   0.05
 Median :   7.59
 Mean   :  79.20
 3rd Qu.:  75.00
 Max.   :2587.00
 NA's   : 252.00
What might be causing dcast() to fail with these two data frames while it
succeeds with three others processed using the same syntax? If additional
information would help, let me know and I'll provide it.
Puzzled,
Rich
reshape2: Lost Values Between melt() and dcast()
5 messages · Justin Haynes, Rich Shepard
dcast() gives that message (a warning, not a failure) when the formula you supplied does not identify each value uniquely: at least one combination of site, sampdate, and param occurs more than once. dcast() then needs an aggregating function, which defaults to length.
However, the dcast calls that "failed" can be helpful for determining the source of your error. I'd look at the outputs of those two dcast calls and find cells where the length is > 1. Those are duplicated entries in your initial data.frames (when I've run into this it was usually due to NA values somewhere unexpected).
Hope that clarifies things.
Justin
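A minimal sketch of the behavior described above, using made-up data rather than Rich's: the data frame `toy.melt`, its site names, and its values are all hypothetical. One site/date/parameter combination appears twice (once with a value, once as NA), so dcast() falls back to counting rows, and the cell holding a 2 points at the duplicated combination.

```r
library(reshape2)

# Hypothetical melted data: WC / 2011-06-06 / As occurs twice,
# once with a value and once as NA.
toy.melt <- data.frame(
  site     = c("WC", "WC", "WC", "WC-1"),
  sampdate = as.Date(rep("2011-06-06", 4)),
  param    = c("As", "As", "pH", "As"),
  variable = "quant",
  value    = c(1.0, NA, 7.6, 0.5),
  stringsAsFactors = FALSE
)

# No fun.aggregate is given and the keys are not unique, so dcast()
# reports that it is defaulting to length and returns row counts.
toy.counts <- dcast(toy.melt, site + sampdate ~ param, value.var = "value")

# Keep only the output rows in which some parameter column exceeds 1;
# the first two columns are the id columns site and sampdate.
bad <- toy.counts[rowSums(toy.counts[, -(1:2), drop = FALSE] > 1) > 0, ]
bad
```

Note that the NA row still counts toward the length, which is why an unexpected NA can show up as a 2 in the cast result.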
On Mon, Oct 31, 2011 at 9:32 AM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
Working with 5 subset streams from my source data frame, three of them successfully call dcast(), but two fail: [...]
On Mon, 31 Oct 2011, Justin Haynes wrote:
However, the dcast calls that "failed" can be helpful for determining the source of your error. I'd look at the outputs of those two dcast calls and find cells where the length is > 1. Those are duplicated entries in your initial data.frames (when I've run into this it was usually due to NA values somewhere unexpected).
Justin,
I'll have to dig in the docs to see how to examine specific rows in the original data frames because I cannot find where duplicate entries were generated. In the dcast() results for the two problem data frames I found 1 row with a value of 2 in one and 8 rows each with a value of 2 in the other. When I look at the original database table, only one row is present for each of the 9.
There are about 47.5K rows in the original R data frame, so going through them one at a time is not practical. Have you any suggestion on how to examine the data frame and the melted data frame to see where the problems might be?
Thanks,
Rich
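One way to answer this without reading rows by hand is base R's duplicated(), applied to the key columns only. The sketch below uses a hypothetical `toy.melt` data frame in place of the real melted data; the column names follow the thread, but the values are invented.

```r
# Hypothetical stand-in for a melted data frame.
toy.melt <- data.frame(
  site     = c("WC", "WC", "WC", "WC-1"),
  sampdate = as.Date(rep("2011-06-06", 4)),
  param    = c("As", "As", "pH", "As"),
  variable = "quant",
  value    = c(1.0, NA, 7.6, 0.5),
  stringsAsFactors = FALSE
)

# The columns that the dcast() formula treats as identifying a value.
keys <- toy.melt[, c("site", "sampdate", "param")]

# duplicated() flags only the second and later copies, so it is OR-ed
# with a pass from the other end to flag the first copy as well.
dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# Every row that shares its site/sampdate/param with another row.
toy.melt[dup, ]
```

The same test works on the unmelted data frame, provided the key columns are chosen to match.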
On Mon, 31 Oct 2011, Justin Haynes wrote:
I'd look at the outputs of those two dcast calls and find cells where the length is > 1. Those are duplicated entries in your initial data.frames (when I've run into this it was usually due to NA values somewhere unexpected).
The data frame returned by dcast() has one row with a '2' in one column. However, the melt() data frame has only one row with that combination of site, sampdate, and param. The problem is that both the melt() and the chemdata data frames show the quant value as 'NA', while the original database table has the value 1.0 for that site, sampdate, and param. I'll re-read the table and see if that fixes the issue with this one subset data frame.
Curious how the database table has a value of 1.00 mg/L and the read data frame contains NA. More curious is why the dcast() data frame has a '2' for that row.
Rich
On Mon, 31 Oct 2011, Rich Shepard wrote:
Curious how the database table has a value of 1.00 mg/L and the read data frame contains NA. More curious is why the cast() data frame has a '2' for that row.
Searching further in emacs through the text file generated by write.table(), I found two rows with the same values in the columns site, sampdate, and param. Since a select query on the database table returns only one row, I cannot explain how the R data frame has 2 rows.
Regardless, thanks to Justin's suggestions, I've fixed one subset data frame and will now fix the other.
Rich
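Rich does not say how he fixed the data frame. One possible approach, sketched here on hypothetical data, is to keep a single row per site/sampdate/param combination and prefer the row that has a value over the one that is NA. Whether dropping the NA copy is correct depends on the data, so treat this as an assumption rather than the fix used in the thread.

```r
# Hypothetical stand-in for a melted data frame with one duplicated key.
toy.melt <- data.frame(
  site     = c("WC", "WC", "WC", "WC-1"),
  sampdate = as.Date(rep("2011-06-06", 4)),
  param    = c("As", "As", "pH", "As"),
  variable = "quant",
  value    = c(NA, 1.0, 7.6, 0.5),
  stringsAsFactors = FALSE
)

key.cols <- c("site", "sampdate", "param")

# Sort rows so that non-NA values come first; order() keeps ties in
# their original order.
sorted <- toy.melt[order(is.na(toy.melt$value)), ]

# Keep the first row for each key, which is now the non-NA one if any.
toy.clean <- sorted[!duplicated(sorted[, key.cols]), ]

# 0 means no key combination is repeated any more.
anyDuplicated(toy.clean[, key.cols])
```

With unique keys, dcast(toy.clean, site + sampdate ~ param) has nothing to aggregate and should return the values themselves rather than counts.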