I have water chemistry data with censored values (i.e., those less than reporting levels) in a data frame with a narrow (i.e., database table) format. The structure is:

 $ site    : Factor w/ 64 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 1 1 ...
 $ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
 $ preeq0  : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ param   : Factor w/ 37 levels "Ag","Al","Alk_tot",..: 1 2 8 17 3 4 9 ...
 $ quant   : num 0.005 0.106 1 231 231 0.011 0.001 0.002 0.001 100 ...
 $ ceneq1  : logi TRUE FALSE TRUE FALSE FALSE FALSE ...
 $ floor   : num 0 0.106 0 231 231 0.011 0 0 0 100 ...
 $ ceiling : num 0.005 0.106 1 231 231 0.011 0.001 0.002 0.001 100 ...

The logical 'preeq0' separates sampdate into two groups; 'ceneq1' indicates censored/uncensored values; 'floor' and 'ceiling' are the minima and maxima for censored values.

The NADA package methods will be used, but I have not found information on whether this format or the wide (i.e., spreadsheet) format should be used. The NADA.pdf document doesn't tell me; at least, I haven't found the answer there. I can apply reshape2 to melt and re-cast the data in wide format if that's what is appropriate. Please provide a pointer to documents I can read for an answer to this and related questions.

Rich
NADA Data Frame Format: Wide or Long?
6 messages · Jean V Adams, Rich Shepard, MacQueen, Don
2 days later
I haven't used NADA functions in quite a while, but from what I recall, you will likely be using the "narrow" format, and subsetting as needed for the different analytes. As Jean suggested, the examples in the help pages for the NADA function(s) of interest should make it clear.

This example follows exactly the example in ?cenros.

    with( subset(yourdataframe, param=='Ag'), cenros(quant,ceneq1) )

This should do a simple censored summary statistics calculation for silver (assuming quant contains your reporting level for censored results, which appears to be the case).

I'd also suggest you try to load your data so that site and param are not factors, though this could depend on your ultimate analysis.

-Don
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
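To make the with(subset(...)) idiom concrete, here is a minimal sketch on made-up data. The column names follow the structure posted at the top of the thread, but the values are invented for illustration, and the cenros() call only runs if the NADA package happens to be installed.

```r
# Made-up long-format data; column names follow the posted structure,
# the numbers themselves are invented for illustration.
yourdataframe <- data.frame(
  param  = rep(c("Ag", "Al"), each = 8),
  quant  = c(0.005, 0.005, 0.007, 0.009, 0.012, 0.015, 0.020, 0.031,
             0.106, 0.090, 0.050, 0.050, 0.120, 0.200, 0.075, 0.310),
  ceneq1 = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
             FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# subset() keeps only the silver rows; with() then lets the call refer
# to the columns of that subset by name.
ag <- subset(yourdataframe, param == "Ag")
n_censored <- with(ag, sum(ceneq1))

# Only attempt the censored-data fit when NADA is available.
if (requireNamespace("NADA", quietly = TRUE)) {
  library(NADA)
  ag_ros <- with(ag, cenros(quant, ceneq1))
  print(mean(ag_ros))
}
```

Storing the subset in a named object (ag here) is optional; it just makes it easy to inspect what is being passed to cenros().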
On Thu, 5 Jul 2012, MacQueen, Don wrote:
> This example follows exactly the example in ?cenros.
>
>     with( subset(yourdataframe, param=='Ag'), cenros(quant,ceneq1) )
>
> This should do a simple censored summary statistics calculation for silver
> (assuming quant contains your reporting level for censored results, which
> appears to be the case).
Don, That makes sense to me. I was hoping to avoid subsetting the data frame for each of the 37 chemical parameters, but ... I will review the use of with().
> I'd also suggest you try to load your data so that site and param are not
> factors, though this could depend on your ultimate analysis.
I do need to differentiate results by site and chemical parameter.

Many thanks,

Rich
1 day later
Hi Rich,

So what you're faced with is that the cenros() function has no built-in methods for grouping or subsetting -- unlike some other R methods, especially those that work with the lattice package, or the many modeling functions like lm() that have a subset argument or employ a conditioning syntax for models [like y ~ x | g ]. In effect, this means you have to roll your own.

The wide format could help, but you would still probably end up writing loops. Each parameter would then presumably be represented by two columns, one for the result and one for the non-detection indicator. And they would all have different names, such as ceneq1.ag, ceneq1.al, and so on. I think you'd probably end up with more complicated scripts. This approach is especially tricky if not all analytes and locations were sampled on the same days (which is normally the case for my data).

You're probably aware that there are various functions for splitting a data frame into subsets and then applying the same function to every subset, such as by() and aggregate(), and probably others. These may turn out to be fairly simple to use with a NADA function such as cenros(), but you won't really know until you start trying them.

One can also do it oneself with constructs like

    tmpsub <- split( mydf, list(mydf$site, mydf$param) )
    tmpss <- lapply(tmpsub, myfun)

where myfun is a wrapper function around, say, cenros().

This is obviously just an outline.

-Don
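The split()/lapply() outline can be fleshed out roughly as follows. This is a sketch under stated assumptions, not a tested recipe for the real data: mydf is simulated, the 0.03 reporting level is hypothetical, and myfun is just one possible wrapper (it returns NA when a fit fails, for example when a group has too few detected values). The drop = TRUE argument skips site/parameter combinations that have no rows.

```r
# Simulated long-format data shaped like the posted data frame.
set.seed(1)
mydf <- data.frame(
  site  = rep(c("D-1", "D-2"), each = 20),
  param = rep(rep(c("Ag", "Al"), each = 10), times = 2),
  quant = round(rlnorm(40, meanlog = -3, sdlog = 1), 3),
  stringsAsFactors = FALSE
)
# Hypothetical reporting level of 0.03: values below it are flagged as
# censored and 'quant' is set to the reporting level for those rows.
mydf$ceneq1 <- mydf$quant < 0.03
mydf$quant[mydf$ceneq1] <- 0.03

# Wrapper: fit one group and return a one-row summary.
# 'fit' defaults to NADA::cenros, looked up only when the wrapper is called.
myfun <- function(d, fit = NULL) {
  if (is.null(fit)) fit <- NADA::cenros
  est <- tryCatch(
    {
      m <- fit(d$quant, d$ceneq1)
      as.numeric(c(mean(m), median(m)))
    },
    error = function(e) c(NA_real_, NA_real_)  # fit failed for this group
  )
  data.frame(site = d$site[1], param = d$param[1],
             n = nrow(d), n_cens = sum(d$ceneq1),
             mean = est[1], median = est[2])
}

# One data frame per site/parameter combination that actually has data.
tmpsub <- split(mydf, list(mydf$site, mydf$param), drop = TRUE)

# Run the fits only when NADA is installed.
if (requireNamespace("NADA", quietly = TRUE)) {
  library(NADA)
  tmpss <- do.call(rbind, lapply(tmpsub, myfun))
  print(tmpss)
}
```

Binding the per-group results into one data frame keeps the output in the same narrow layout as the input, which makes it easy to merge back or plot by site and parameter.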
On Fri, 6 Jul 2012, MacQueen, Don wrote:
> So what you're faced with is that the cenros() function has no built-in
> methods for grouping or subsetting -- unlike some other R methods,
> especially those that work with the lattice package, or the many modeling
> functions like lm() that have a subset argument or employ a conditioning
> syntax for models [like y ~ x | g ]. In effect, this means you have to
> roll your own.
>
> ...
>
> This is obviously just an outline.
Don,

Yes, I do need to work out how best to address the needs of two current projects. This will take some time as I've not before needed to work with censored data (regulators tend to focus only on threshold exceedances). As a result, I'll be re-reading Dennis's book (the second edition) and the NADA.pdf, defining exactly what I need, trying various approaches, and certainly coming back here for advice and suggestions.

One thing I noticed a couple of days ago is that I could use cenboxplot() for chemical concentrations by period, but could not figure out how to change the x-axis labels from Pre and Post (or Before and After, I forget the exact terms) to more meaningful terms.

This will be an interesting journey and a great education for me.

Thanks,

Rich
Richard B. Shepard, Ph.D. | Integrity - Credibility - Innovation Applied Ecosystem Services, Inc. | Helping Ensure Our Clients' Futures <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
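On the cenboxplot() axis-label question raised above, one possible approach is to recode the grouping variable as a factor with the desired labels before plotting. This is a sketch that assumes cenboxplot() takes its x-axis labels from the levels of the grouping variable, as base boxplot() does; the data are simulated, and the labels "Baseline" and "Operations" as well as the 0.03 reporting level are placeholders.

```r
# Simulated data: 'preeq0' marks the two sampling periods.
set.seed(2)
d <- data.frame(
  preeq0 = rep(c(TRUE, FALSE), each = 15),
  quant  = round(rlnorm(30, meanlog = -3, sdlog = 1), 3)
)
# Hypothetical reporting level of 0.03 for the censoring flag.
d$ceneq1 <- d$quant < 0.03
d$quant[d$ceneq1] <- 0.03

# Recode the logical as a factor whose labels are the text wanted on
# the axis ("Baseline" and "Operations" are placeholders).
d$period <- factor(d$preeq0, levels = c(TRUE, FALSE),
                   labels = c("Baseline", "Operations"))

# Draw the plot only when NADA is installed.
if (requireNamespace("NADA", quietly = TRUE)) {
  library(NADA)
  with(d, cenboxplot(quant, ceneq1, period))
}
```

The order given in levels also fixes the left-to-right order of the boxes, so the earlier period can be placed first regardless of alphabetical order.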