
NADA Data Frame Format: Wide or Long?

6 messages · Jean V Adams, Rich Shepard, MacQueen, Don

#
I have water chemistry data with censored values (i.e., those less than
reporting levels) in a data frame with a narrow (i.e., database table)
format. The structure is:

  $ site    : Factor w/ 64 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 1 1 ...
  $ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
  $ preeq0  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
  $ param   : Factor w/ 37 levels "Ag","Al","Alk_tot",..: 1 2 8 17 3 4 9 ...
  $ quant   : num  0.005 0.106 1 231 231 0.011 0.001 0.002 0.001 100 ...
  $ ceneq1  : logi  TRUE FALSE TRUE FALSE FALSE FALSE ...
  $ floor   : num  0 0.106 0 231 231 0.011 0 0 0 100 ...
  $ ceiling : num  0.005 0.106 1 231 231 0.011 0.001 0.002 0.001 100 ...

   The logical 'preeq0' separates sampdate into two groups; 'ceneq1'
indicates censored/uncensored values; 'floor' and 'ceiling' are the minima
and maxima for censored values.

   The NADA package methods will be used, but I have not found information on
whether this format or the wide (i.e., spreadsheet) format should be used.
The NADA.pdf document doesn't tell me; at least, I haven't found the answer
there. I can apply reshape2 to melt and re-cast the data in wide format if
that's what is appropriate. Please provide a pointer to documents I can read
for an answer to this and related questions.

Rich
2 days later
#
I haven't used NADA functions in quite a while, but from what I recall,
you will likely be using the "narrow" format, and sub-setting as needed
for the different analytes.

As Jean suggested, the examples in the help pages for the NADA function(s)
of interest should make it clear.

This example follows exactly the example in ?cenros.

  with(subset(yourdataframe, param == "Ag"), cenros(quant, ceneq1))

This should do a simple censored summary statistics calculation for silver
(assuming quant contains your reporting level for censored results, which
appears to be the case).

I'd also suggest you try to load your data so that site and param are not
factors, though this could depend on your ultimate analysis.
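
A minimal sketch of that conversion, in case it helps. The data frame name
chem and its values are invented for illustration; only the column names come
from the structure posted above. When reading from a file, passing
stringsAsFactors = FALSE to read.csv() avoids the factors in the first place
(that is the default from R 4.0.0 onward).

```r
# Hypothetical stand-in for the posted data frame (values are made up)
chem <- data.frame(
  site  = factor(c("D-1", "D-1", "D-2")),
  param = factor(c("Ag", "Al", "Ag")),
  quant = c(0.005, 0.106, 0.005)
)

# Find the factor columns and convert only those to character
is_fac <- vapply(chem, is.factor, logical(1))
chem[is_fac] <- lapply(chem[is_fac], as.character)

str(chem)
```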

-Don
#
On Thu, 5 Jul 2012, MacQueen, Don wrote:

            
Don,

   That makes sense to me. I was hoping to avoid subsetting the data frame
for each of the 37 chemical parameters, but ... I will review the use of
with(). I do need to differentiate results by site and chemical parameter.

Many thanks,

Rich
1 day later
#
Hi Rich,

So what you're faced with is that the cenros() function has no built-in
methods for grouping or subsetting -- unlike some other R methods,
especially those that work with the lattice package, or the many modeling
functions like lm() that have a subset argument or employ a conditioning
syntax for models [like  y ~ x | g ]. In effect, this means you have to
roll your own.

The wide format could help, but you would still probably end up writing
loops. Each parameter would then presumably be represented by two columns,
one for the result, one for non-detection indicator. And they would all
have different names, such as ceneq1.ag, ceneq1.al, and so on. I think
you'd probably end up with more complicated scripts. This approach is
especially tricky if not all analytes and locations were sampled on the
same days (which is normally the case for my data).
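
To make the wide layout concrete, here is a small sketch using base R's
reshape(). The data frame long and its values are invented; the column names
follow the posted structure. reshape() is used here only as one possible
route (reshape2 would be another), and the generated names use the
parameter's own spelling, e.g. ceneq1.Ag.

```r
# Hypothetical narrow-format data: one row per site/date/parameter
long <- data.frame(
  site     = c("D-1", "D-1", "D-2", "D-2"),
  sampdate = as.Date(rep("2007-12-12", 4)),
  param    = c("Ag", "Al", "Ag", "Al"),
  quant    = c(0.005, 0.106, 0.005, 0.230),
  ceneq1   = c(TRUE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Wide format: one row per site/date, and a quant.* plus a ceneq1.* column
# for every parameter
wide <- reshape(long,
                idvar     = c("site", "sampdate"),
                timevar   = "param",
                v.names   = c("quant", "ceneq1"),
                direction = "wide")

names(wide)
```

With 37 parameters this gives 74 measurement columns, and any site/date on
which a parameter was not sampled shows up as NA in both of its columns.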

You're probably aware that there are various functions for splitting a
dataframe into subsets and then applying the same function to every
subset, such as by() and aggregate(), and probably others. These may turn
out to be fairly simple to use with a NADA function such as cenros(), but
you won't really know until you start trying them.

One can also do it oneself with constructs like

tmpsub <- split( mydf, list(mydf$site, mydf$param) )
tmpss <- lapply(tmpsub, myfun)

where myfun is a wrapper function around, say, cenros().

This is obviously just an outline.
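
One way the outline above might be filled in is sketched below. The wrapper
name ros_summary, the simulated data, and the 0.03 reporting level are all
assumptions made for illustration; only split(), lapply(), and cenros() come
from the discussion. It also assumes that mean() and sd() work on the object
cenros() returns, as in the ?cenros examples. The tryCatch() is there because
some site-by-parameter subsets may not have enough detected values for a fit.

```r
library(NADA)  # provides cenros()

# Simulated narrow-format data (not real measurements)
set.seed(1)
mydf <- data.frame(
  site  = rep(c("D-1", "D-2"), each = 20),
  param = rep(rep(c("Ag", "Al"), each = 10), times = 2),
  quant = round(rlnorm(40, meanlog = -3, sdlog = 1), 3),
  stringsAsFactors = FALSE
)
# Treat values below a made-up reporting level as censored and store the
# reporting level itself in quant for those rows
mydf$ceneq1 <- mydf$quant < 0.03
mydf$quant[mydf$ceneq1] <- 0.03

# Hypothetical wrapper: one summary row per site-by-parameter subset
ros_summary <- function(d) {
  out <- data.frame(site = d$site[1], param = d$param[1],
                    n = nrow(d), n_cen = sum(d$ceneq1),
                    ros_mean = NA_real_, ros_sd = NA_real_,
                    stringsAsFactors = FALSE)
  # Leave the estimates as NA if cenros() cannot fit this subset
  fit <- tryCatch(cenros(d$quant, d$ceneq1), error = function(e) NULL)
  if (!is.null(fit)) {
    out$ros_mean <- mean(fit)
    out$ros_sd   <- sd(fit)
  }
  out
}

# drop = TRUE skips site/parameter combinations that have no rows
tmpsub <- split(mydf, list(mydf$site, mydf$param), drop = TRUE)
tmpss  <- lapply(tmpsub, ros_summary)
result <- do.call(rbind, tmpss)
result
```

The result has one row per site and parameter, which keeps the data in the
narrow format while still giving a separate summary for each combination.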

-Don
#
On Fri, 6 Jul 2012, MacQueen, Don wrote:

            
...
Don,

   Yes, I do need to work out how best to address the needs of two current
projects. This will take some time as I've not before needed to work with
censored data (regulators tend to focus on only threshold exceedances). As a
result, I'll be re-reading Dennis's book (the second edition) and the
NADA.pdf, defining exactly what I need, trying various approaches, and
certainly coming back here for advice and suggestions.

   One thing I noticed a couple of days ago is that I could use cenboxplot()
for chemical concentrations by period, but could not figure out how to
change the x axis labels from Pre and Post (or Before and After, I forget
the exact terms) to more meaningful terms.

   This will be an interesting journey and a great education for me.

Thanks,

Rich