----------------------------------------------------------------------
Message: 1
Date: Tue, 15 May 2012 11:15:39 -0700 (PDT)
From: Rich Shepard<rshepard at appl-ecosys.com>
To: r-sig-ecology at r-project.org
Subject: [R-sig-eco] Continuous (Non-Count) Skewed Data With Many
Zeros
Message-ID:<alpine.LNX.2.00.1205151057550.3824 at salmo.appl-ecosys.com>
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
The water chemistry data of metal concentrations are not normally
distributed (based on Q-Q plots) and are not improved by transformation
(log10, sqrt, cubic root).
Non-normality of your response variable is not a reason to apply a data transformation.
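To illustrate the point above, here is a minimal simulated sketch. It assumes the standard reading that in a linear regression the normality assumption concerns the residuals rather than the raw response. The data, the seed, and the helper names `skewness` and `ols_residuals` are hypothetical and are not taken from the poster's data set.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by sd cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


def ols_residuals(x, y):
    """Residuals from a simple least-squares line y = a + b * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical simulation: a skewed covariate drives the response,
# while the noise around the regression line is normal.
rng = random.Random(1)
x = [rng.expovariate(1.0) for _ in range(2000)]
y = [5.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

skew_y = skewness(y)                        # the raw response is strongly right-skewed
skew_resid = skewness(ols_residuals(x, y))  # the residuals are roughly symmetric

print(f"skewness of response:  {skew_y:.2f}")
print(f"skewness of residuals: {skew_resid:.2f}")
```

In this simulation a Q-Q plot of the raw response would look badly non-normal, yet the linear model is perfectly adequate, so the shape of the response alone says little about whether a transformation is needed.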
For the 30 metal species the percentage of zeros
ranges from none (10 metals) to 48.6; average 5.6. Most metals are at very
low concentrations with infrequent spikes which might be very high.
Those with fewer zeros are not a concern, but I'd like your thoughts on 1)
at what percentage the number of zeros becomes a concern
It all depends, and no sensible general answer can be given. As few as 15% zeros can cause serious problems, but it is also possible that data with 80% zeros are handled perfectly well by a linear regression or GLM. For a discussion with examples, see Chapter 10 in our 2012 book.
and 2) how to characterize and model these data.
That depends on the previous remark: anything from linear regression to a zero-inflated model for a continuous response variable. There is simply no single answer; it all depends. But based on what you describe, it will probably be something zero-inflated.

Alain
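As a rough illustration of the zero-inflated end of that range, the sketch below fits a covariate-free two-part (delta-lognormal) summary: one part for the probability of a non-zero value, one part for the size of the positive values. This is only one simple member of that model family and is an assumption made for illustration, not necessarily the approach used in the book. The concentrations and the function name `two_part_summary` are hypothetical.

```python
import math
import statistics


def two_part_summary(values):
    """Two-part (delta-lognormal) summary of non-negative continuous data.

    Part 1: proportion of non-zero observations.
    Part 2: lognormal fit to the positive observations only.
    """
    positives = [v for v in values if v > 0]
    p_nonzero = len(positives) / len(values)

    logs = [math.log(v) for v in positives]
    mu = statistics.fmean(logs)
    sigma2 = statistics.variance(logs)  # requires at least two positive values

    # Mean of a lognormal distribution with parameters mu and sigma2.
    mean_positive = math.exp(mu + sigma2 / 2.0)

    return {
        "p_nonzero": p_nonzero,
        "mu_log": mu,
        "sigma2_log": sigma2,
        "mean_positive": mean_positive,
        # Overall expectation combines both parts.
        "mean_overall": p_nonzero * mean_positive,
    }


# Hypothetical metal concentrations: many zeros, low values, one large spike.
concentrations = [0.0, 0.0, 0.0, 0.02, 0.03, 0.05, 0.01, 0.04, 2.5, 0.0]
summary = two_part_summary(concentrations)

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

In a real analysis each part would have its own covariates, for example a binomial GLM for zero versus non-zero and a gamma or lognormal model for the positive concentrations. Whether that is appropriate depends on the data, as noted above.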
Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007). Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
   URL: www.springer.com/0-387-45967-7
2. Mixed effects models and extensions in ecology with R. (2009). Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
   http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9
3. A Beginner's Guide to R (2009). Zuur, AF, Ieno, EN, Meesters, EHWG. Springer
   http://www.springer.com/statistics/computational/book/978-0-387-93836-3
4. Zero Inflated Models and Generalized Linear Mixed Models with R. (2012) Zuur, Saveliev, Ieno.
   http://www.highstat.com/book4.htm

Other books: http://www.highstat.com/books.htm

Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highstat at highstat.com
URL: www.highstat.com
URL: www.brodgar.com