Rich,

Your specific question was "whether the geometric mean of a set of chemical concentrations (e.g., in mg/L) is an appropriate representation of the expected value."

The short answer to that specific question is no. There are various estimators of the expected value of a positively skewed distribution:

1) The arithmetic average.
2) Assuming a specific distribution, e.g., log normal (the most common choice, but not the only one), and applying the distribution-specific relationship between the parameters and the expected value.
3) A non-parametric estimator, e.g., the "Duan" estimator (very rare).

However, I suggest your question is too narrow. The better question is "what single value best characterizes the location of a skewed distribution?" If the skew is sufficiently large (e.g., a large log-scale sd), 90-99% of the population can be less than the expected value.

In my mind, the answer to this question depends on what you "want to do" with the estimated location:

1) Summarize the observations. I suggest the median is the best estimator of the typical value for an individual observation. For log normal data, the geometric mean is an estimate of the median, because the log operator and the median operator can be interchanged algebraically (the log of the median is the median of the logs), and on the log scale the mean and the median coincide.

2) Estimate a total, e.g., over space or over time. This is the relevant goal when trying to estimate nutrient loading into a receiving body or the total release of a contaminant. For this, you do want the expected value.

I have seen instances where load/release was estimated by multiplying the geometric mean daily release by the number of days. The justification given was "the data are skewed so we computed the geometric mean". I'll leave them nameless because the geometric mean computation is totally wrong for this goal (although favorable to the polluter, because the answer is a smaller value than the arithmetic average times the number of days).

Best,
Philip Dixon
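As an illustration of the two points in the message above, here is a minimal sketch in Python, assuming the data really are log normal. The parameter values (log-scale mean 0, log-scale sd 2), the 365-day period, the simulated "daily releases", and the function name lognormal_summary are all hypothetical choices made for the example, not values from any real data set.

```python
import math
import random
import statistics


def lognormal_summary(mu, sigma):
    """Return (median, expected value, fraction of the population below the
    expected value) for a log-normal with log-scale mean mu and sd sigma."""
    median = math.exp(mu)                  # what the geometric mean estimates
    mean = math.exp(mu + sigma ** 2 / 2)   # expected value
    # P(X < E[X]) = P(log X < mu + sigma^2 / 2) = Phi(sigma / 2)
    frac_below_mean = statistics.NormalDist().cdf(sigma / 2)
    return median, mean, frac_below_mean


# Hypothetical daily releases (arbitrary units), simulated rather than observed.
random.seed(1)
n_days = 365
daily = [random.lognormvariate(0.0, 2.0) for _ in range(n_days)]

true_total = sum(daily)                                # the quantity of interest
am_total = statistics.fmean(daily) * n_days            # arithmetic mean x days
gm_total = statistics.geometric_mean(daily) * n_days   # geometric mean x days

print(lognormal_summary(0.0, 2.0))
print(true_total, am_total, gm_total)
```

The arithmetic mean times the number of days reproduces the observed total exactly, while the geometric mean times the number of days is always smaller for positive data that are not all equal. Under the log-normal assumption, the fraction of the population below the expected value is Phi(sigma / 2), which is about 84% at a log-scale sd of 2 and reaches 90% and 99% at log-scale sds of roughly 2.56 and 4.65.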
R-sig-ecology Digest, Vol 190, Issue 8
1 message · Dixon, Philip M [STAT]