grubbs.test

4 messages · Dave Evens, vito muggeo, Bert Gunter +1 more

#
Dear All,

I have small samples of data (between 6 and 15 observations)
for numerous time-series points, and I am assuming that the
data for each time point are normally distributed. The
problem is that the data arrive sporadically. Once I have
six data points for a time period, I would like to test
whether any of them are outliers. If so, I remove the
outliers, wait until I again have at least 6 data points
(or until the sample size otherwise increases), and test
again. This process is repeated until there are no more
data points to add to the sample.

Is it valid to use grubbs.test in this way?

If not, are there any tests out there that might be
appropriate for this situation? Rosner's test requires
at least 25 data points, which I don't have.

Thank you in advance for any help.

Dave
#
Dear Dave,
I am not familiar with grubbs.test (is it a function? where can I find it?),
and in any case n = 6 data points is really very few.

Having said that, what do you mean by "outlier"?
If you mean a deviation from the mean estimated from previous data, you
might have a look at the strucchange package (sorry, I do not remember
the exact name of the function right now).

best,
vito
#
The Grubbs test is one of many old (1950s to '70s), classical tests for
outliers in linear regression. Here's a link:
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm
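For readers who, like Vito, have not met the test: as the NIST page describes it, Grubbs' test assumes a univariate sample from an approximately normal population and asks whether its single most extreme value is an outlier. The statistic is G = max|x_i - mean| / s, with s the sample standard deviation, and the no-outlier hypothesis is rejected when G exceeds ((N - 1)/sqrt(N)) * sqrt(t^2 / (N - 2 + t^2)), where t is the upper critical value of Student's t with N - 2 degrees of freedom at significance alpha/(2N). The Python sketch below only illustrates those two formulas; it is not the code behind grubbs.test in the outliers package. To stay dependency-free it takes the t quantile as an argument (from a table, or from scipy.stats.t.ppf if SciPy is installed), and the function names and sample values are hypothetical.

```python
import math
import statistics


def grubbs_statistic(sample):
    """Return (G, index) for the two-sided Grubbs statistic.

    G = max |x_i - mean| / s, where s is the sample standard deviation
    (n - 1 denominator); index points at the most extreme observation.
    """
    if len(sample) < 3:
        raise ValueError("Grubbs' test needs at least 3 observations")
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, n - 1 denominator
    if s == 0:
        raise ValueError("all observations are identical")
    deviations = [abs(x - mean) for x in sample]
    index = max(range(len(sample)), key=deviations.__getitem__)
    return deviations[index] / s, index


def grubbs_critical(n, t_crit):
    """Two-sided critical value, following the formula on the NIST page.

    t_crit is assumed to be the upper critical value of Student's t with
    n - 2 degrees of freedom at significance alpha / (2 * n), supplied by
    the caller so that this sketch needs no third-party packages.
    """
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit**2 / (n - 2 + t_crit**2))


if __name__ == "__main__":
    sample = [9.8, 10.1, 10.0, 9.9, 10.2, 12.5]  # hypothetical data
    g, i = grubbs_statistic(sample)
    print(f"G = {g:.4f} at index {i}")  # prints: G = 2.0219 at index 5
```

Note that G can never exceed (N - 1)/sqrt(N), which is only about 2.04 when N = 6; that ceiling is one concrete way to see why six points leave the test very little room to work with.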

I think it is fair to say that such outlier detection methods were long ago
found to be deficient, with poor statistical properties, and were
supplanted by robust/resistant techniques (computationally much more
demanding, but who cares these days!?), at least in the more straightforward
linear-model contexts. rlm() in MASS (the package) is one good
implementation of these ideas in R. See MASS (the book by Venables and
Ripley) for a short but informative discussion and further references.
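The robust alternative downweights suspicious observations instead of testing and deleting them. As a rough, location-only illustration of that idea (not rlm() itself, which fits regression coefficients), the Python sketch below computes a Huber M-estimate of the mean by iteratively reweighted averaging. It assumes Huber's weight function with tuning constant k = 1.345, the default for psi.huber in MASS, and a MAD scale estimate; unlike rlm(), which updates its scale estimate as it iterates, this sketch keeps the scale fixed at its starting value. The function names and data are hypothetical.

```python
import statistics


def mad_scale(xs):
    """Median absolute deviation, rescaled by 1.4826 so that it estimates
    the standard deviation consistently under normality."""
    med = statistics.median(xs)
    return 1.4826 * statistics.median([abs(x - med) for x in xs])


def huber_location(xs, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted averaging.

    Simplification (assumption): the scale is fixed at the MAD around the
    initial median rather than re-estimated on every iteration.
    Returns (estimate, weights); a weight below 1 marks an observation
    that was downweighted rather than deleted.
    """
    mu = statistics.median(xs)
    s = mad_scale(xs)
    if s == 0:
        # MAD is zero (e.g. most values identical); fall back to the median.
        return mu, [1.0] * len(xs)
    weights = [1.0] * len(xs)
    for _ in range(max_iter):
        # Huber weights: 1 within +/- k*s of mu, k*s/|x - mu| outside it.
        weights = [1.0 if abs(x - mu) <= k * s else k * s / abs(x - mu) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol * s:
            return new_mu, weights
        mu = new_mu
    return mu, weights


if __name__ == "__main__":
    sample = [9.8, 10.1, 10.0, 9.9, 10.2, 12.5]  # hypothetical data
    mu, w = huber_location(sample)
    print(round(mu, 3), [round(v, 2) for v in w])
```

On this made-up sample the estimate settles near 10.06 (the ordinary mean is about 10.42), and the value 12.5 keeps a weight of roughly 0.12 instead of being removed, while the other five keep weight 1.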

I should add that the use of robust/resistant techniques exposes (i.e., the
issues exist, but we statisticians get nervous talking publicly about them)
many fundamental issues about estimation vs. inference, statistical modeling
strategies, etc. The problem is that important estimation and inference
issues for R/R estimators remain to be worked out, if, indeed, it makes
sense to think about things this way at all. This is true, for example, of
various kinds of mixed-effects models, "statistical learning theory" ensemble
methods, etc. The problem, as always, is what the heck one means by "outlier"
in these contexts. It seems to be like pornography: "I know it when I see it."*

Contrary views cheerfully solicited!

Cheers to all,

-- Bert Gunter

*Sorry, that's a reference to a famous quote by Potter Stewart, an
American Supreme Court Justice.
http://www.michaelariens.com/ConLaw/justices/stewart.htm
#
On 2005-04-14 15:34, Dave Evens wrote:
I'm very happy that someone is interested in my new package, but I must
warn you that the Grubbs test is probably not appropriate in such a case.
Your data are dependent (time series) and possibly autocorrelated. The
outliers package is designed for testing small independent samples (for
example, results of quantitative chemical analysis), not time-series data.
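One simple way to look at the dependence mentioned here is the sample lag-1 autocorrelation of the observations taken in time order. The Python sketch below computes it together with the common ±2/sqrt(n) rule-of-thumb band for independent data. That band is a large-sample approximation, so with 6 to 15 points it is only a rough guide: at n = 6 it is about ±0.82, so only quite strong autocorrelation would stand out. The function names are hypothetical, and none of this is part of the outliers package.

```python
import math
import statistics


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation:

    r1 = sum_{t=1}^{n-1} (x_t - mean)(x_{t+1} - mean) / sum_{t=1}^{n} (x_t - mean)^2
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = statistics.mean(series)
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant series: autocorrelation undefined")
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    return num / denom


def white_noise_band(n):
    """Approximate large-sample 95% band (+/- 2/sqrt(n)) for r1 under independence."""
    return 2 / math.sqrt(n)


if __name__ == "__main__":
    series = [10.1, 10.3, 10.4, 10.2, 10.6, 10.8]  # hypothetical, in time order
    r1 = lag1_autocorrelation(series)
    band = white_noise_band(len(series))
    print(f"r1 = {r1:.3f}, rough independence band = +/-{band:.3f}")
```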

Regards,