Dear All, I have small samples of data (between 6 and 15 values) for numerous time series points. I am assuming the data for each time point are normally distributed. The problem is that the data arrive sporadically, and I would like to detect the number of outliers once I have six data points for any time period. Essentially, when I have 6 data points I would like to test whether there are any outliers. If so, I remove the outliers, wait until I again have at least 6 data points (or until the sample size increases), and test again whether there are any outliers. This process is repeated until there are no more data points to add to the sample. Is it valid to use grubbs.test in this way? If not, are there any tests out there that might be appropriate for this situation? Rosner's test requires at least 25 data points, which I don't have. Thank you in advance for any help. Dave
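A minimal sketch of the procedure described above, written in Python rather than R and assuming i.i.d. normal data within each time period. The helper names (`grubbs_statistic`, `grubbs_pvalue`, `sequential_screen`) are hypothetical and are not the interface of any R package. Instead of the usual t-based Grubbs critical value, the sketch approximates the p-value by simulating the statistic from standard normal samples, which works because G does not depend on the mean or scale of the data.

```python
import math
import random


def grubbs_statistic(sample):
    """Return (G, index) with G = max|x_i - mean| / s, s being the n-1 standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    idx = max(range(n), key=lambda i: abs(sample[i] - mean))
    if sd == 0.0:
        return 0.0, idx  # all values identical: nothing stands out
    return abs(sample[idx] - mean) / sd, idx


def grubbs_pvalue(sample, n_sim=5000, seed=1):
    """Two-sided Monte Carlo p-value for the most extreme point.

    Assumption: under the null hypothesis the sample is i.i.d. normal, so G can be
    simulated from N(0, 1) draws of the same size.
    """
    g_obs, _ = grubbs_statistic(sample)
    rng = random.Random(seed)
    n = len(sample)
    hits = sum(
        grubbs_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])[0] >= g_obs
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)


def sequential_screen(stream, min_n=6, alpha=0.05):
    """Hypothetical helper: once min_n points are retained, test after every arrival
    and drop the single most extreme point whenever the test rejects."""
    kept, removed = [], []
    for x in stream:
        kept.append(x)
        if len(kept) < min_n:
            continue  # wait until the sample is large enough
        _, idx = grubbs_statistic(kept)
        if grubbs_pvalue(kept) < alpha:
            removed.append(kept.pop(idx))
    return kept, removed


kept, removed = sequential_screen([10.1, 9.8, 10.0, 10.2, 9.9, 25.0, 10.1, 9.7])
print("kept:", kept)
print("removed:", removed)
```

In this made-up example only the 25.0 reading should be removed. Note that because the test is re-run after every arrival, the chance of flagging at least one point in perfectly clean normal data over the whole stream is larger than `alpha`; the sketch makes no correction for that.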
grubbs.test
4 messages · Dave Evens, vito muggeo, Bert Gunter +1 more
Dear Dave, I do not know grubbs.test (is it a function? where can I find it?), and n = 6 data points are probably very few. Having said that, what do you mean by "outlier"? If you mean a deviation from the estimated mean (of previous data), you might have a look at the strucchange package (sorry, but right now I do not remember the exact name of the function). best, vito
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
==================================== Vito M.R. Muggeo Dip.to Sc Statist e Matem `Vianelli' Università di Palermo viale delle Scienze, edificio 13 90121 Palermo - ITALY tel: 091 6626240 fax: 091 485726/485612
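To make the remark about n = 6 concrete: for any sample of size n, the Grubbs statistic G = max|x_i - mean| / s (s being the usual n - 1 standard deviation) can never exceed (n - 1) / sqrt(n), a known algebraic bound (Samuelson's inequality). The short Python sketch below, whose function names are illustrative only, tabulates that ceiling and shows that one absurdly large value among five identical ones merely reaches it.

```python
import math


def max_grubbs(n):
    """Largest value G = max|x_i - mean| / s can take in a sample of size n."""
    return (n - 1) / math.sqrt(n)


def grubbs_g(sample):
    """Grubbs statistic using the n - 1 standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return max(abs(x - mean) for x in sample) / sd


for n in (3, 6, 10, 15, 25):
    print(n, round(max_grubbs(n), 3))

# Five identical values plus one gross error: G attains the bound exactly, no more.
print(round(grubbs_g([0.0, 0.0, 0.0, 0.0, 0.0, 1e6]), 3), round(max_grubbs(6), 3))
```

So at n = 6, however extreme a single reading is, G never exceeds about 2.04, which limits how sharply any G-based test can separate a gross error from ordinary scatter.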
The Grubbs test is one of many old (1950s to '70s) and classical tests for outliers in linear regression. Here's a link: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm

I think it fair to say that such outlier detection methods were long ago found to be deficient and to have poor statistical properties, and were supplanted by (computationally much more demanding -- but who cares these days!?) robust/resistant techniques, at least in the more straightforward linear model contexts. rlm() in MASS (the package) is one good implementation of these ideas in R. See MASS (the book by Venables and Ripley) for a short but informative discussion and further references.

I should add that the use of robust/resistant techniques exposes (i.e., the issues exist, but we statisticians get nervous talking publicly about them) many fundamental issues about estimation vs. inference, statistical modeling strategies, etc. The problem is that important estimation and inference issues for R/R estimators remain to be worked out -- if, indeed, it makes sense to think about things this way at all. This is the case, for example, for various kinds of mixed effects models, "statistical learning theory" ensemble methods, etc.

The problem, as always, is what the heck one means by "outlier" in these contexts. It seems to me like pornography -- "I know it when I see it."*

Contrary views cheerfully solicited!

Cheers to all,
-- Bert Gunter

*Sorry -- that's a reference to a famous quote of Justice Potter Stewart, an American Supreme Court Justice. http://www.michaelariens.com/ConLaw/justices/stewart.htm
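The robust/resistant approach mentioned above down-weights discrepant values instead of testing and deleting them. rlm() in MASS fits regression models by M-estimation (Huber's psi by default); as a rough illustration of the same idea in the simplest, location-only case, here is a Python sketch of a Huber M-estimate of a single mean computed by iteratively reweighted averaging. It is not a port of rlm(): the scale is held fixed at the normalised MAD rather than updated while iterating, k = 1.345 is simply the conventional tuning constant, and the function name is hypothetical.

```python
import statistics


def huber_location(x, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging.

    Assumption: scale is fixed at 1.4826 * MAD, which is consistent for the
    standard deviation under normality.
    """
    med = statistics.median(x)
    scale = 1.4826 * statistics.median(abs(v - med) for v in x)
    if scale == 0:
        return med  # at least half the values coincide; fall back to the median
    mu = med
    for _ in range(max_iter):
        # Huber weights: 1 inside the cutoff k, k/|r| for larger standardized residuals.
        w = []
        for v in x:
            r = abs(v - mu) / scale
            w.append(1.0 if r <= k else k / r)
        new_mu = sum(wi * vi for wi, vi in zip(w, x)) / sum(w)
        if abs(new_mu - mu) < tol * scale:
            return new_mu
        mu = new_mu
    return mu


data = [10.1, 9.8, 10.0, 10.2, 9.9, 25.0]
print(statistics.fmean(data), huber_location(data))
```

On these made-up values the ordinary mean is pulled up to about 12.5 by the single 25.0, while the Huber estimate stays close to 10, because the 25.0 receives a weight of roughly 0.02.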
-----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of vito muggeo Sent: Thursday, April 14, 2005 7:05 AM To: Dave Evens Cc: r-help at stat.math.ethz.ch Subject: Re: [R] grubbs.test
On 2005-04-14 15:34, Dave Evens wrote:
Is it valid to use the grubbs.test in this way?
I'm very happy that someone is interested in my new package, but I must warn you that the Grubbs test is probably not appropriate in this case. Your data are dependent (time series) and possibly autocorrelated. The outliers package is designed for testing small independent samples (for example, results of quantitative chemical analysis), not time series data. Regards,
Lukasz Komsta Department of Medicinal Chemistry Medical University of Lublin 6 Chodzki, 20-093 Lublin, Poland Fax +48 81 7425165
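The objection above is that consecutive values of a time series may be correlated, whereas the Grubbs test assumes independent observations. One quick, informal check is the lag-1 sample autocorrelation of the values in arrival order. A minimal Python sketch follows; the function names are illustrative, and the +/- 2/sqrt(n) band is only a rough large-sample guide that is crude at n = 6 to 15.

```python
import math


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation: sum (x_t - m)(x_{t+1} - m) / sum (x_t - m)^2."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((series[t] - m) * (series[t + 1] - m) for t in range(n - 1))
    return num / denom


def looks_autocorrelated(series, z=2.0):
    """Rough screen: flag |r1| outside +/- z / sqrt(n), the approximate band for independent data."""
    return abs(lag1_autocorrelation(series)) > z / math.sqrt(len(series))


alternating = [1.0, -1.0] * 4                        # strongly negatively correlated
trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]     # a steady drift
print(lag1_autocorrelation(alternating), looks_autocorrelated(alternating))
print(lag1_autocorrelation(trend), looks_autocorrelated(trend))
```

The steady trend is not flagged at n = 8 (r1 = 0.625 against a band of about 0.71), which shows how little such a check can detect in samples this small.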