implementing Grubbs outlier test on a large dataframe
3 messages · John Malone, David Winsemius, Frank E Harrell Jr
Sending each row of a data frame, dfm, as a vector to a function, fcn, is as simple as:

    apply(dfm, 1, fcn)

e.g.:

    > dfm <- data.frame(x=rnorm(10), y=rnorm(10), z=rnorm(10))
    > apply(dfm, 1, sum)
     [1]  0.7385838 -3.1819193  0.3415670 -0.6552601 -1.3470174 -0.6446259 -0.6544967
     [8]  0.1778169 -0.3330527  0.6246071

And with the second argument set to 2, you would get a columnwise application of the function.

You need to show us what your function looks like to go any further. I am unclear how one could get a function that only operates on a single row to yield an outlier classification.
David Winsemius

On Feb 14, 2009, at 6:01 PM, John Malone wrote:

> Hi!
>
> I'm trying to implement an outlier test once/row in a large dataframe.
> Ideally, I'd do this then add the Pvalue results and the number flagged as
> an outlier as two new separate columns to the dataframe. Grubbs outlier
> test requires a vector and I'm confused how to make each row of my dataframe
> a vector, followed by doing a Grubbs test for each row containing the vector
> of numbers I want to perform the outlier test on.
>
> I'm new to R and no doubt this is a simple problem. Any help you might
> provide would be greatly appreciated.
>
> Many thanks in advance!!
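To make the row versus column distinction in the reply above concrete, here is a small sketch on the same kind of data frame. The seed is an assumption added only for reproducibility. Note that apply() converts the data frame to a matrix first, so it is safest when all columns passed to it are numeric.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
dfm <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))

# MARGIN = 1: the function receives each row as a numeric vector
row_sums <- apply(dfm, 1, sum)

# MARGIN = 2: the function receives each column as a numeric vector
col_sums <- apply(dfm, 2, sum)

length(row_sums)  # 10, one value per row
length(col_sums)  # 3, one value per column
```

For plain sums, rowSums() and colSums() give the same numbers; apply() is the general form that accepts any function of a vector.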
John Malone wrote:
Hi!

I'm trying to implement an outlier test once/row in a large dataframe. Ideally, I'd do this then add the Pvalue results and the number flagged as an outlier as two new separate columns to the dataframe. Grubbs outlier test requires a vector and I'm confused how to make each row of my dataframe a vector, followed by doing a Grubbs test for each row containing the vector of numbers I want to perform the outlier test on.

I'm new to R and no doubt this is a simple problem. Any help you might provide would be greatly appreciated.

Many thanks in advance!!
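One possible way to do what is asked here, as a minimal sketch under several assumptions. The helper name grubbs_row, the example data, and the 0.05 cutoff are all hypothetical. The p-value is computed in base R from the usual t-distribution relation for the two-sided Grubbs statistic, with a Bonferroni-style bound, so it is an approximation; grubbs.test() from the outliers package could be substituted if that package is installed. "The number flagged as an outlier" is interpreted here as the most extreme value in the row, reported only when the test rejects.

```r
# Two-sided Grubbs test for a single outlier in one numeric vector.
# Returns the approximate p-value and the most extreme value (the candidate).
grubbs_row <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n < 3 || sd(x) == 0) return(c(p_value = NA_real_, candidate = NA_real_))
  dev <- abs(x - mean(x))
  i <- which.max(dev)
  G <- dev[[i]] / sd(x)
  # Convert G to a t statistic on n - 2 degrees of freedom
  denom <- max((n - 1)^2 - n * G^2, 0)
  t_stat <- sqrt(n * (n - 2) * G^2 / denom)
  # Bonferroni-style upper bound on the two-sided p-value, capped at 1
  p <- min(1, 2 * n * pt(t_stat, df = n - 2, lower.tail = FALSE))
  c(p_value = p, candidate = x[[i]])
}

set.seed(42)  # assumed seed for reproducibility
dfm <- as.data.frame(matrix(rnorm(5 * 8), nrow = 5))  # 5 rows, 8 values each
dfm[2, 3] <- 15  # plant one gross outlier in row 2

num_cols <- names(dfm)  # columns to test (hypothetical choice: all of them)
# apply() returns one column per row here, so transpose to get one row per row
res <- t(apply(dfm[num_cols], 1, grubbs_row))

alpha <- 0.05  # assumed cutoff
dfm$grubbs_p <- res[, "p_value"]
dfm$outlier  <- ifelse(res[, "p_value"] < alpha, res[, "candidate"], NA)
```

Selecting the numeric columns before calling apply() keeps the new result columns, and any non-numeric columns, out of the test.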
John - you would be making a strong normality assumption. You might
reject H0 using Grubbs' test just because of non-normality, or you might
fail to reject it just because of non-normality. Is it really this
straightforward to declare something an outlier? What does outlier really
mean?
The following is must reading.
@Article{fin06cal,
  author =  {Finney, David J.},
  title =   {Calibration guidelines challenge outlier practices},
  journal = {The American Statistician},
  year =    2006,
  volume =  60,
  pages =   {309-313},
  annote =  {anticoagulant therapy; bias; causation; ethics; objectivity; outliers; guidelines for treatment of outliers; overview of types of outliers; letter to the editor and reply 61:187 May 2007}
}
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
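
The normality concern raised in the last reply can be illustrated with a rough simulation. This is only a sketch: the helper grubbs_p, the seed, the sample sizes, and the choice of a lognormal distribution as the non-normal case are all assumptions made for illustration. It shows only one half of the point (rejecting because of non-normality), using data with no contamination at all.

```r
# Approximate two-sided Grubbs p-value via the t-distribution relation
grubbs_p <- function(x) {
  n <- length(x)
  G <- max(abs(x - mean(x))) / sd(x)
  denom <- max((n - 1)^2 - n * G^2, 0)
  t_stat <- sqrt(n * (n - 2) * G^2 / denom)
  min(1, 2 * n * pt(t_stat, df = n - 2, lower.tail = FALSE))
}

set.seed(123)   # assumed seed
n_sim <- 2000   # assumed number of simulated rows
n_obs <- 20     # assumed observations per row
alpha <- 0.05   # assumed cutoff

# Fraction of simulated rows where the test "finds" an outlier at level alpha
reject_rate <- function(rgen) {
  mean(replicate(n_sim, grubbs_p(rgen(n_obs)) < alpha))
}

rate_normal    <- reject_rate(rnorm)   # data really are normal
rate_lognormal <- reject_rate(rlnorm)  # skewed data, nothing planted

c(normal = rate_normal, lognormal = rate_lognormal)
```

With normal data the rejection rate should sit near alpha, while the skewed data are flagged far more often even though every value comes from the same distribution.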