Dear Colleagues,

I've been searching for a post or article that explains why having na.rm=FALSE or na.action=na.fail as the default is a better choice than TRUE or na.omit.

I understand the basic argument: it does not make sense to average a nonexistent value into an aggregate, removing missing values implicitly leads to accidental pairwise deletion in some cases, and sum(x) / length(x) would no longer equal mean(x) (which many would find disturbing). I'm just looking for a source to cite on this issue to support mimicking R's behavior in a database system's aggregating functions (sum, avg, var, etc.).

Cordially,
Adam Kramer
Ph.D. Candidate, Social Psychology
University of Oregon
adik at uoregon dot edu
Why na.rm=FALSE is the default
2 messages · Adam D. I. Kramer, Greg Snow
The only reference that I can think of (a bit subtle/indirect) is: http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html (look in the section on Propagation of Blanks).

But I think that it really comes down to the following 2 variations on a rule:

1. Important decisions (such as throwing away information) should be made by a human, not a computer.
2. Important decisions (such as throwing away information) should be made by a person familiar with the data and scientific question, not by a programmer separated in time and space from the real question who was unlikely to be able to anticipate every situation.
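A minimal R sketch of the behavior under discussion may make the rule concrete. The data and variable names below are made up for illustration; only base R functions (mean, sum, complete.cases, cor, na.fail) are used.

```r
# Made-up data for illustration only.
x <- c(2, 4, NA, 6)

# Default (na.rm = FALSE): the NA propagates, so the analyst has to decide.
default_mean <- mean(x)                     # NA

# Explicit, deliberate removal of the missing value.
explicit_mean <- mean(x, na.rm = TRUE)      # (2 + 4 + 6) / 3 = 4

# If removal were silent, sum / n would disagree with the mean,
# because length() still counts the missing element.
naive_mean <- sum(x, na.rm = TRUE) / length(x)   # 12 / 4 = 3

# Pairwise deletion: each pair of columns keeps a different set of rows.
d <- data.frame(a = c(1, 2, 3, 4, NA),
                b = c(2, 1, NA, 4, 5),
                c = c(5, 3, 4, NA, 1))
rows_ab  <- which(complete.cases(d$a, d$b))  # rows 1, 2, 4
rows_ac  <- which(complete.cases(d$a, d$c))  # rows 1, 2, 3
rows_all <- which(complete.cases(d))         # rows 1, 2
pairwise <- cor(d, use = "pairwise.complete.obs")

# na.fail() refuses to proceed instead of dropping rows silently.
failed <- inherits(try(na.fail(d), silent = TRUE), "try-error")  # TRUE
```

Each entry of the pairwise correlation matrix here is computed from a different subset of rows, which is the kind of discarded information an analyst would probably want to opt into rather than receive by default.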
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Adam D. I. Kramer
> Sent: Tuesday, March 24, 2009 12:24 PM
> To: r-help at r-project.org
> Subject: [R] Why na.rm=FALSE is the default