
Why na.rm=FALSE is the default

2 messages · Adam D. I. Kramer, Greg Snow

#
Dear Colleagues,

 	I've been searching for a post or article or something which
explains why having na.rm=FALSE or na.action=na.fail as the default is a
better choice than TRUE or na.omit.

 	I understand the basic argument: it does not make sense to average a
nonexistence into an aggregate, removing missing values implicitly leads to
accidental pairwise deletion in some cases, and sum(x) / length(x) would no
longer equal mean(x) (which many would find disturbing)... I'm just looking
for a source to cite
on this issue to support mimicking R's behavior in a database system's
aggregating functions (sum, avg, var, etc.).
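
A minimal sketch of the argument above, in R. The vectors x and y are made-up data, not taken from any real dataset. It only illustrates how the default propagates NA, how sum(x) / length(x) and mean(x) drift apart once NAs are dropped, and how two variables can silently lose different rows.

```r
# Hypothetical data: one of four observations is missing.
x <- c(2, 4, NA, 6)

# Default behaviour (na.rm = FALSE): the NA propagates, so the gap is visible.
default_mean <- mean(x)                  # NA
default_sum  <- sum(x)                   # NA

# Explicitly opting in to removal: the NA is dropped before aggregating.
trimmed_mean <- mean(x, na.rm = TRUE)    # (2 + 4 + 6) / 3 = 4
trimmed_sum  <- sum(x, na.rm = TRUE)     # 2 + 4 + 6 = 12

# length() still counts the NA, so the familiar identity breaks.
naive_mean <- trimmed_sum / length(x)    # 12 / 4 = 3, not 4

# Pairwise deletion: a second (hypothetical) variable is missing a different row.
y <- c(1, NA, 3, 5)
n_x    <- sum(!is.na(x))                 # 3 usable values of x
n_y    <- sum(!is.na(y))                 # 3 usable values of y
n_both <- sum(complete.cases(x, y))      # only 2 rows have both
```

If removal were the default, every one of these differences would pass without a warning; with na.rm=FALSE the first two lines return NA and force the analyst to decide.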

Cordially,
Adam Kramer
Ph.D. Candidate, Social Psychology
University of Oregon
adik at uoregon dot edu
#
The only reference that I can think of (a bit subtle/indirect) is: http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html (look in the section on Propagation of Blanks).

But I think that it really comes down to the following 2 variations on a rule:

1. Important decisions (such as throwing away information) should be made by a human, not a computer.
2. Important decisions (such as throwing away information) should be made by a person familiar with the data and the scientific question, not by a programmer separated in time and space from the real question, who is unlikely to have anticipated every situation.