Suggestions ?!?!

ivo welch <ivo.welch at yale.edu> writes:
The form of the output from summary depends on the mode or class of
the column.  A numeric column is summarized by a 'five-number' summary
(min, first quartile, median, third quartile, maximum) and the mean.
If there are NA's in the column, the number of NA's is reported.  It
is sometimes reported to several decimal places because all the values
in that part of the summary are printed in the same format.  If the
mean requires four decimal places to get the desired number of
significant digits, then the number of NA's will also be given to four
decimal places.
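
As a small sketch of the numeric case (the vector x and its values are made up for illustration), summary returns the five numbers, the mean, and an NA count when any values are missing.  Exactly how the NA count is printed may differ between versions of R, so no output is shown here.

```r
# Hypothetical data: a numeric column with two missing values
x <- c(2.5, 3.1, NA, 4.8, 7.2, NA, 5.0)
d <- data.frame(x = x)

s <- summary(d$x)   # five-number summary, the mean, and the NA count
print(names(s))     # labels of the entries in the summary
print(s)            # all entries printed together
```

The NA count appears as the last named entry, "NA's", alongside the other statistics.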

A column that is a factor or an ordered factor will be summarized by a
(possibly truncated) frequency table.  Means, medians, etc. are not
meaningful for factors.
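
For the factor case, a minimal sketch (the factor f is hypothetical) shows the frequency table and how the maxsum argument of summary truncates it by pooling the less frequent levels under "(Other)".

```r
# Hypothetical factor column with three levels
f <- factor(c("a", "b", "a", "c", "a", "b"))

counts <- summary(f)               # named vector of counts, one per level
print(counts)

# maxsum caps the number of entries; the remaining levels are pooled
truncated <- summary(f, maxsum = 2)
print(truncated)
```
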
Three of the "secrets of the S masters" are:
  - indexing is particularly flexible and powerful in S
  - the "%in%" function is versatile and often overlooked
  - you can add a column to a data frame by assigning to that name
so three of these operations can be written as

 d[ -45, ]                     # delrow( dataframe d, index=45)
 d[ , !(names(d) %in% "name")] # delcol( dataframe d, "name")
 d[ , -col]                    # alternative form if you know the column number
 d$newcol = v                  # inscol( dataframe d, (col)vector v)
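
To try these on something concrete, here is a small sketch on a made-up data frame (the column names id, name and score are assumptions for illustration only).  One caveat: when the selection leaves a single column, "[" returns a plain vector unless drop = FALSE is given.

```r
# Hypothetical data frame for illustration
d <- data.frame(id = 1:5,
                name = letters[1:5],
                score = c(10, 20, 30, 40, 50))

d1 <- d[-3, ]                        # delete row 3
d2 <- d[, !(names(d) %in% "name")]   # delete the column called "name"
d3 <- d[, -2]                        # same deletion, by column number

# Deleting several columns at once; drop = FALSE keeps a data frame
# even though only one column is left
d4 <- d[, !(names(d) %in% c("name", "score")), drop = FALSE]

d$newcol <- d$score / 10             # add a column by assigning to a new name
```

Because %in% accepts a vector on the right-hand side, the same idiom removes one column or many.
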
P.S. How many other people think that the next edition of MASS should
be renamed "Secrets of The S Masters"?   :-)