suggestion "suggestion" and dataframe operations

4 messages · ivo welch, Douglas Bates, Jason Turner

#
hi chaps:

a simple suggestion:  R tells me who the contributors() are, but this 
should also tell me where I should mail suggestions to.  Is it this 
mailing list?  a repository of suggestions?  an individual?

this came up because i wanted to suggest two small enhancements:

the first is for the summary() method for plain data frames.  it would 
seem to me that the number of "NA"s should be printed as an integer, not 
necessarily in scientific notation.  I have also yet to determine when 
summary() likes to give means and when it does not.  (maybe it was an 
older version that sometimes did not give means).  summary does not seem 
to have optional parameters to specify what statistics I would like. 
this could be useful, too.

another small enhancement:  there are four elementary data frame 
operations that bedevil novices, so they really should have named 
function wrappers:

	delrow( dataframe d, index=45);
	insrow( dataframe d, (row)vector v);
	delcol( dataframe d, "name");
	inscol( dataframe d, (col)vector v);

I looked at my R "bible" (venables&ripley), too, but here too it is not 
as clear as it needs to be.  yes, this may be programmable, but it ain't 
as obvious as it should be for beginners.

regards,

/iaw
11 days later
#
hi chaps:

* I have some suggestions, the first of which is about suggestions:  R 
tells me who the contributors() are, but it should also tell me where 
I should email suggestions.  Is it this mailing list?  a repository 
of suggestions?  an individual?

this came up because i wanted to suggest enhancements:


* the first is for the summary() method for plain data frames.  it would 
seem to me that the number of "NA" observations should be printed as an 
integer, not necessarily in scientific notation.  I have also yet to 
determine when summary() likes to give means and when it does not. 
(maybe it was an older version that sometimes did not give means). 
summary does not seem to have optional parameters to specify what 
statistics I would like. this could be useful, too.


* another small enhancement:  there are four elementary data frame 
operations that bedevil novices, so they really should have named 
function wrappers:

     delrow( dataframe d, index=45);
     insrow( dataframe d, (row)vector v);
     delcol( dataframe d, "name");
     inscol( dataframe d, (col)vector v);

Even a simple alias would do (maybe named row.delete, column.delete).  I 
looked at my R "bible" (venables&ripley), too, but here too it is not as 
clear as it needs to be.  yes, these operations are programmable, but it 
ain't as obvious as it should be for beginners.  these are elementary.


* Finally, a more complex question: I have a historical rate of stock 
return series (yes, I teach finance).  I would like to make a ts plot on 
the left (plot(date,returns,type="h")), and a plot(density(returns)) on 
the right.  works nicely with par(mfrow=c(1,2)), but it would be even 
nicer if I could rotate the density plot 90 degrees, so that it is more 
apparent that the density plot is an aggregation of the points at the 
same y coordinates.  (if need be, a histogram could replace the density 
plot.)  Is it possible to rotate an entire subpanel figure?  if there 
was a "horizontal" parameter to ps.options for plot(), it would do the 
trick, but this does not work.   So, this may be a suggestion, too.

regards,

/iaw
#
ivo welch <ivo.welch at yale.edu> writes:
The form of the output from summary depends on the mode or class of
the column.  A numeric column is summarized by a 'five-number' summary
(min, first quartile, median, third quartile, maximum) and the mean.
If there are NA's in the column the number of NA's is reported.  The
reason that it is sometimes reported to several decimal places is that
all the values in that part of the summary are printed in the same
format.  If the mean requires four decimal places to get the desired
number of significant digits then the number of NA's will also be
given to four decimal places.

A column that is a factor or an ordered factor will be summarized by a
(possibly truncated) frequency table.  Means, medians, etc. are not
meaningful for factors.
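
A small illustration of the two cases described above may help.  This is
only a sketch: the data frame `d` and its columns `ret` and `grade` are
made up for the example, and the exact printed layout of summary() can
differ between R versions (newer versions may format the NA count
separately).  The last line shows the shared-format effect on a plain
numeric vector.

```r
# Hypothetical data: one numeric column with an NA, one factor column
d <- data.frame(
  ret   = c(0.0123, -0.0045, NA, 0.0310, 0.0002),
  grade = factor(c("a", "b", "a", "c", "a"))
)

# Numeric column: five-number summary, the mean, and the NA count
s_num <- summary(d$ret)
print(names(s_num))

# Factor column: a frequency table, no mean or median
s_fac <- summary(d$grade)
print(s_fac)

# Values formatted together share one format, so a whole number
# picks up the decimal places needed by its neighbour
fmt <- format(c(0.1234, 3))
print(fmt)
```

Calling summary(d) on the whole data frame applies the same logic column
by column.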
Three of the "secrets of the S masters" are:
  - indexing is particularly flexible and powerful in S
  - the "%in%" function is versatile and often overlooked
  - you can add a column to a data frame by assigning to that name
so three of these operations can be written as

 d[ -45, ]                     # delrow( dataframe d, index=45)
 d[ , !(names(d) %in% "name")] # delcol( dataframe d, "name")
 d[ , -col]                    # alternative form if you know the column number
 d$newcol = v                  # inscol( dataframe d, (col)vector v)
P.S. How many other people think that the next edition of MASS should
be renamed "Secrets of The S Masters"?   :-)
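
For readers who do want the named wrappers requested earlier in the
thread, the indexing idioms above can be wrapped in a few lines.  This
is a minimal sketch, not part of base R: the function names come from
the original suggestion, the `name` argument of inscol() is an added
assumption, and insrow() uses rbind(), which was not covered in the
reply above.

```r
# Delete rows by position; an empty index returns the data frame unchanged
# (d[-integer(0), ] would otherwise drop every row)
delrow <- function(d, index) {
  if (length(index) == 0) return(d)
  d[-index, , drop = FALSE]
}

# Delete columns by name; drop = FALSE keeps the result a data frame
delcol <- function(d, name) {
  d[, !(names(d) %in% name), drop = FALSE]
}

# Add (or overwrite) a column called `name` with the values in v
inscol <- function(d, v, name) {
  d[[name]] <- v
  d
}

# Append a row; `row` should be a one-row data frame with the same columns
insrow <- function(d, row) {
  rbind(d, row)
}

# Example on a made-up data frame
d  <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
d1 <- delrow(d, 2)
d2 <- delcol(d, "y")
d3 <- inscol(d, c(10, 20, 30), "z")
d4 <- insrow(d, data.frame(x = 4L, y = "d", stringsAsFactors = FALSE))
```

Each wrapper returns a modified copy, so the result has to be assigned
back (d <- delrow(d, 45)) for the change to stick.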
#
"ivo welch" <ivo.welch at yale.edu> said...
There might be a more natural way to do this using grid graphics, but I'm
still not familiar with grid.  This type of plot is one I do enough of
that I rolled my own the old-fashioned way.

Try the following, after defining the function given below:

zz <- ts(rnorm(100))
DenTSplot(zz)

## ts and density
DenTSplot <- function(x, ylim=NULL,main=NULL,...) {
	# data sanity check
	if(!is.ts(x))
		x <- ts(x)
	if(!is.null(dim(x))) {
		stop("can only handle univariate time series\n")
	}

	# save the graphics parameters before changing anything, so that
	# the original layout and margins are restored on exit
	opar <- par(no.readonly=TRUE)
	on.exit(par(opar))

	# set layout - FIXME - should this be user-settable?
	layout(matrix(c(1,1,1,2),nrow=1))

	# find x density.  FIXME - need to take arguments about
	# bandwidth selector, etc.
	x.d <- density(x)

	if(is.null(ylim)) {
		ylim <- range(x.d$x)
	}
	if(is.null(main))
		main <- "Series"

	mai <- par("mai")
	mai.ts <- c(mai[1:3],0)
	par(mai=mai.ts)
	plot(x,ylim=ylim,main=main,...)

	mai.den <- c(mai[1],0,mai[3:4])
	par(mai=mai.den)
	plot(x.d$y, x.d$x,
		ylim=ylim, type="l", yaxt="n",
		ylab="",xlab="",main="Density")
}
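
The original question also mentioned that a histogram could stand in
for the density.  A rough sketch of that variant follows, built the same
way as the function above: the bins are computed with hist(plot=FALSE)
and drawn sideways with rect(), so both panels share the same y axis.
The function name rotated_hist_panel and its arguments are made up for
this example.

```r
## returns as spikes on the left, sideways histogram on the right
rotated_hist_panel <- function(returns, breaks = "Sturges", main = "Returns") {
  returns <- as.numeric(returns)

  # compute the bins without drawing anything
  h <- hist(returns, breaks = breaks, plot = FALSE)
  ylim <- range(h$breaks)

  # save graphics settings first, restore them on exit
  opar <- par(no.readonly = TRUE)
  on.exit(par(opar))
  layout(matrix(c(1, 1, 1, 2), nrow = 1))

  # left panel: the series, with no right-hand margin
  mai <- par("mai")
  par(mai = c(mai[1:3], 0))
  plot(seq_along(returns), returns, type = "h", ylim = ylim,
       xlab = "Observation", ylab = "Return", main = main)

  # right panel: empty plot on the same y scale, no left-hand margin
  par(mai = c(mai[1], 0, mai[3:4]))
  plot(c(0, max(h$counts)), ylim, type = "n", yaxt = "n",
       xlab = "Count", ylab = "", main = "Histogram")

  # one horizontal bar per bin, from count 0 out to the bin count
  n <- length(h$breaks)
  rect(0, h$breaks[-n], h$counts, h$breaks[-1], col = "grey")

  invisible(h)
}
```

For instance, rotated_hist_panel(rnorm(250, sd = 0.02)) draws 250
simulated returns with their histogram alongside.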