hi chaps:
a simple suggestion: R tells me who the contributors() are, but this should also tell me where I should mail suggestions to. Is it this mailing list? a repository of suggestions? an individual?
this came up because i wanted to suggest two small enhancements:
the first is for the summary() method for plain data frames. it would seem to me that the number of "NA"s should be printed as an integer, not necessarily in scientific notation. I have also yet to determine when summary() likes to give means and when it does not. (maybe it was an older version that sometimes did not give means). summary does not seem to have optional parameters to specify what statistics I would like. this could be useful, too.
another small enhancement: there are four elementary data frame operations that bedevil novices, so they really should have named function wrappers:
delrow( dataframe d, index=45);
insrow( dataframe d, (row)vector v);
delcol( dataframe d, "name");
inscol( dataframe d, (col)vector v);
I looked at my R "bible" (venables&ripley), too, but here too it is not as clear as it needs to be. yes, this may be programmable, but it ain't as obvious as it should be for beginners.
regards,
/iaw
suggestion "suggestion" and dataframe operations
4 messages · ivo welch, Douglas Bates, Jason Turner
11 days later
hi chaps:
* I have some suggestions, the first of which is about suggestions: R
tells me who the contributors() are, but this should also tell me where
I should email suggestions to. Is it this mailing address/list? a
repository of suggestions? an individual?
this came up because i wanted to suggest enhancements:
* the first is for the summary() method for plain data frames. it would
seem to me that the number of "NA" observations should be printed as an
integer, not necessarily in scientific notation. I have also yet to
determine when summary() likes to give means and when it does not.
(maybe it was an older version that sometimes did not give means).
summary does not seem to have optional parameters to specify what
statistics I would like. this could be useful, too.
* another small enhancement: there are four elementary data frame
operations that bedevil novices, so they really should have named
function wrappers:
delrow( dataframe d, index=45);
insrow( dataframe d, (row)vector v);
delcol( dataframe d, "name");
inscol( dataframe d, (col)vector v);
Even a simple alias would do (maybe named row.delete, column.delete). I
looked at my R "bible" (venables&ripley), too, but here too it is not as
clear as it needs to be. yes, these operations are programmable, but it
ain't as obvious as it should be for beginners. these are elementary.
* Finally, a more complex question: I have a historical rate of stock
return series (yes, I teach finance). I would like to make a ts plot on
the left (plot(date,returns,type="h")), and a plot(density(returns)) on
the right. works nicely with par(mfrow=c(1,2)), but it would be even
nicer if I could rotate the density plot 90 degrees, so that it is more
apparent that the density plot is an aggregation of the points at the
same y coordinates. (if need be, a histogram could replace the density
plot.) Is it possible to rotate an entire subpanel figure. if there
was a "horizontal" parameter to ps.options for plot(), it would do the
trick, but this does not work. So, this may be a suggestion, too.
regards,
/iaw
ivo welch <ivo.welch at yale.edu> writes:
* the first is for the summary() method for plain data frames. it would seem to me that the number of "NA" observations should be printed as an integer, not necessarily in scientific notation. I have also yet to determine when summary() likes to give means and when it does not. (maybe it was an older version that sometimes did not give means). summary does not seem to have optional parameters to specify what statistics I would like. this could be useful, too.
The form of the output from summary depends on the mode or class of the column. A numeric column is summarized by a 'five-number' summary (min, first quartile, median, third quartile, maximum) and the mean. If there are NA's in the column the number of NA's is reported. The reason that it is sometimes reported to several decimal places is because all the values in that part of the summary are being printed in the same format. If the mean requires four decimal places to get the desired number of significant digits then the number of NA's will also be given to four decimal places. A column that is a factor or an ordered factor will be summarized by a (possibly truncated) frequency table. Means, medians, etc. are not meaningful for factors.
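To make the description above concrete, here is a minimal sketch using a made-up data frame (the column names `ret` and `grp` and the helper `my_stats` are hypothetical, not part of R). It shows the default summary() output for a numeric column with NA's next to a factor column, one way to count NA's as whole numbers, and one possible way to pick your own statistics, since summary() itself takes no such list.

```r
# Hypothetical toy data: one numeric column containing NAs, one factor.
d <- data.frame(
  ret = c(0.0123, -0.0045, NA, 0.0310, NA, 0.0002),
  grp = factor(c("a", "b", "a", "c", "b", "a"))
)

# Default behaviour: five-number summary plus mean (and an NA count)
# for 'ret', a frequency table for 'grp'.
print(summary(d))

# NA counts per column, computed separately so they print as whole numbers.
na_counts <- colSums(is.na(d))
print(na_counts)

# Hypothetical helper: choose the statistics you want for numeric columns.
my_stats <- function(x) {
  c(mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    n.na = sum(is.na(x)))
}
num   <- sapply(d, is.numeric)      # which columns are numeric
stats <- sapply(d[num], my_stats)   # one column of results per numeric column
print(stats)
```

The sapply() approach is only one option; the same helper could be applied column by column with lapply() if the statistics differ in length.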
* another small enhancement: there are four elementary data frame
operations that bedevil novices, so they really should have named
function wrappers:
delrow( dataframe d, index=45);
insrow( dataframe d, (row)vector v);
delcol( dataframe d, "name");
inscol( dataframe d, (col)vector v);
Three of the "secrets of the S masters" are:
- indexing is particularly flexible and powerful in S
- the "%in%" function is versatile and often overlooked
- you can add a column to a data frame by assigning to that name
so three of these operations can be written as
d[ -45, ]                        # delrow( dataframe d, index=45)
d[ , !(names(d) %in% "name")]    # delcol( dataframe d, "name")
d[ , -col]                       # alternative form if you know the column number
d$newcol = v                     # inscol( dataframe d, (col)vector v)
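For anyone who still wants the named wrappers, the idioms above can be wrapped in a few one-liners. This is only a sketch: the function names follow the original request, they are not part of base R, and the `name` argument to inscol() and the use of rbind() for the fourth operation (inserting a row, which the list above does not cover) are assumptions of this example.

```r
# Hypothetical wrappers around the indexing idioms; none of these exist in base R.
# drop = FALSE keeps the result a data frame even if one column is left.
delrow <- function(d, index) d[-index, , drop = FALSE]
delcol <- function(d, name)  d[, !(names(d) %in% name), drop = FALSE]
inscol <- function(d, v, name) {
  d[[name]] <- v          # v must have one value per row of d
  d
}
# Assumption: 'row' is a one-row data frame with the same column names as d.
insrow <- function(d, row) rbind(d, row)

d  <- data.frame(x = 1:3, y = c(10, 20, 30))
d1 <- delrow(d, 2)                              # rows 1 and 3 remain
d2 <- delcol(d, "y")                            # only column x remains
d3 <- inscol(d, c("a", "b", "c"), "z")          # adds column z
d4 <- insrow(d, data.frame(x = 4L, y = 40))     # appends a fourth row
print(d4)
```

Each wrapper returns a modified copy, so the result has to be assigned back (for example `d <- delrow(d, 2)`) for the change to stick.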
Even a simple alias would do (maybe named row.delete, column.delete). I looked at my R "bible" (venables&ripley), too, but here too it is not as clear as it needs to be. yes, these operations are programmable, but it ain't as obvious as it should be for beginners. these are elementary.
P.S. How many other people think that the next edition of MASS should be renamed "Secrets of The S Masters"? :-)
"ivo welch" <ivo.welch at yale.edu> said...
* Finally, a more complex question: I have a historical rate of stock return series (yes, I teach finance). I would like to make a ts plot on the left (plot(date,returns,type="h")), and a plot(density(returns)) on the right. works nicely with par(mfrow=c(1,2)), but it would be even nicer if I could rotate the density plot 90 degrees, so that it is more apparent that the density plot is an aggregation of the points at the same y coordinates. (if need be, a histogram could replace the density plot.) Is it possible to rotate an entire subpanel figure. if there was a "horizontal" parameter to ps.options for plot(), it would do the trick, but this does not work. So, this may be a suggestion, too.
There might be a more natural way to do this using grid graphics, but I'm
still not familiar with grid. This type of plot is one I do enough of
that I rolled my own the old-fashioned way.
Try
zz <- ts(rnorm(100))
DenTSplot(zz)
## ts and density
DenTSplot <- function(x, ylim=NULL,main=NULL,...) {
    # data sanity check
    if(!is.ts(x))
        x <- ts(x)
    if(!is.null(dim(x))) {
        stop("can only handle univariate time series\n")
    }
    # set layout - FIXME - should this be user-setable?
    layout(matrix(c(1,1,1,2),nrow=1))
    # find x density. FIXME - need to take arguments about
    # bandwidth selector, etc.
    x.d <- density(x)
    if(is.null(ylim)) {
        ylim <- range(x.d$x)
    }
    if(is.null(main))
        main <- "Series"
    opar <- par(no.readonly=TRUE)
    on.exit(par(opar))
    mai <- par("mai")
    mai.ts <- c(mai[1:3],0)
    par(mai=mai.ts)
    plot(x,ylim=ylim,main=main,...)
    mai.den <- c(mai[1],0,mai[3:4])
    par(mai=mai.den)
    plot(x.d$y, x.d$x,
         ylim=ylim, type="l", yaxt="n",
         ylab="",xlab="",main="Density")
}
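The original question mentioned that a histogram could stand in for the density. Below is a rough sketch of that variant, following the same layout trick as DenTSplot. The function name HistTSplot and its arguments are made up for this example; it assumes base graphics only and draws the bars sideways with rect() so that they share the y axis of the series.

```r
# Hypothetical histogram variant of DenTSplot (not from the original post).
HistTSplot <- function(x, breaks = "Sturges", main = "Series", ...) {
    if (!is.ts(x))
        x <- ts(x)
    if (!is.null(dim(x)))
        stop("can only handle univariate time series")
    # compute the histogram without drawing it
    h <- hist(as.numeric(x), breaks = breaks, plot = FALSE)
    ylim <- range(h$breaks)
    # save graphics settings before changing anything, restore on exit
    opar <- par(no.readonly = TRUE)
    on.exit(par(opar))
    layout(matrix(c(1, 1, 1, 2), nrow = 1))
    mai <- par("mai")
    # left panel: the series, with no right margin
    par(mai = c(mai[1:3], 0))
    plot(x, ylim = ylim, main = main, ...)
    # right panel: empty plot on the same y scale, with no left margin
    par(mai = c(mai[1], 0, mai[3:4]))
    plot(0, 0, type = "n", xlim = c(0, max(h$counts)), ylim = ylim,
         yaxt = "n", xlab = "", ylab = "", main = "Histogram")
    # one horizontal bar per bin: from 0 to the count, between the bin edges
    nb <- length(h$breaks)
    rect(0, h$breaks[-nb], h$counts, h$breaks[-1], col = "grey")
    invisible(h)
}
```

Usage is the same as before, for example `HistTSplot(ts(rnorm(100)))`. The histogram object is returned invisibly in case the bin counts are needed afterwards.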