Sebastian and Andy -
Yes, Andy has read the question correctly. A similar task that
I do quite often is to subtract the mean of a class from all of
the members of the class, and do this within every column of a
(numeric) data frame. Kurt Hornik's function aggregate() is
the one to use. Here's an example using the same data set that
he uses in the example on the help page. (Only the commands are
shown here. You'll have to try them to see the output, because
I cannot cut and paste easily into my email.)
data(state)
ls()
# This data set puts individual columns into your workspace,
# rather than making a data frame of them.
example <- data.frame(state.abb, state.name, state.region, state.x77)
str(example)
means <- aggregate(example[ ,3+seq(8)], list(example[ ,3]), mean)
str(means)
residuals <- example[ ,3+seq(8)] - means[as.numeric(example[ ,3]), -1]
result <- cbind(example[ ,seq(3)], residuals)
str(result)
-- Ah, I think this example would be easier to read if I had used
the columns from the workspace directly, rather than packaging them
into a data frame 'example' first, then using numeric subscripts on
the data frame. But, at least this illustrates some common ways of
subscripting subsets of columns from a data frame.
Again, see help("aggregate"), help("Subscript") to see what I am
doing here.
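As a minimal sketch of the "workspace columns directly" variant mentioned above: it assumes the objects loaded by data(state) (state.x77, state.region, state.abb, state.name); the names 'expanded' and 'resid_by_region' are made up here for illustration and are not from the original post.

```r
data(state)  # loads state.x77 (50 x 8 numeric matrix) and state.region (factor)

# Mean of every column of state.x77 within each region.
means <- aggregate(as.data.frame(state.x77), list(region = state.region), mean)

# as.numeric(state.region) gives each state's level code, which picks the
# matching row of 'means' (aggregate() orders its rows by factor level).
expanded <- as.matrix(means[as.numeric(state.region), -1])

# Subtract each state's region mean from its own values.
resid_by_region <- state.x77 - expanded

# Put the identifying columns back in front of the residuals.
result <- data.frame(state.abb, state.name, state.region, resid_by_region)
str(result)
```

Within each region the residuals in every column should sum to zero, up to rounding error, which is a quick way to check the indexing.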
- best - tom blackwell - u michigan medical school - ann arbor -
(Ah, I see that Andy has just replied this morning as well. I'll see
what his reply was as soon as I send off this one.)
On Tue, 17 Feb 2004, Sebastian Luque wrote:
Hi,

This is exactly what I meant, Andy: the stratifying variable can be
thought of as a factor. However, I tried your code and I get the error:
"Error in Ops.data.frame......- only defined for equally-sized data
frames". What may be happening?

The result of 'apply' functions, or 'split' or 'by' and the like, is a
list with a names attribute that, in my case, would have the levels of
the factor. How can one get the results back to a data.frame object,
with the newly calculated variables? The indexing for lists is not as
straightforward as for data frames.

Thanks to both of you for showing me the power of indexing in R functions!

Sebastian

Liaw, Andy wrote:
I'm guessing what Sebastian wants is to do the differencing by a
stratifying variable such as ID; e.g., the data may look like:
df <- as.data.frame(cbind(ID=rep(1:5, each=3), x=matrix(rnorm(45), 15, 3)))
So using Tom's solution, one would do something like:
mdiff <- function(x) x[-1,] - x[nrow(x),]
sapply(split(df[,-1], df[,1]), mdiff)
There could well be more efficient ways!
Andy
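The "equally-sized data frames" error reported above is consistent with mdiff() as posted: x[-1,] has nrow(x) - 1 rows while x[nrow(x),] has one row, and subtraction between data frames requires matching dimensions. One possible way to get per-ID successive differences back as a data frame is sketched below. It swaps in ave() instead of split()/sapply(), so it is an alternative rather than the code from the thread; the helper name 'lagdiff', the seed, and the simulated data are assumptions for illustration only.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(ID = rep(1:5, each = 3), matrix(rnorm(45), 15, 3))

# Row i minus row i-1 within one vector; the first element has no
# predecessor, so it is NA. (Hypothetical helper name.)
lagdiff <- function(v) c(NA, diff(v))

# ave() applies lagdiff separately within each ID and returns the values
# in the original row order, so no list has to be reassembled by hand.
diffs <- as.data.frame(lapply(df[, -1], function(v) ave(v, df$ID, FUN = lagdiff)))

result <- cbind(ID = df$ID, diffs)
str(result)
```

Because ave() keeps the original row order, the result lines up with df row for row, and the first row of every ID is NA in each differenced column.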
From: Tom Blackwell
Sebastian -
For successive differences within a single column 'x'
differences <- c(NA, diff(x)),
same as
differences <- c(NA, x[-1] - x[-length(x)]).
See help("diff"), help("Subscript"). The second version also
works when x is a matrix or a data frame, except now the result
is a matrix or data frame of the same size.
x <- data.frame(matrix(rnorm(1e+5), 1e+4))
dim(x) # 10000 10
differences <- rbind(rep(NA, 10), x[-1, ] - x[-dim(x)[1], ])
dim(differences) # 10000 10
However, you write "I need to do this for all the subsets of data
created by the numbers in one of the columns of the data frame ..."
and I'm not sure I understand how an 'id' column would create many
subsets of the data. So the simple examples above may not answer
the question you are asking.
- tom blackwell - u michigan medical school - ann arbor -
On Tue, 17 Feb 2004, Sebastian Luque wrote:
Hi,

In fact, I've been trying to get rid of loops in my code for more than
a week now, but nothing I try seems to work. It sounds as if you have
lots of experience with loops, so I would appreciate any pointers you
may have on the following.

I want to create a column showing the difference between the ith row
and the (i-1)th row. Of course, the first row won't have any value in
it, because there is nothing above it to subtract. This is fairly easy
to do with a simple loop, but I need to do this for all the subsets of
data created by the numbers in one of the columns of the data frame
(say, an id column).

I would greatly appreciate any idea you may have on this. Thanks in
advance.

Best regards,
Sebastian
--
Sebastian Luque
sluque at mun.ca