From: Peter Dalgaard <p.dalgaard at biostat.ku.dk>
To: "maneesh deshpande" <dmaneesh at hotmail.com>
CC: ramasamy at cancer.org.uk, r-help at stat.math.ethz.ch
Subject: Re: [R] Ranking within factor subgroups
Date: 23 Feb 2006 07:28:13 +0100
"maneesh deshpande" <dmaneesh at hotmail.com> writes:
Hi Adai,
I think your solution only works if the rows of the data frame are ordered
by "date" and the ordering function is the same one used to order the levels
of factor(df$date)?
It turns out (as I implied in my question) that my data is indeed organized
in this manner, so my current problem is solved.
In the general case, I suppose, one could always order the data frame by
date before proceeding?
Thanks,
Maneesh
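To make the ordering caveat concrete, here is a minimal sketch on made-up data (the vectors v and g are hypothetical, and rank() stands in for the decile function). unlist(tapply(...)) returns its results grouped by factor level, so they line up with the original rows only when the rows are already sorted by the grouping variable.

```r
# Hypothetical toy data: the group labels are deliberately not sorted
g <- c(2, 1, 2, 1)
v <- c(10, 20, 30, 40)

# Within-group ranks, concatenated in level order (group 1 first, then group 2).
# These values belong to rows 2, 4, 1, 3, not to rows 1..4.
by_level <- unlist(tapply(v, g, rank), use.names = FALSE)   # 1 2 1 2

# Sorting by the grouping variable first makes the two orders coincide ...
o <- order(g)
sorted_result <- unlist(tapply(v[o], g[o], rank), use.names = FALSE)

# ... and the permutation o maps the results back to the original row order
in_row_order <- numeric(length(v))
in_row_order[o] <- sorted_result                            # 1 1 2 2
```

So ordering the data frame by date before proceeding does work in general, provided the results are either kept in the sorted frame or mapped back with the same permutation.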
You might prefer to look at split/unsplit/split<-, i.e. the z-scores
by group line:
z <- unsplit(lapply(split(x, g), scale), g)
with "scale" suitably replaced. Presumably (meaning: I didn't quite
read your code closely enough)
z <- unsplit(lapply(split(x, g), bucket, 10), g)
could do it.
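As a sketch of how that second line might be exercised: the block below copies bucket() from the original post further down the thread and applies it to made-up data (the vectors x and g here are hypothetical, with the group labels shuffled on purpose). It only illustrates the split/unsplit idea under those assumptions.

```r
# bucket() as given in the original post
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Hypothetical data: three groups of 50 values each, rows in random order
set.seed(1)
g <- sample(rep(c("d1", "d2", "d3"), each = 50))
x <- rnorm(150)

# split() breaks x up by group, lapply() buckets each piece,
# and unsplit() puts every result back at its original position
z <- unsplit(lapply(split(x, g), bucket, 10), g)
```

Because unsplit() uses the same grouping vector g to reassemble the pieces, the rows do not need to be sorted by group beforehand.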
From: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
Reply-To: ramasamy at cancer.org.uk
To: maneesh deshpande <dmaneesh at hotmail.com>
CC: r-help at stat.math.ethz.ch
Subject: Re: [R] Ranking within factor subgroups
Date: Wed, 22 Feb 2006 03:44:45 +0000
It might help to give a simple reproducible example in the future. For
example
df <- cbind.data.frame( date=rep( 1:5, each=100 ), A=rpois(500, 100),
B=rpois(500, 50), C=rpois(500, 30) )
might generate something like
    date   A  B  C
1      1  93 51 32
2      1  95 51 30
3      1 102 59 28
4      1 105 52 32
5      1 105 53 26
6      1  99 59 37
...    . ... .. ..
495    5 100 57 19
496    5  96 47 44
497    5 111 56 35
498    5 105 49 23
499    5 105 61 30
500    5  92 53 32
Here is my proposed solution. Can you check it against your existing
functions to see whether the results agree?
decile.fn <- function(x, nbreaks=10){
  br <- quantile( x, seq(0, 1, len=nbreaks+1), na.rm=TRUE )
  br[1] <- -Inf
  return( cut(x, br, labels=FALSE) )
}
out <- apply( df[ ,c("A", "B", "C")], 2,
              function(v) unlist( tapply( v, df$date, decile.fn ) ) )
rownames(out) <- rownames(df)
out <- cbind(date=df$date, out)
Regards, Adai
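A small illustration of why the lowest break is moved to -Inf in decile.fn: by default cut() uses intervals that are open on the left, so a value equal to the lowest break falls outside every interval. The toy vector below is made up for the demonstration.

```r
# Hypothetical toy vector; two buckets for brevity
x  <- c(1, 2, 3, 4)
br <- quantile(x, seq(0, 1, length.out = 3))   # breaks at 1, 2.5, 4

# Default intervals are (1, 2.5] and (2.5, 4], so the minimum gets NA
plain <- cut(x, br, labels = FALSE)            # NA 1 2 2

# Fix 1: lower the first break, as decile.fn does
br_inf <- br
br_inf[1] <- -Inf
shifted <- cut(x, br_inf, labels = FALSE)      # 1 1 2 2

# Fix 2: close the first interval on the left instead
closed <- cut(x, br, labels = FALSE, include.lowest = TRUE)   # 1 1 2 2
```

Either fix on its own is enough to keep the group minimum in the first bucket.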
On Tue, 2006-02-21 at 21:44 -0500, maneesh deshpande wrote:
Hi,
I have a data frame, x, of the following form:
Date     Symbol  A  B  C
20041201 ABC    10 12 15
20041201 DEF     9  5  4
...
20050101 ABC     5  3  1
20050101 GHM    12  4  2
....
Here A, B and C are properties of a set of symbols recorded for a given date.
I want to decile the symbols for each date and property and
create another set of columns "bucketA", "bucketB", "bucketC" containing the
decile rank
for each symbol. The following non-vectorized code does what I want:
bucket <- function(data,nBuckets) {
  q <- quantile(data,seq(0,1,len=nBuckets+1),na.rm=T)
  q[1] <- q[1] - 0.1 # need to do this to ensure there are no
  cut(data,q,include.lowest=T,labels=F)
}
calcDeciles <- function(x,colNames) {
  nBuckets <- 10
  dates <- unique(x$Date)
  for ( date in dates) {
    iVec <- x$Date == date
    xx <- x[iVec,]
    for (colName in colNames) {
      data <- xx[,colName]
      bColName <- paste("bucket",colName,sep="")
      x[iVec,bColName] <- bucket(data,nBuckets)
    }
  }
  x
}
x <- calcDeciles(x,c("A","B","C"))
I was wondering if it is possible to vectorize the above function to make it
more efficient.
I tried,
rlist <- tapply(x$A,x$Date,bucket)
but I am not sure how to assign the contents of "rlist" to their
slots in the original
dataframe.
Thanks,
Maneesh
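One possible way to put per-group results straight back into their rows is base R's ave(), which applies a function within groups and returns a vector in the original row order. The sketch below reuses bucket() from the post; the data frame is made up to match the described layout (the symbols and values are hypothetical), with its rows shuffled.

```r
# bucket() as given in the post above
bucket <- function(data, nBuckets) {
  q <- quantile(data, seq(0, 1, length.out = nBuckets + 1), na.rm = TRUE)
  q[1] <- q[1] - 0.1
  cut(data, q, include.lowest = TRUE, labels = FALSE)
}

# Hypothetical data in the shape described: two dates, 30 symbols per date
set.seed(42)
x <- data.frame(
  Date   = rep(c(20041201, 20050101), each = 30),
  Symbol = rep(paste0("S", 1:30), times = 2),
  A = rnorm(60), B = rnorm(60), C = rnorm(60)
)
x <- x[sample(nrow(x)), ]   # shuffle rows: ave() does not need sorted input

# One ave() call per property replaces the loop over dates
for (colName in c("A", "B", "C")) {
  x[[paste0("bucket", colName)]] <-
    ave(x[[colName]], x$Date, FUN = function(v) bucket(v, 10))
}
```

The loop over the three column names remains, but the per-date subsetting and assignment are handled inside ave().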