"tapply versus by" in function with more than 1 argument
The first tapply in your question subsets V1 by class but passes V2 whole, so the two vectors handed to cor have different lengths. To subset both, run tapply over the row names and do the subsetting inside the function:

  tapply(rownames(dataf), dataf$class, function(r) cor(dataf[r, "V1"], dataf[r, "V2"]))

or

  tapply(rownames(dataf), dataf$class, function(r) with(dataf[r, ], cor(V1, V2)))
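For anyone who wants to try this without the random data, here is a minimal sketch on a small fixed data frame. The values are made up for illustration and are not the poster's data. It also shows two alternatives that are not from the original reply: splitting the data frame with split/sapply, and pulling the [1, 2] element out of each matrix that by returns. The last two lines only check that the three approaches agree.

```r
# Hypothetical fixed data, shaped like the poster's data frame
dataf <- data.frame(
  V1 = c(1, 2, 3, 1, 2, 4, 1, 3, 2, 5),
  V2 = c(2, 4, 7, 3, 1, 0, 1, 2, 4, 6),
  V3 = seq(0.1, 1, by = 0.1),
  class = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# tapply over the row names, subsetting both columns inside the function
r_tapply <- tapply(rownames(dataf), dataf$class,
                   function(r) cor(dataf[r, "V1"], dataf[r, "V2"]))

# Alternative (assumption, not in the reply): split the data frame by class,
# then compute one correlation per piece
r_split <- sapply(split(dataf, dataf$class), function(d) cor(d$V1, d$V2))

# Alternative (assumption, not in the reply): keep by(), then extract the
# off-diagonal element [1, 2] from each 2 x 2 correlation matrix
r_by <- sapply(by(dataf[, c("V1", "V2")], dataf$class, cor),
               function(m) m[1, 2])

# Compare the three results, ignoring names and array attributes
all.equal(as.vector(r_tapply), as.vector(r_split))
all.equal(as.vector(r_tapply), as.vector(r_by))
```

Which form to use is mostly a matter of taste: the tapply version stays closest to the original attempt, while split/sapply avoids indexing by row names.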
On Wed, Oct 1, 2008 at 8:21 AM, Cézar Freitas <cafanselmo12 at yahoo.com.br> wrote:
Hi. I searched the list and didn't find anything similar to this. I simplified my example as below:
#I need to calculate the correlation (for example) between 2 columns of a data.frame, grouped by a third one, like below:
#number of rows
nr = 10
#the third column is to enforce that I need correlation on two variables only
dataf = as.data.frame(matrix(c(rnorm(nr),rnorm(nr)*2,runif(nr),sort(c(1,1,2,2,3,3,sample(1:3,nr-6,replace=TRUE)))),ncol=4))
names(dataf)[4] = "class"
#> dataf
# V1 V2 V3 class
#1 0.56933020 1.2529931 0.30774422 1
#2 0.41702299 -1.6441547 0.76140046 1
#3 -1.07671647 -4.8747575 0.43706944 1
#4 -1.97701167 1.3015196 0.04390175 2
#5 0.56501325 1.8597720 0.08174124 2
#6 0.70068638 1.7922641 0.74730126 2
#7 -1.39956177 -1.9918904 0.64521918 3
#8 0.27086664 0.3745362 0.61026133 3
#9 0.04282347 3.7360407 0.48696109 3
#10 -0.34262654 0.7933674 0.09824913 3
#I tried:
tapply(dataf$V1, dataf$class, cor, dataf$V2)
#Error FUN(X[[1L]], ...) : incompatible dimensions
tapply(dataf$V1, dataf$class, cor, tapply(dataf$V2, dataf$class))
#Error FUN(X[[1L]], ...) : incompatible dimensions
#But using "by" I obtain:
by(dataf[,c("V1","V2")], dataf$class, cor)
#dataf$class: 1
# V1 V2
#V1 1.00000 0.91777
#V2 0.91777 1.00000
#--------------------------------------------------------------------------------------------------
#dataf$class: 2
# V1 V2
#V1 1.000000 0.987857
#V2 0.987857 1.000000
#--------------------------------------------------------------------------------------------------
#dataf$class: 3
# V1 V2
#V1 1.0000000 0.7318938
#V2 0.7318938 1.0000000
#I am interested only in the [1,2] element of each matrix, i.e. cor(V1, V2), so I could take 0.91777, 0.987857 and 0.7318938, but I think tapply could work better if I can solve the problem.
Thanks,
Cezar
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.