Using Aggregate() with FUN arguments, which require more than one input variables

7 messages · Alexander Erbse, Uwe Ligges, Rui Barradas

#
Dear all,

I am trying to apply the aggregate() function to calculate correlations for
subsets of a dataframe. My argument x is supposed to consist of 2 numerical
vectors, which represent x and y for the cor() function. 

The following error results when calling the aggregate function: Error in
FUN(X[[1L]], ...) : supply both 'x' and 'y' or a matrix-like 'x'. I think
the subsets aggregate puts into cor() are sort of list types and therefore
can't be handled by cor().

Can anyone provide me with a solution?

Regards,
RNoob

--
View this message in context: http://r.789695.n4.nabble.com/Using-Aggregate-with-FUN-arguments-which-require-more-than-one-input-variables-tp4303936p4303936.html
Sent from the R help mailing list archive at Nabble.com.
#
On 17.01.2012 18:10, RNoob wrote:
as.matrix() will probably help, but since you have not provided
reproducible code, we cannot show you how to change it.

Uwe Ligges
#
Hello,


RNoob wrote
I'm not sure I understand correctly, but it seems you're trying to
compute a correlation matrix for each group of a data.frame, where the
groups are defined by one or more factor columns. If that is what you
want, try the function below. It uses 'split' and 'lapply' instead of
'aggregate'.


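cor.groups <- function(x, vars){
	# match 'vars' against column names (character) or positions (numeric)
	cols <- if(is.character(vars)) names(x) else seq_len(ncol(x))
	cols <- cols %in% vars
	# also exclude every non-numeric (factor or character) column
	cols <- cols | sapply(x, is.factor) | sapply(x, is.character)
	# transform logical to numeric index
	cols <- which(cols)

	# one correlation matrix per group; drop = FALSE keeps a data.frame
	# even when only one numeric column is left
	lapply(split(x, x[, vars]), function(grp) cor(grp[, -cols, drop = FALSE]))
}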

# Sample data
N <- 100
DF <- data.frame(U=as.factor(sample(LETTERS[1:3], N, TRUE)),
			V=as.factor(sample(0:1, N, TRUE)),
			W=sample(letters[1:6], N, TRUE),
			x=1:N, y=sample(10, N, TRUE), z=rnorm(N),
			stringsAsFactors=FALSE)

# And test it. Note the argument 'stringsAsFactors'

cor.groups(DF, "U")
cor.groups(DF, c("U", "V"))
cor.groups(DF, 1:3)
cor.groups(DF, c("U", "x"))     # look out, right result, wrong function call


I hope it helps. (If not, please be more explicit.)

Rui Barradas


#
as.matrix() does not help here; I get the same error message.

Also, I don't need correlation matrices. I simply need a vector of
correlations. Here is some of the code and data I am using.

Here you can see my main dataframe:
     industry       date    testvar       retf1
1 Industrials 2004-12-31 1174382477 -0.02240908
2 Industrials 2005-01-31 1101039133  0.08080221
3 Industrials 2005-02-28 1211723486  0.05646877
4 Industrials 2005-03-31 1253428861 -0.05743186
5 Industrials 2005-04-30 1152846793 -0.02928415
6 Industrials 2005-05-31 1070386589  0.05865941

Now I want to use the column "industry" or "date", or both, as grouping
columns for the correlations between testvar and retf1, as follows:

> numbers <- test[,c("testvar","retf1")]
     testvar       retf1
1 1174382477 -0.02240908
2 1101039133  0.08080221
3 1211723486  0.05646877
4 1253428861 -0.05743186
5 1152846793 -0.02928415
6 1070386589  0.05865941


> groups <- test[,"industry"]
[1] "Industrials" "Industrials" "Industrials" "Industrials" "Industrials"
[6] "Industrials"
[1] "Industrials"        "Telecommunications" "Financials"        
[4] "Utilities"          "ConsumerGoods"      "OilandGas"

And now:

> aggregate(numbers,by=list(groups),FUN="cor")

Error in FUN(X[[1L]], ...) : supply both 'x' and 'y' or a matrix-like 'x'

So my desired output is a vector of correlations between x = "testvar"
and y = "retf1", computed separately for each group. The length of the
resulting vector will be length(unique(groups)).

I hope this gives you a clearer picture. Sorry for not stating it
precisely in my first post.

Thanks and Regards!
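
A note on why the call above fails: as far as the documented behaviour of
aggregate() goes, FUN is applied to each column of each group separately, so
cor() only ever receives a single numeric vector and never a pair. The sketch
below illustrates this with made-up data; the values in `test` are
hypothetical and only the column names follow the post.

```r
# Toy data (hypothetical values, not the poster's real data)
set.seed(1)
test <- data.frame(
  industry = rep(c("Industrials", "Financials"), each = 6),
  testvar  = rnorm(12),
  retf1    = rnorm(12),
  stringsAsFactors = FALSE
)
numbers <- test[, c("testvar", "retf1")]
groups  <- test[, "industry"]

# Record what FUN actually receives: one plain numeric vector per
# column and group, not a two-column data frame.
seen <- aggregate(numbers, by = list(groups), FUN = function(v) class(v)[1])
print(seen)

# Calling cor() on a single vector without 'y' is the call that fails.
msg <- tryCatch(cor(numbers$testvar), error = function(e) conditionMessage(e))
print(msg)
```

Because of this column-by-column behaviour, a function that needs two
columns at once has to be applied to the split data frame instead.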








#
On 18.01.2012 09:49, Alexander Erbse wrote:
sapply(split(numbers, groups), function(x) cor(x[,1], x[,2]))

Uwe Ligges
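
For anyone who wants to try the one-liner above without the original data,
here is a minimal self-contained sketch. The data values and the `year`
column are assumptions made up for illustration; only `numbers`, `groups`,
`testvar`, `retf1` and `industry` come from the thread.

```r
# Toy data (hypothetical values); 'year' stands in for a second grouping column
set.seed(42)
test <- data.frame(
  industry = rep(c("Industrials", "Financials", "Utilities"), each = 6),
  year     = rep(rep(c(2004, 2005), each = 3), times = 3),
  testvar  = rnorm(18),
  retf1    = rnorm(18),
  stringsAsFactors = FALSE
)
numbers <- test[, c("testvar", "retf1")]
groups  <- test[, "industry"]

# split() cuts the two-column data frame into one data frame per group;
# sapply() then returns one correlation per group as a named vector.
cors <- sapply(split(numbers, groups), function(d) cor(d[, 1], d[, 2]))
print(cors)

# Grouping by two columns: pass a list of grouping vectors to split().
# drop = TRUE discards combinations that contain no rows.
cors2 <- sapply(
  split(numbers, list(test$industry, test$year), drop = TRUE),
  function(d) cor(d[, 1], d[, 2])
)
print(cors2)
```

The result is a plain named numeric vector with one entry per group, which
matches the length(unique(groups)) requirement stated earlier in the thread.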