Using Aggregate() with FUN arguments, which require more than one input variables
7 messages · Alexander Erbse, Uwe Ligges, Rui Barradas

Dear all,

I am trying to apply the aggregate() function to calculate correlations for subsets of a data frame. My argument x is supposed to consist of 2 numerical vectors, which represent x and y for the cor() function. The following error results when calling the aggregate function:

Error in FUN(X[[1L]], ...) : supply both 'x' and 'y' or a matrix-like 'x'

I think the subsets that aggregate() passes to cor() are some sort of list type and therefore can't be handled by cor(). Can anyone provide me with a solution?

Regards,
RNoob
--
View this message in context: http://r.789695.n4.nabble.com/Using-Aggregate-with-FUN-arguments-which-require-more-than-one-input-variables-tp4303936p4303936.html
Sent from the R help mailing list archive at Nabble.com.
On 17.01.2012 18:10, RNoob wrote:
Dear all, I am trying to apply the aggregate() function to calculate correlations for subsets of a dataframe. My argument x is supposed to consist of 2 numerical vectors, which represent x and y for the cor() function. The following error results when calling the aggregate function: Error in FUN(X[[1L]], ...) : supply both 'x' and 'y' or a matrix-like 'x'. I think the subsets aggregate puts into cor() are sort of list types and therefore can't be handled by cor().
as.matrix() will probably help, but since you have not provided reproducible code, we cannot show how to change it.

Uwe Ligges
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
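For context on the error: as far as I can tell, aggregate() on a data frame applies FUN to each column of each group separately, so cor() only ever receives a single numeric vector as 'x' and no 'y'. The sketch below makes that visible on made-up data (the names df, g, a and b are illustrative and do not come from the thread). It also suggests why as.matrix() does not change anything: aggregate() converts a matrix back into a data frame before splitting it.

```r
# Made-up data; all names here are illustrative only.
set.seed(1)
df <- data.frame(g = rep(c("A", "B"), each = 5),
                 a = rnorm(10),
                 b = rnorm(10))

# Record the class of whatever FUN receives on each call.
seen <- list()
res <- aggregate(df[, c("a", "b")], by = list(g = df$g),
                 FUN = function(v) {
                   seen[[length(seen) + 1]] <<- class(v)
                   length(v)
                 })
unlist(seen)  # one plain numeric vector per column and group

# cor() therefore gets a single vector and no 'y', which is an error.
err_df <- tryCatch(
  aggregate(df[, c("a", "b")], by = list(g = df$g), FUN = cor),
  error = function(e) conditionMessage(e))

# Passing a matrix fails the same way.
err_mat <- tryCatch(
  aggregate(as.matrix(df[, c("a", "b")]), by = list(g = df$g), FUN = cor),
  error = function(e) conditionMessage(e))

err_df
err_mat
```

So any FUN that needs two columns at once has to be applied to the split data frame itself rather than through aggregate().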
Hello,
I'm not sure I understand correctly, but it seems you're trying to
compute a correlation matrix for each group of a data.frame, where the
groups are defined by one or more factor columns. If this is what you
want, try the function below. It doesn't use 'aggregate'; it uses 'split'
and 'lapply'.
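cor.groups <- function(x, vars){
    # logical index of the grouping columns, given by name or by position
    cols <- if(is.character(vars)) names(x) else seq_along(x)
    cols <- cols %in% vars
    # also exclude every other non-numeric (factor or character) column
    cols <- cols | sapply(x, is.factor) | sapply(x, is.character)
    # transform logical to numeric index
    cols <- which(cols)
    # one correlation matrix per group, computed on the remaining columns
    lapply(split(x, x[, vars]), function(grp) cor(grp[, -cols, drop = FALSE]))
}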
# Sample data
N <- 100
DF <- data.frame(U=as.factor(sample(LETTERS[1:3], N, TRUE)),
                 V=as.factor(sample(0:1, N, TRUE)),
                 W=sample(letters[1:6], N, TRUE),
                 x=1:N, y=sample(10, N, TRUE), z=rnorm(N),
                 stringsAsFactors=FALSE)
# And test it. Note the argument 'stringsAsFactors'
cor.groups(DF, "U")
cor.groups(DF, c("U", "V"))
cor.groups(DF, 1:3)
cor.groups(DF, c("U", "x"))  # look out, right result, wrong function call
I hope it helps. (if not, be more explicit)
Rui Barradas
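If only one correlation per group is wanted rather than a full matrix, the per-group matrices from a split/lapply approach like the one above can be reduced to a named vector by picking out a single entry. This is a minimal sketch on made-up data; the names df, g, x and y are illustrative assumptions, not taken from the thread.

```r
# Made-up data; all names here are illustrative only.
set.seed(42)
df <- data.frame(g = rep(c("A", "B", "C"), each = 10),
                 x = rnorm(30),
                 y = rnorm(30))

# One 2 x 2 correlation matrix per group.
mats <- lapply(split(df[, c("x", "y")], df$g), cor)

# Keep only the x-y entry of each matrix: a named numeric vector.
r <- sapply(mats, function(m) m["x", "y"])
r
```

The names of the result are the group labels, so r["A"] is the correlation between x and y within group A.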
as.matrix() will not help here; I get the same error message. Also, I don't need correlation matrices. I simply need a vector of correlations. Here is some of the code and data I am using. This is my main data frame:

> head(test)
     industry       date    testvar       retf1
1 Industrials 2004-12-31 1174382477 -0.02240908
2 Industrials 2005-01-31 1101039133  0.08080221
3 Industrials 2005-02-28 1211723486  0.05646877
4 Industrials 2005-03-31 1253428861 -0.05743186
5 Industrials 2005-04-30 1152846793 -0.02928415
6 Industrials 2005-05-31 1070386589  0.05865941

Now I want to use the column "industry" or "date", or both, as grouping columns for the correlations between testvar and retf1, as follows:

> numbers <- test[,c("testvar","retf1")]
> head(numbers)
     testvar       retf1
1 1174382477 -0.02240908
2 1101039133  0.08080221
3 1211723486  0.05646877
4 1253428861 -0.05743186
5 1152846793 -0.02928415
6 1070386589  0.05865941
> groups <- test[,"industry"]
> head(groups)
[1] "Industrials" "Industrials" "Industrials" "Industrials" "Industrials"
[6] "Industrials"
> head(unique(groups))
[1] "Industrials"        "Telecommunications" "Financials"
[4] "Utilities"          "ConsumerGoods"      "OilandGas"

And now:

> aggregate(numbers,by=list(groups),FUN="cor")
Error in FUN(X[[1L]], ...) : supply both 'x' and 'y' or a matrix-like 'x'

So my desired output is a vector of correlations between subsets of x = "testvar" and y = "retf1". The length of the resulting vector will be length(unique(groups)). I hope this gives you a clearer picture. Sorry for not pointing it out precisely in my first post.

Thanks and regards!
On 18.01.2012 09:49, Alexander Erbse wrote:
aggregate(numbers,by=list(groups),FUN="cor")
Error in FUN(X[[1L]], ...) : supply both 'x' and 'y' or a matrix-like 'x'

So my desired output is a vector of correlations between subsets of x = "testvar" and y = "retf1". The length of the resulting vector will be length(unique(groups)). I hope this gives you a clearer picture. Sorry for not pointing it out precisely in my first post. Thanks and regards!
sapply(split(numbers, groups), function(x) cor(x[,1], x[,2]))

Uwe Ligges
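To see the one-liner above in action, here is a small sketch on mock data shaped like the 'test' data frame from the question. The values are randomly generated, only three industries are used, and the extra grouping by year is my own assumption about what "industry or date or both" might look like in practice; none of the numbers correspond to the poster's data.

```r
# Mock data shaped like the poster's 'test' data frame; values are made up.
set.seed(123)
test <- data.frame(
  industry = rep(c("Industrials", "Financials", "Utilities"), each = 24),
  date     = rep(seq(as.Date("2005-01-01"), by = "month", length.out = 24),
                 times = 3),
  testvar  = runif(72, 1e9, 1.3e9),
  retf1    = rnorm(72, sd = 0.05),
  stringsAsFactors = FALSE
)

numbers <- test[, c("testvar", "retf1")]
groups  <- test[, "industry"]

# One correlation per industry, returned as a named vector.
cors <- sapply(split(numbers, groups), function(x) cor(x[, 1], x[, 2]))
cors

# Grouping by two variables (industry and calendar year, a hypothetical choice):
# pass a list of grouping vectors to split().
years <- format(test$date, "%Y")
cors2 <- sapply(split(numbers, list(groups, years), drop = TRUE),
                function(x) cor(x[, 1], x[, 2]))
cors2
```

The result has length(unique(groups)) elements, as requested. Note that grouping by the full date as well would leave a single row per group here, so the correlation would be NA.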
Great! Thanks.