Hello. I have a large dataset with sales per month for 56 products over 10 months, and I have tried to see how the sales are correlated using cor(). This has given me a 56x56 matrix with the R value for each product pair. Most of these correlations are insignificant, and I want to retain only the instances where the R value is significant (for 10 observations it should be above 0.64). Can someone help with this?
-- View this message in context: http://r.789695.n4.nabble.com/Correlation-matrix-removing-insignificant-R-values-tp4099412p4099412.html Sent from the R help mailing list archive at Nabble.com.
Correlation matrix removing insignificant R values
4 messages · mgranlie, Frank E Harrell Jr, R. Michael Weylandt
I think it would be better to think of this as an estimation problem rather than a selection problem. If the correlation matrix is of interest, estimate the entire matrix. If you want to show that you can make decisions on the basis of the matrix, then use the bootstrap to get a confidence interval for quantities of interest. For example, you can bootstrap the rank of the absolute values of the correlation coefficients to get nonparametric bootstrap percentile confidence limits for those ranks. You will be disappointed in the widths of these intervals, which demonstrate how hard it is to select winners and losers from non-huge datasets. For example, the bootstrap might show that for the apparent highest correlation you can only be 95% confident that that pair of variables does not possess one of the 10 worst correlations.
Frank
----- Frank Harrell Department of Biostatistics, Vanderbilt University
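The bootstrap of ranks described above might be sketched as follows. This is only an illustration under stated assumptions: simulated data stand in for the sales table, the names `dat`, `n`, `p`, `B`, and `abs.cor` are made up for the example, and months are resampled as if they were independent, which ignores any time-series structure in real sales data.

```r
set.seed(1)                      # make the sketch reproducible
n <- 10                          # months (rows), as in the question
p <- 6                           # products; kept small here, the question has 56
dat <- matrix(rnorm(n * p), n, p,
              dimnames = list(NULL, paste0("P", 1:p)))  # simulated stand-in data

# Absolute correlations of the unique pairs (lower triangle) as a vector
abs.cor <- function(d) {
  x <- cor(d)
  abs(x[lower.tri(x)])
}

obs <- abs.cor(dat)
B <- 500                         # number of bootstrap replicates (assumed value)

# Each replicate resamples months with replacement and re-ranks the pairs
boot.ranks <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  rank(abs.cor(dat[idx, ]))
})

# Percentile confidence limits for the rank of each pair
ci <- t(apply(boot.ranks, 1, quantile, probs = c(0.025, 0.975), na.rm = TRUE))

# Same column-major ordering as x[lower.tri(x)] above
pairs <- which(lower.tri(cor(dat)), arr.ind = TRUE)
result <- data.frame(V1 = colnames(dat)[pairs[, 1]],
                     V2 = colnames(dat)[pairs[, 2]],
                     abs.r = obs,
                     rank = rank(obs),
                     lower = ci[, 1],
                     upper = ci[, 2])
result[order(-result$rank), ]    # highest apparent correlation first
```

With only 10 rows, the intervals in `lower` and `upper` will typically span much of the possible range of ranks, which is the point made in the message above.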
There have been two threads dealing with this in the last few weeks;
please search the recent archives for those threads for a good
discussion. The end result: Josh Wiley provided a useful little function
for this, which I'll copy below. RSeek.org is a good place to do your
searching.
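spec.cor <- function(dat, r, ...) {
  x <- cor(dat, ...)
  # Blank the upper triangle and the diagonal so each pair is reported once
  x[upper.tri(x, TRUE)] <- NA
  # Row/column indices of the pairs with |correlation| >= r
  i <- which(abs(x) >= r, arr.ind = TRUE)
  data.frame(matrix(colnames(x)[as.vector(i)], ncol = 2), value = x[i])
}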
Michael
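The cutoff passed as `r` can be derived rather than looked up. As a rough sketch, assuming a two-sided test of zero Pearson correlation at the 5% level (the helper name `crit.r` and the `alpha` default are made up for this example), the critical value follows from the t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2):

```r
# Critical |r| for a two-sided test of zero correlation at level alpha.
# Solving t = r * sqrt(df) / sqrt(1 - r^2) for r gives r = t / sqrt(t^2 + df).
crit.r <- function(n, alpha = 0.05) {
  df <- n - 2
  tcrit <- qt(1 - alpha / 2, df)
  tcrit / sqrt(tcrit^2 + df)
}

r.cut <- crit.r(10)   # roughly 0.63 for 10 observations
r.cut
```

For 10 observations this comes out slightly below the 0.64 quoted in the question. Note also that 56 products give 56 * 55 / 2 = 1540 pairs, so if every true correlation were zero, around 77 pairs would still be expected to exceed a 5% cutoff by chance.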
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Looking over the code posted earlier in this thread, I think this patched
version might return a better answer:
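spec.cor <- function(dat, r, ...) {
  x <- cor(dat, ...)
  # Blank the upper triangle and the diagonal so each pair is reported once
  x[upper.tri(x, TRUE)] <- NA
  # Row/column indices of the pairs with |correlation| >= r
  i <- which(abs(x) >= r, arr.ind = TRUE)
  # Label each pair by its row name and column name
  data.frame(V1 = rownames(x)[i[, 1]], V2 = colnames(x)[i[, 2]], Value = x[i])
}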
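A small usage sketch of the patched function, on simulated data rather than the poster's sales table. The data frame `sales`, the product names `P1` to `P6`, and the cutoff of 0.9 are all made up for illustration; the function is repeated so the example runs on its own.

```r
spec.cor <- function(dat, r, ...) {
  x <- cor(dat, ...)
  x[upper.tri(x, TRUE)] <- NA
  i <- which(abs(x) >= r, arr.ind = TRUE)
  data.frame(V1 = rownames(x)[i[, 1]], V2 = colnames(x)[i[, 2]], Value = x[i])
}

set.seed(42)
# 10 "months" of simulated sales for 5 unrelated products
sales <- as.data.frame(matrix(rnorm(10 * 5), nrow = 10,
                              dimnames = list(NULL, paste0("P", 1:5))))
# A sixth product constructed to track P1 closely
sales$P6 <- sales$P1 + rnorm(10, sd = 0.1)

# Keep only pairs whose absolute correlation is at least 0.9
res <- spec.cor(sales, r = 0.9)
res
```

Extra arguments are passed through to cor(), so something like `spec.cor(sales, r = 0.9, method = "spearman")` should work the same way for rank correlations.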