Skip to content
Prev 289923 / 398498 Next

resampling for correlation and testing

On Mar 29, 2012, at 1:41 AM, ilai wrote:

            
My apologies for trying to make some example data. I should have done exactly what I was doing, which would have been harder to read. Since runif doesn't give exactly the same thing I'll give my functions that I'm using.

## note rperm was originally submitted to the stackexchange.com
## http://stats.stackexchange.com/questions/24300/how-to-resample-in-r-without-repeating-permutations
rperm <- function(m, size=2, replaceTF=TRUE) { # Obtain m unique permutations of 1:size
    # Function to obtain a new permutation.
    newperm <- function() {
        count <- 0                # Protects against infinite loops
        repeat {
            # Generate a permutation and check against previous ones.
            p <- sample(1:size, replace=replaceTF)
            hash.p <- paste(p, collapse="") # make a name for the list
            if (is.null(cache[[hash.p]])) break 
            ## stop if we have not already made this seq - repeat to make another new seq

            # Prepare to try again.
            count <- count+1
            if (count > 1000) {   # 1000 is arbitrary; adjust to taste
                p <- NA           # NA indicates a new permutation wasn't found
                hash.p <- ""
                break
            }
        }
        cache[[hash.p]] <<- TRUE  # Update the list of permutations found
        p                         # Return this (new) permutation
    }

    # Obtain m unique permutations.
    cache <- list()
    replicate(m, newperm())  
} 

library(parallel)
cl <- makeCluster(detectCores())
bootComb<-t(rperm(times, ncol(mat.B))) 
pValues.mat<-matrix(NA, nrow=nrow(mat.B), ncol=nrow(mat.B))
	for(i in 1:nrow(mat.B)){
		cat(round(i/nrow(mat.B),2)*100, "% \r")
		rho<-parApply(cl, bootComb, 1, function(idx, xmat, ymat){
			rho<-apply(ymat[,idx], 1, function(ymat, xmat){
				## parallel here due to memory sizes
				rho<-cor(xmat, ymat)
			}, xmat[idx]) ##gives back a vector of rho for n vs m
		}, mat.B[i, ], mat.B) ## returns a matrix of rho's for each combination
		
		rho.all<- cbind(rho, rho.A[i,])
		pValues.mat[i, ] <- parApply(cl, rho.all, 1, function(rhoVec){
			tl <- length(rhoVec) ## test index
			bl <- tl-1 ## dist index
			wilcox.test(rhoVec[1:bl], rhoVec[tl])$p.value
		})
	}
	stopCluster(cl)
No, cor(a)[i, j] != cor(b)[i , j] 

If your concern is because they are coming from the same distribution then again this is example data. Even then I would imagine that the correlation would be different for small n. Either way resampling the columns will give different  correlation values. Please try the code yourself.
Probably true ?. care to suggest a different statistical test ?
Actually I didn't get a vector or NaN's I got a matrix of pvalues. Which has proved useful. Now I want to make the function faster and was looking for a bit of help.