Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Greg Snow
> Sent: Monday, February 12, 2007 2:34 PM
> To: andy1983; r-help at stat.math.ethz.ch
> Subject: Re: [R] processing a large matrix
>
> One approach is to split up the work of computing the
> correlations: if you give the 'cor' function two matrices, it
> returns the correlations between each column of the first and
> each column of the second. Since you said it works fine with
> 10,000 columns but not 30,000, you could split the matrix into
> 3 pieces and do something like (untested):
>
> out <- rbind(
>     cbind( cor(mymatrix[, 1:10000])^2,
>            cor(mymatrix[, 1:10000], mymatrix[, 10001:20000])^2,
>            cor(mymatrix[, 1:10000], mymatrix[, 20001:30000])^2 ),
>     cbind( matrix(NA, 10000, 10000),
>            cor(mymatrix[, 10001:20000])^2,
>            cor(mymatrix[, 10001:20000], mymatrix[, 20001:30000])^2 ),
>     cbind( matrix(NA, 10000, 10000),
>            matrix(NA, 10000, 10000),
>            cor(mymatrix[, 20001:30000])^2 )
> )
>
> out[ lower.tri(out) ] <- t(out)[ lower.tri(out) ]  # mirror the upper triangle into the NA blocks
>
> For breaking into 3 pieces, this is probably easier/quicker
> than trying to find an alternative. If you need to break it
> into even more pieces (say, blocks of 1,000 when there are
> 30,000 columns), then there are better alternatives: you
> could do a loop over blocks, which would be much faster than
> a loop over individual columns (see the sketch below).
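>
> As a rough sketch of that block loop (again untested; the block
> size 'bs' and the assumption that it evenly divides the column
> count are just illustrative):
>
> bs <- 1000                          # block size (illustrative)
> nb <- ncol(mymatrix) / bs           # number of blocks; assumes an even split
> out <- matrix(NA, ncol(mymatrix), ncol(mymatrix))
> for (i in 1:nb) {
>     ci <- ((i - 1) * bs + 1):(i * bs)       # columns in block i
>     for (j in i:nb) {                       # upper triangle and diagonal only
>         cj <- ((j - 1) * bs + 1):(j * bs)   # columns in block j
>         out[ci, cj] <- cor(mymatrix[, ci], mymatrix[, cj])^2
>     }
> }
> out[ lower.tri(out) ] <- t(out)[ lower.tri(out) ]  # mirror as above
>
> Each call to 'cor' then only ever sees one or two 1,000-column
> pieces at a time.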
>
> Hope this helps,
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at intermountainmail.org
> (801) 408-8111
>
>
>
> > -----Original Message-----
> > From: r-help-bounces at stat.math.ethz.ch
> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of andy1983
> > Sent: Monday, February 12, 2007 1:55 PM
> > To: r-help at stat.math.ethz.ch
> > Subject: [R] processing a large matrix
> >
> >
> > I would like to compare every column in my matrix with every other
> > column and get the r-squared.
> >
> > I tried using the following formula and looping through every column:
> > > summary(lm(matrix[,x]~matrix[,y]))$r.squared
> > If I have 10,000 columns, the loops (10,000 * 10,000) take forever
> > even if there is no formula inside.
> >
> > Then, I attempted to vectorize my code:
> > > cor(matrix)^2
> > With 10,000 columns, this works great. With 30,000, R tells me it
> > cannot allocate a vector of that length, even with the memory limit
> > set to 4 GB.
> >
> > Is there anything else I can do to resolve this issue?
> >
> > Thanks.
> > --
> > View this message in context:
> > http://www.nabble.com/processing-a-large-matrix-tf3216447.html#a8932591
> > Sent from the R help mailing list archive at Nabble.com.
> >
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>