maximum matrix size

4 messages · Henrik Bengtsson, Peter Langfelder, Terry Therneau

#
I am now getting the occasional complaint about survival routines that are not able to
handle big data.  I looked in the manuals to try and update my understanding of max
vector size, max matrix, max data set, etc., but it is either not there or I missed it (the
latter more likely).  Is it still .Machine$integer.max for everything?  Will that
change?  Where is it documented?

I am going to need to go through the survival package and put specific checks in front of
some or all of my .Call() statements, in order to give a sensible message whenever a
boundary is struck.  A well-meaning person just posted a suggested "bug fix" to the GitHub
source of one routine where my .C call allocates a scratch vector, suggesting
"resid = double(as.double(n) * nvar)" to prevent an "NA produced by integer overflow"
message in the code below.  A fix is obviously not quite that easy :-)

        resid <- .C(Ccoxscore, as.integer(n),
                    as.integer(nvar),
                    as.double(y),
                    x=as.double(x),
                    as.integer(newstrat),
                    as.double(score),
                    as.double(weights[ord]),
                    as.integer(method=='efron'),
                    resid= double(n*nvar),
                    double(2*nvar))$resid
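
The kind of boundary check mentioned above could look roughly like the following. This is only a sketch in standalone C, not code from the survival package: the helper name `length_fits_int` is hypothetical, and it assumes the limit of interest is 2^31 - 1 (the value of .Machine$integer.max). The product is formed in 64-bit arithmetic so that the test itself cannot overflow.

```c
#include <stdint.h>

/* Hypothetical helper (not part of survival or of R's API).
 * Returns 1 when n * nvar fits in a signed 32-bit integer,
 * i.e. does not exceed 2^31 - 1, and 0 otherwise. */
static int length_fits_int(int32_t n, int32_t nvar)
{
    if (n < 0 || nvar < 0)
        return 0;                       /* negative sizes are never valid */

    /* Widen before multiplying; int32 * int32 always fits in int64. */
    int64_t len = (int64_t) n * (int64_t) nvar;

    return len <= INT32_MAX;
}
```

With this check, 46340 x 46340 passes (the product is 2147395600), while 50000 x 50000 does not (2.5e9 is above 2^31 - 1). A caller could use the result to stop with a readable message instead of hitting the overflow.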

Terry T.
#
On Tue, Oct 2, 2018 at 9:43 AM Therneau, Terry M., Ph.D. via R-devel
<r-devel at r-project.org> wrote:
FWIW, this is the reference I've decided to follow for matrixStats:

"* For now, keep 2^31-1 limit on matrix rows and columns."

from Slide 5 in Luke Tierney's 'Some new developments for the R
engine', June 24, 2012
(http://homepage.stat.uiowa.edu/~luke/talks/purdue12.pdf).

/Henrik
#
Does this help a little?

https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Long-vectors

One thing I seem to remember but cannot find a reference for is that
long vectors can only be passed to .Call calls, not .C/.Fortran. I
remember rewriting .C() in my WGCNA package to .Call for this very
reason but perhaps the restriction has been removed.

Peter
#
That is indeed helpful; reading the sections around it largely answered my questions.
Rinternals.h has the definitions

#define allocMatrix Rf_allocMatrix
SEXP Rf_allocMatrix(SEXPTYPE, int, int);
#define allocVector Rf_allocVector
SEXP Rf_allocVector(SEXPTYPE, R_xlen_t);

Which answers the further question of what to expect inside C routines invoked by .Call.
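
One consequence of these declarations: row and column counts stay int, while the total length is R_xlen_t, so an element offset computed in int arithmetic can overflow even when both dimensions are valid. A minimal standalone sketch of a safe column-major offset follows. The names `xlen_t` and `col_major_offset` are made up for illustration, and `int64_t` is used as a stand-in for R_xlen_t so the example compiles without the R headers.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for R_xlen_t (an assumption made so this compiles without R). */
typedef int64_t xlen_t;

/* Hypothetical helper: offset of element (i, j) in a column-major matrix
 * with nrow rows. Each operand is widened before the multiplication, so
 * the result is exact even when j * nrow exceeds 2^31 - 1. */
static xlen_t col_major_offset(int i, int j, int nrow)
{
    assert(nrow > 0 && i >= 0 && i < nrow && j >= 0);
    return (xlen_t) i + (xlen_t) j * (xlen_t) nrow;
}
```

For example, with nrow = 100000 the element at row 5, column 30000 has offset 3000000005, which a plain int expression such as i + j*nrow could not represent.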

It looks like the internal C routines for coxph work on large matrices by pure serendipity
(nrow and ncol each less than 2^31 but with the product > 2^31), but residuals.coxph
fails with an allocation error on the same data.  A slight change and it could just as
easily have led to a hard crash.  Sigh...  I'll need to do a complete code review.
I've been converting .C routines to .Call as convenient; this will force conversion of
many of the rest as a side effect (20 done, 23 to go).  As a statistician my overall
response is "haven't they ever heard of sampling?"  But as I said earlier, it isn't just
one user.

Terry T.
On 10/02/2018 12:22 PM, Peter Langfelder wrote: