R_alloc with more than 2GB (PR#7721)

5 messages · Wolfgang Huber, Brian Ripley

#
Full_Name: Wolfgang Huber
Version: R-devel_2005-03-10
OS: alphaev68-dec-osf4.0f
Submission from: (NULL) (62.253.128.15)


This report concerns allocation of large (>2^31 byte) chunks of memory with
R_alloc. I suspect it is a bug/typo but please don't hate me if it's actually a
feature:

In R, I can happily create large matrices:
[1] 191481   3063
[1] 4.369812

but when I call R_alloc in some of my C code, I get
"negative length vectors are not allowed":

    maxcp = 3063; n = 191481;
    vs = (long) maxcp * (long) n;
    mI = (double*) R_alloc(vs, sizeof(double));


I suspect this is caused by allocString(int), which is called by R_alloc; see
below. Would it be possible to have allocString take a long argument as well?

These code excerpts are from R-devel_2005-03-10.tar.gz:

char *R_alloc(long nelem, int eltsize) {
 R_size_t size = nelem * eltsize;
 SEXP s = allocString(size);
 ...
}


SEXP allocString(int length) {
   return allocVector(CHARSXP, length);
}


SEXP allocVector(SEXPTYPE type, R_len_t length) {
...    
   case CHARSXP:
	size = BYTE2VEC(length + 1);
...
   malloc(sizeof(SEXPREC_ALIGN) + size * sizeof(VECREC)))
...
}
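
To make the sizes in this report concrete, here is a minimal sketch of the arithmetic in plain C, independent of R. The helper names `request_bytes` and `fits_in_int32` are hypothetical and exist only for illustration; the sketch assumes that R_len_t is a 32-bit signed int and that sizeof(double) is 8. It only shows which quantities exceed 2^31 - 1; it does not reproduce R's allocator.

```c
#include <stdint.h>

/* Hypothetical helper (not part of R): total bytes for nelem elements of
   eltsize bytes each, computed in 64-bit arithmetic so it cannot wrap here. */
static int64_t request_bytes(int64_t nelem, int64_t eltsize)
{
    return nelem * eltsize;
}

/* Hypothetical helper (not part of R): does a count fit in a 32-bit signed
   int, the assumed width of R_len_t? */
static int fits_in_int32(int64_t count)
{
    return count >= 0 && count <= INT32_MAX;
}
```

With maxcp = 3063 and n = 191481 the element count is 586506303, which fits in a 32-bit int, while `request_bytes(586506303, 8)` is 4692050424 bytes (about 4.37 GB), which does not. That is why a byte-length CHARSXP cannot hold the request even though the number of doubles is below the limit.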
#
It is a feature.  Other parts of R expect a CHARSXP to have length less 
than or equal to 2^31 - 1.

Because of the use of Fortran, it is hard to see how to allow internal 
lengths (in elements, not necessarily bytes) to exceed that value.  We 
need to return to that, but it is not straightforward and last time we 
discussed it we agreed to defer it.

We can manage a better error message, but I am afraid nothing else in the 
near future.

Could you not use x = allocVector(REALSXP, vs) and REAL(x)[i]?  That will 
get you up to 2^31 - 1 elements, which is the R limit AFAIK.
On Thu, 10 Mar 2005 huber@ebi.ac.uk wrote:

            

  
    
#
Dear Prof Ripley,
OK, after looking closer at the code and comments in memory.c and 
Rinternals.h (typedef int R_len_t;) I realized that.
Thanks, that is an excellent idea. It should be fine for my immediate 
needs, and better than what I've just been doing with Calloc!

 > Because of the use of Fortran, it is hard to see how to allow internal
 > lengths (in elements, not necessarily bytes) to exceed that value.  We
 > need to return to that, but it is not straightforward and last time we
 > discussed it we agreed to defer it.

 > We can manage a better error message, but I am afraid nothing else in
 > the near future.

In the application that triggered this posting, the memory is for a C 
array of doubles within a user-defined C function, not for anything that 
needs to become an R object, so maybe a suggestion would be to make 
R_alloc go directly to malloc without the detour over allocString or 
allocVector; or something along that line?

  Best regards
   Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax:   +44 1223 494486
Http:  www.ebi.ac.uk/huber
#
On Thu, 10 Mar 2005, Wolfgang Huber wrote:

            
R_alloc makes use of garbage collection to avoid the need for explicit
free()ing.  Otherwise you might as well use Calloc.

Given that all memory allocated via the heap is aligned to doubles, there 
seems to me to be little or no loss in using a REALSXP rather than a 
CHARSXP, and certainly negligible loss for large vectors.  That will buy 
us a factor of 8 for the present.

Brian
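
The "factor of 8" in the last message can be illustrated with a small sketch in plain C. The function `request_fits` is hypothetical and is not R's implementation; it assumes a vector may hold at most 2^31 - 1 elements and that a double occupies 8 bytes. It only compares how large a byte request can be when the backing vector counts 1-byte elements (as a CHARSXP does) versus 8-byte elements (as a REALSXP does).

```c
#include <stdint.h>

/* Hypothetical check (not part of R): can a request of `bytes` bytes be
   served by a vector whose elements are `unit_size` bytes each, given an
   assumed limit of INT32_MAX elements per vector? */
static int request_fits(int64_t bytes, int64_t unit_size)
{
    if (bytes < 0 || unit_size <= 0)
        return 0;
    /* Round up: a partial unit still needs a whole element. */
    int64_t units = (bytes + unit_size - 1) / unit_size;
    return units <= INT32_MAX;
}
```

Under these assumptions the 4692050424-byte request from the original report fails with a unit size of 1 but succeeds with a unit size of 8, and the largest request that fits grows from 2^31 - 1 bytes to 8 * (2^31 - 1) = 17179869176 bytes.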