
R 2.10.0: Error in gsub/calloc

Here is a more self-contained way to reproduce the problem in R 2.10.0
using the prebuilt Windows executable.  Putting a trace on gsub in
the call to strapply showed that R died in the first call to gsub,
where the replacement included "\\1" and the string was about 900000
characters long (and contained 150000 "words").  The call appears to
fail whenever the string is >= 731248 characters long.
[1] 731248
[1] " abcde abcd"
Error in gsub("([[:alpha:]]+)", "\\1", d, perl = FALSE) : 
  Calloc could not allocate (-2146542248 of 1) memory
In addition: Warning messages:
1: In gsub("([[:alpha:]]+)", "\\1", d, perl = FALSE) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In gsub("([[:alpha:]]+)", "\\1", d, perl = FALSE) :
  Reached total allocation of 1535Mb: see help(memory.size)
Error in gsub("([[:alpha:]]+)", "\\1", d, perl = TRUE) : 
  Calloc could not allocate (-2146542248 of 1) memory
In addition: Warning messages:
1: In gsub("([[:alpha:]]+)", "\\1", d, perl = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In gsub("([[:alpha:]]+)", "\\1", d, perl = TRUE) :
  Reached total allocation of 1535Mb: see help(memory.size)

Make d one character shorter and it succeeds with either
perl=TRUE or perl=FALSE.
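
The commands that produced the output above did not survive in this copy
of the message.  A minimal sketch consistent with the printed values,
assuming d was built by repeating "abcde " and truncating the result to
731248 characters (that construction is an assumption, as are the names
n, d_short and try_gsub), might look like this:

```r
# Hypothetical reconstruction: the original commands are not in the post.
n <- 731248L

# 150000 six-character "words" give 900000 characters; keep the first n.
d <- substring(paste(rep("abcde ", 150000), collapse = ""), 1, n)
print(nchar(d))                     # length of the test string
print(substring(d, nchar(d) - 10))  # last 11 characters of d

# The same string, one character shorter (reported to succeed).
d_short <- substring(d, 1, n - 1L)

# Run gsub and return either its result or the error message,
# so the script keeps going on a build that shows the failure.
try_gsub <- function(x, perl) {
  tryCatch(gsub("([[:alpha:]]+)", "\\1", x, perl = perl),
           error = function(e) conditionMessage(e))
}

res_default <- try_gsub(d, perl = FALSE)
res_perl    <- try_gsub(d, perl = TRUE)
res_short_default <- try_gsub(d_short, perl = FALSE)
res_short_perl    <- try_gsub(d_short, perl = TRUE)
```

On an affected build res_default and res_perl would hold the Calloc
error messages; on a build without the problem each result should equal
its input, since replacing every match with "\\1" leaves the string
unchanged.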
               _                            
platform       i386-pc-mingw32              
arch           i386                         
os             mingw32                      
system         i386, mingw32                
status                                      
major          2                            
minor          10.0                         
year           2009                         
month          10                           
day            26                           
svn rev        50208                        
language       R                            
version.string R version 2.10.0 (2009-10-26)
R version 2.10.0 (2009-10-26) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tcltk_2.10.0

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com