Crash report: regexpr("a{2-}", "")

8 messages · Henrik Bengtsson, David Winsemius, Bill Venables +2 more

#
Each of the following calls crashes ("core dumps") R (R --vanilla) on
various versions and OSes:

regexpr("a{2-}", "")
sub("a{2-}", "")
gsub("a{2-}", "")


EXAMPLES:
R version 2.11.1 Patched (2010-09-16 r52949)
Platform: i386-pc-mingw32 (32-bit)
...
Assertion failed: iter->max == -1 || iter->max == 1, file
tre-compile.c, line 1825
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

R version 2.12.0 Under development (unstable) (2010-09-14 r52910)
Platform: i386-pc-mingw32/i386 (32-bit)
...
Assertion failed: iter->max == -1 || iter->max == 1, file
tre-compile.c, line 1825
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

R version 2.11.0 Patched (2010-05-09 r51960)
x86_64-unknown-linux-gnu
...
R: tre-compile.c:1825: tre_ast_to_tnfa: Assertion `iter->max == -1 ||
iter->max == 1' failed.
Aborted


/Henrik
#
On Sep 21, 2010, at 11:04 PM, Henrik Bengtsson wrote:

            
Not a problem in reasonably current Mac with 64bit GUI:
 > regexpr("a{2-}", "")
[1] 1
attr(,"match.length")
[1] 0
 > sub("a{2-}", "")
Error in is.character(x) : 'x' is missing
 > gsub("a{2-}", "")
Error in is.character(x) : 'x' is missing

R version 2.11.1 Patched (2010-08-26 r52822)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
[R.app GUI 1.35 (5612) x86_64-apple-darwin9.8.0]
#
David's post made me realize that I got the sub()/gsub() lines wrong.
The calls should be:

regexpr("a{2-}", "")
sub("a{2-}", "", "")
gsub("a{2-}", "", "")

Either way, the crash is there, at least on Windows and Linux.

/Henrik
On Tue, Sep 21, 2010 at 8:43 PM, David Winsemius <dwinsemius at comcast.net> wrote:
#
On Sep 22, 2010, at 12:12 AM, Henrik Bengtsson wrote:

            
Still no crash on a Mac. Did you mean to include a third argument to  
regexpr() as you did for sub and gsub?
#
On Tue, Sep 21, 2010 at 9:20 PM, David Winsemius <dwinsemius at comcast.net> wrote:
No, it was only that sub()/gsub() need at least three arguments while
regexpr() only needs two, and I did a simple cut'n'paste when I wrote
the original message.  All three commands probably use the same
underlying regular expression library, so it is likely the same bug.

It looks like at least on your OS/R version it does not crash.

/Henrik
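
For anyone hitting this on an affected build, here is a minimal sketch of calls that avoid handing the malformed interval to the default regex compiler. It assumes the intended pattern was "two or more a's" (the valid interval syntax is {2,}, not {2-}) and that the assertion comes from the default TRE engine, as the tre-compile.c messages above suggest. The test strings are made up for illustration.

```r
# Valid interval syntax: "{2,}" means two or more repetitions.
regexpr("a{2,}", "baaa")

# If the text "a{2-}" is meant literally, skip regex compilation altogether.
regexpr("a{2-}", "xa{2-}y", fixed = TRUE)

# Alternatively, ask for the PCRE engine instead of the default one.
regexpr("a{2-}", "", perl = TRUE)
```

None of these fixes the underlying bug; they only keep the pattern away from the code path that aborts.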
#
I have recently become aware of some curious behaviour of median() which I think could be usefully corrected.  I am sure this must have come up before, but I'm raising it again.

The phenomenon is best shown by a simple example.
          [,1]       [,2]       [,3]      [,4]
[1,] 0.1388592 0.08478220 0.02012404 0.7733054
[2,] 0.1718332 0.06370432 0.66167219 0.2521809
[3,] 0.3190116 0.08616569 0.23107320 0.6278422
[4,] 0.9185233 0.29218144 0.99193823 0.6306847

[1] 0.1118207 0.2120070 0.2750424 0.7746040

So far, so good. But what happens when you turn it into a data frame?
[1] 0.1118207 0.2120070 0.2750424 0.7746040

No problem there, yet.  But if you just look at one row:
[1] 0.0847822 0.1388592

without warning you get a vector of size two as the result, viz. the two values which enclose the middle.  I thought this was simply because one row of a data frame is a list, but that can't be the whole story, e.g.
[1] 0.2454224
Error in sort.list(x, partial = half + 0L:1L) : 
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
(Well yes, Brian, I did...)  
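
The commands that produced the output above did not survive in this archived copy. The following is a sketch of calls consistent with the printed numbers; the names x and xd are guesses, and the matrix values are copied from the printout. The row medians it computes match the values shown. On current R versions the final call may raise an error rather than return two values, so it is wrapped in try().

```r
# Matrix values copied from the printed output (filled column by column).
x <- matrix(c(0.1388592,  0.1718332,  0.3190116,  0.9185233,
              0.08478220, 0.06370432, 0.08616569, 0.29218144,
              0.02012404, 0.66167219, 0.23107320, 0.99193823,
              0.7733054,  0.2521809,  0.6278422,  0.6306847), nrow = 4)

apply(x, 1, median)        # row medians of the matrix

xd <- data.frame(x)        # columns are named X1..X4
apply(xd, 1, median)       # apply() coerces to a matrix first: same medians

# One row of a data frame is still a data frame (a list of columns);
# unlist() turns it into a plain numeric vector before taking the median.
median(unlist(xd[1, ]))

# Passing the row directly is the problematic case described above.
try(median(xd[1, ]))
```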

The function mean() has a nice property when you call it on a data frame, e.g.
       X1        X2        X3        X4 
0.3870568 0.1317084 0.4762019 0.5710033 

and just to complicate the issue even further,
        X1         X2         X3         X4 
0.13885916 0.08478220 0.02012404 0.77330535 

On the other hand, median(), whose behaviour should, I would suggest, be similar, just fails when handed a data frame argument.
[1] NA NA
Warning messages:
1: In mean.default(X[[1L]], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(X[[2L]], ...) :
  argument is not numeric or logical: returning NA
_________________

I suggest that there should be some consistency here: median() could be given a data.frame method that would allow it to respond much as mean() does.  The way it responds to data frame arguments now is quirky, at best.
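
As a rough illustration of the suggestion, here is a minimal sketch of what such a method might look like. It is hypothetical, not part of base R, and assumes every column is something median() already handles (e.g. numeric). The data frame d is made up for the example.

```r
# Hypothetical S3 method: column-wise medians for a data frame,
# mirroring the per-column result mean() gives in the example above.
median.data.frame <- function(x, na.rm = FALSE, ...) {
  # sapply() visits each column and returns a named vector.
  sapply(x, median, na.rm = na.rm, ...)
}

d <- data.frame(a = c(1, 3, 2), b = c(10, 30, 20))
median(d)   # a = 2, b = 20
```

Whether non-numeric columns should yield NA or an error is a design choice this sketch leaves open.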

Currently median(), though generic, has only the default method.

[1] mean.data.frame mean.Date       mean.default    mean.difftime   mean.POSIXct   
[6] mean.POSIXlt

[1] median.default

Perhaps quantile() should also have a data.frame method for the same reason.  To me it seems curious, too, that quantile has a POSIXt method (in the stats package) whereas median currently does not.  (mean.POSIX*t are in the base package.)

[1] quantile.default quantile.POSIXt*

   Non-visible functions are asterisked

How do people respond to this?

(I see there have been hints of this in the past, see http://tolstoy.newcastle.edu.au/R/e2/help/06/12/7692.html
but I could only find hints.)

Bill Venables
CSIRO/CMIS, Cleveland Labs.
#
[Accidentally posted this to r-help instead of r-devel; reposting to put 
it into the correct list and thread. My apologies for the duplication.]
On 9/21/2010 8:04 PM, Henrik Bengtsson wrote:
To add another (Windows) example, it also crashes the 2.12.0 alpha build:

 > sessionInfo()
R version 2.12.0 alpha (2010-09-20 r52948)
Platform: i386-pc-mingw32/i386 (32-bit)
...
 > regexpr("a{2-}", "")
Assertion failed: iter->max == -1 || iter->max == 1, file tre-compile.c,
line 1825

This application has requested the Runtime to terminate it in an unusual 
way.
Please contact the application's support team for more information.

  
    
#
It crashes R on my Linux machine:
 > regexpr("a{2-}", "")
R: tre-compile.c:1825: tre_ast_to_tnfa: Assertion `iter->max == -1 || 
iter->max == 1' failed.
Aborted

My setup is:

 > sessionInfo()
R version 2.11.1 (2010-05-31)
i386-redhat-linux-gnu

locale:
  [1] LC_CTYPE=en_NZ       LC_NUMERIC=C         LC_TIME=en_NZ
  [4] LC_COLLATE=en_NZ     LC_MONETARY=C        LC_MESSAGES=en_NZ
  [7] LC_PAPER=en_NZ       LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_NZ LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] djsmisc_1.0-1


David Scott
On 23/09/10 04:37, Brian Diggs wrote: