I have just committed (in r59883) some changes to the R parser based on Romain Francois' parser package. Packages that made use of parser will hopefully find that the information in base R gives them what they need to work with, but the data is not identical to what parser recorded (since it was not consistent with some things already in R).

One reason for the change was that the parser in the parser package was slightly different from the one in R; the hope is that by providing the services in R, it will make maintenance easier for things like code analysis, pretty printing, etc.

See ?getParseData for details, and if you are maintaining a package that depends on parser, feel free to ask me for help in the transition, or make suggestions for changes if I've done something that causes you too much trouble.

Duncan Murdoch

P.S. to Qiang Li: as mentioned privately, the goal for this change was to reproduce output equivalent to what parser did, so I have not incorporated your suggested change to outlaw expressions like "x[[1] ]" (with an embedded space where it shouldn't be). After things settle down we can consider that change and others.
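For readers who have not yet tried the new facility, here is a minimal sketch of a call to getParseData(). The snippet being parsed and the variable names are made up for illustration; keep.source = TRUE is passed explicitly on the assumption that the code may run non-interactively, where source references are not kept by default.

```r
# Parse a small, made-up snippet, keeping source references so that
# parse data is recorded alongside the expressions.
exprs <- parse(text = "x <- y + 1", keep.source = TRUE)

# One row per token or expression node, with positions, ids and parents.
pd <- getParseData(exprs)

# Terminal rows are the actual tokens; non-terminal rows are 'expr' nodes.
tokens <- pd[pd$terminal, c("line1", "col1", "col2", "token", "text")]
print(tokens)
```

The parent column links each row to the id of its enclosing node, which is what tools such as code analysers or pretty printers would walk.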
Changes to parser in R-devel
5 messages · Duncan Murdoch, Yihui Xie
1 day later
I'm not sure if there is a bug somewhere; see this example:
getParseData(parse(text='function(x){}'))
  line1 col1 line2 col2 id parent          token terminal     text
1     1    1     1    8  1     11       FUNCTION     TRUE function
2     1    9     1    9  2     11            '('     TRUE        (
3     1   10     1   10  3      5 SYMBOL_FORMALS     TRUE        x
4     1   11     1   11  4     11            ')'     TRUE        )
5     1   12     1   12  6      8            '{'     TRUE        {
6     1   13     1   13  7      8            '}'     TRUE        }
7     1   12     1   12  5     11            '}'     TRUE        {
8     1   12     1   13  8     11           expr    FALSE
9     1    1     1   13 11      0           expr    FALSE
I get an additional { in the 7th row of the 'text' column.
Another problem is that for this empty function below, there will be
an obvious pause if you run it more than once:
getParseData(parse(text='function(){}'))
and you may get wild line/col numbers like this:
   line1 col1     line2 col2 id parent    token terminal     text
1      1    1         1    8  1      9 FUNCTION     TRUE function
2      1    9         1    9  2      9      '('     TRUE        (
3      1   10         1   10  3      9      ')'     TRUE        )
4      1   11         1   11  4      6      '{'     TRUE        {
5      1   12         1   12  5      6      '}'     TRUE        }
6 320024   11 140106360   11 11      9      '}'     TRUE
7      1   11         1   12  6      9     expr    FALSE
8      1    1         1   12  9     11     expr    FALSE
What is worse is it can crash R:
*** caught segfault ***
address 0x9488c20, cause 'memory not mapped'
Traceback:
1: parse(text = "function(){}")
2: getSrcref(x)
3: getSrcfile(x)
4: getParseData(parse(text = "function(){}"))
sessionInfo()
R Under development (unstable) (2012-07-18 r59904)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA

On Wed, Jul 18, 2012 at 2:31 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> I have just committed (in r59883) some changes to the R parser based on Romain Francois' parser package. [...]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On 12-07-19 4:41 PM, Yihui Xie wrote:
> I'm not sure if there is a bug somewhere; see this example:

There's definitely a bug in the handling of empty lists, such as the empty list of commands in your first example and the empty list of arguments in your second. There's a partial workaround currently in R-devel, but not a perfect fix. (This is due to my missing a conversion from Romain's 0-based column counting to the usual 1-based counting.) I expect it will be fixed tomorrow, or sooner.

Duncan Murdoch
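A column-offset slip of the kind described above can be caught with a simple consistency check: under 1-based, inclusive columns, slicing the source at col1..col2 should reproduce each terminal token's text. The sketch below is only an illustration of that idea, not the test used for the fix; the variable names are made up, and it assumes a single-line ASCII source string.

```r
# Hypothetical sanity check for 1-based, inclusive column positions.
src <- "function(x){}"
pd <- getParseData(parse(text = src, keep.source = TRUE))

# Restrict to terminal tokens that start and end on the same line.
term <- pd[pd$terminal & pd$line1 == pd$line2, ]

# With correct positions, each slice of the source equals the token text.
slice <- substring(src, term$col1, term$col2)
ok <- slice == term$text

# Show any rows whose recorded position disagrees with their text.
print(term[!ok, c("col1", "col2", "token", "text")])
```

On a build with the fix, the printed data frame should have no rows; a row such as the stray '}' with text { in the report above would show up here.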
On 19/07/2012 6:50 PM, Duncan Murdoch wrote:
> There's definitely a bug in the handling of empty lists [...] I expect it will be fixed tomorrow, or sooner.

As far as I know, it is now fixed (in r59913).

Duncan Murdoch
On Fri, Jul 20, 2012 at 12:22 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> As far as I know, it is now fixed (in r59913).

Great. I just tested it and did not find any more problems. Thanks!

Regards,
Yihui