Bioc developers, Tomas Kalibera has developed an offline bug-finding tool written specifically to detect C PROTECT errors. The tool detects errors in the following Bioconductor packages: affyio affyPLM Biostrings BufferedMatrix chopsticks CNEr crlmm csaw DECIPHER DiffBind diffHic DirichletMultinomial EBImage edgeR fabia graph GraphAlignment HIBAG immunoClust InteractionSet IRanges metahdep mgsa MotIV mzR ncdfFlow PICS PING qpgraph QuasR R453Plus1Toolbox rqubic Rsamtools rSFFreader rtracklayer S4Vectors SANTA scran SeqArray SIMLR SNPRelate snpStats STAN TargetSearch ternarynet TitanCNA vsn xcms XVector Additional information and the reports are available at https://github.com/kalibera/rprotect Each package has its own subdirectory with textual reports and with DESCRIPTION file with the version of the package the tool has been run on. Please have a look at reports for your package(s) and fix if necessary. If you wish to re-run the bug-finding tool on your package after fixing, please refer to https://github.com/kalibera/rchk where installation instructions are available, including a script to automatically install the tool into a virtual machine. Still, please be advised that the automated installation may take long (about 1 hour on a decent machine). The tool is not perfect, so assess each report carefully. 'My' packages (Rsamtools, DirichletMultinomial) had several false positives (all associated with use of an attribute of a protected SEXP), one subtle problem (a symbol from a PROTECT'ed package name space; the symbol could in theory be an active binding and the value obtained not PROTECTed by the name space), and a genuine bug tag = NEW_CHARACTER(n); for (int j = 0; j < n; ++j) SET_STRING_ELT(tag, j, NA_STRING); if ('A' == aux[0]) { buf_A = R_alloc(2, sizeof(char)); # <<- bug buf_A[1] = '\0'; } ... SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late! Have fun, and safe coding! Martin This email message may contain legally privileged and/or...{{dropped:2}}
[Bioc-devel] PROTECT errors in Bioconductor packages
15 messages · Michael Lawrence, Hervé Pagès, Luke Tierney +3 more
The tool is not perfect, so assess each report carefully.
I also get a warning on almost every C++ function I've written, because
I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of control to the
catch block. I find that R itself is pretty good at complaining about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false positives (all
associated with use of an attribute of a protected SEXP), one subtle
problem (a symbol from a PROTECT'ed package name space; the symbol could
in theory be an active binding and the value obtained not PROTECTed by
the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); # <<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late!
I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the R_alloc call looks okay to me... Cheers, Aaron
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report carefully.
I also get a warning on almost every C++ function I've written, because
I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of control to the
catch block. I find that R itself is pretty good at complaining about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false positives (all
associated with use of an attribute of a protected SEXP), one subtle
problem (a symbol from a PROTECT'ed package name space; the symbol could
in theory be an active binding and the value obtained not PROTECTed by
the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); # <<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late!
I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the R_alloc call looks okay to me...
yes, tag needs protection. I attributed the bug to R_alloc because I failed to reason that R_alloc (obviously) allocates and hence can trigger a garbage collection. Somehow it reflects my approach to PROTECTion, probably not shared by everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the expense of exposing me to bugs like this. Martin
Cheers, Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or...{{dropped:2}}
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan
<martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report carefully.
I also get a warning on almost every C++ function I've written, because
I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of control to the
catch block. I find that R itself is pretty good at complaining about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false positives (all
associated with use of an attribute of a protected SEXP), one subtle
problem (a symbol from a PROTECT'ed package name space; the symbol could
in theory be an active binding and the value obtained not PROTECTed by
the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); # <<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late!
I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the R_alloc call looks okay to me...
yes, tag needs protection. I attributed the bug to R_alloc because I failed to reason that R_alloc (obviously) allocates and hence can trigger a garbage collection. Somehow it reflects my approach to PROTECTion, probably not shared by everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the expense of exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity and logical
complexity. You have to think pretty hard to minimize use of the
protect stack.
One recommendation might be to UNPROTECT() as soon as the pointer on
the top is unneeded, rather than trying to figure out the number to
pop just before returning to R.
One thing that got me is that the order in which C evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.
Btw, I think my package RGtk2 got the record: 1952 errors. Luckily
almost all of them happened inside a few macros and autogenerated
code.
Martin
Cheers, Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or...{{dropped:2}}
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
On 04/06/2017 03:29 AM, Michael Lawrence wrote:
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report carefully.
I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.
So I suspect these warnings are false positives but I'm not 100% sure.
I also get a warning on almost every C++ function I've written, because
I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of control to the
catch block. I find that R itself is pretty good at complaining about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false positives (all
associated with use of an attribute of a protected SEXP), one subtle
problem (a symbol from a PROTECT'ed package name space; the symbol could
in theory be an active binding and the value obtained not PROTECTed by
the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); # <<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late!
I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the R_alloc call looks okay to me...
yes, tag needs protection. I attributed the bug to R_alloc because I failed to reason that R_alloc (obviously) allocates and hence can trigger a garbage collection. Somehow it reflects my approach to PROTECTion, probably not shared by everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the expense of exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity and logical complexity. You have to think pretty hard to minimize use of the protect stack.
I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the code between the line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed. So a dangerous game for virtually zero benefits.
One recommendation might be to UNPROTECT() as soon as the pointer on the top is unneeded, rather than trying to figure out the number to pop just before returning to R.
If you PROTECT() in a loop, you definitely want to do that. Otherwise, does it make a big difference?
One thing that got me is that the order in which C evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.
I got hit by this too long time ago but with defineVar() instead of R_setAttrib(): https://stat.ethz.ch/pipermail/r-devel/2008-January/048040.html H.
Btw, I think my package RGtk2 got the record: 1952 errors. Luckily almost all of them happened inside a few macros and autogenerated code.
Martin
Cheers, Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
This email message may contain legally privileged and/or...{{dropped:2}}
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/06/2017 03:29 AM, Michael Lawrence wrote:
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report carefully.
I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.
So I suspect these warnings are false positives but I'm not 100% sure.
If you are not 100% sure then you should protect :-) There are some cases, in particular related to compact row names on data frames, where getAttrib will allocate. Best, luke
I also get a warning on almost every C++ function I've written, because
I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of control to the
catch block. I find that R itself is pretty good at complaining about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false positives (all
associated with use of an attribute of a protected SEXP), one subtle
problem (a symbol from a PROTECT'ed package name space; the symbol could
in theory be an active binding and the value obtained not PROTECTed by
the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); # <<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too late!
I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the R_alloc call looks okay to me...
yes, tag needs protection. I attributed the bug to R_alloc because I failed to reason that R_alloc (obviously) allocates and hence can trigger a garbage collection. Somehow it reflects my approach to PROTECTion, probably not shared by everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the expense of exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity and logical complexity. You have to think pretty hard to minimize use of the protect stack.
I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the code between the line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed. So a dangerous game for virtually zero benefits.
One recommendation might be to UNPROTECT() as soon as the pointer on the top is unneeded, rather than trying to figure out the number to pop just before returning to R.
If you PROTECT() in a loop, you definitely want to do that. Otherwise, does it make a big difference?
One thing that got me is that the order in which C evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.
I got hit by this too long time ago but with defineVar() instead of R_setAttrib(): https://stat.ethz.ch/pipermail/r-devel/2008-January/048040.html H.
Btw, I think my package RGtk2 got the record: 1952 errors. Luckily almost all of them happened inside a few macros and autogenerated code.
Martin
Cheers, Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
This email message may contain legally privileged and/or...{{dropped:2}}
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/06/2017 03:29 AM, Michael Lawrence wrote:
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report carefully.
I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.
So I suspect these warnings are false positives but I'm not 100% sure.
If you are not 100% sure then you should protect :-) There are some cases, in particular related to compact row names on data frames, where getAttrib will allocate.
Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op anymore? Should I worry that VECTOR_ELT() will also expand some sort of compact list element? Why not keep these things low-level getters/setters that return whatever the real thing is and use higher-level accessors for returning the expanded version of the thing? Thanks, H.
Best, luke
I also get a warning on almost every C++ function I've written,
because
I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of control to the
catch block. I find that R itself is pretty good at complaining about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false positives (all
associated with use of an attribute of a protected SEXP), one subtle
problem (a symbol from a PROTECT'ed package name space; the symbol
could
in theory be an active binding and the value obtained not
PROTECTed by
the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); # <<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too
late!
I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the R_alloc call looks okay to me...
yes, tag needs protection. I attributed the bug to R_alloc because I failed to reason that R_alloc (obviously) allocates and hence can trigger a garbage collection. Somehow it reflects my approach to PROTECTion, probably not shared by everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the expense of exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity and logical complexity. You have to think pretty hard to minimize use of the protect stack.
I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the code between the line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed. So a dangerous game for virtually zero benefits.
One recommendation might be to UNPROTECT() as soon as the pointer on the top is unneeded, rather than trying to figure out the number to pop just before returning to R.
If you PROTECT() in a loop, you definitely want to do that. Otherwise, does it make a big difference?
One thing that got me is that the order in which C evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.
I got hit by this too long time ago but with defineVar() instead of R_setAttrib(): https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e= H.
Btw, I think my package RGtk2 got the record: 1952 errors. Luckily almost all of them happened inside a few macros and autogenerated code.
Martin
Cheers, Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
This email message may contain legally privileged
and/or...{{dropped:2}}
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/06/2017 03:29 AM, Michael Lawrence wrote:
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report carefully.
I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.
So I suspect these warnings are false positives but I'm not 100% sure.
If you are not 100% sure then you should protect :-) There are some cases, in particular related to compact row names on data frames, where getAttrib will allocate.
Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op anymore? Should I worry that VECTOR_ELT() will also expand some sort of compact list element? Why not keep these things low-level getters/setters that return whatever the real thing is and use higher-level accessors for returning the expanded version of the thing?
Seriously: it's been that way since r37807 in 2006. If you want to read about some related future directions you can look at https://svn.r-project.org/R/branches/ALTREP/ALTREP.html. luke
Thanks, H.
Best, luke
I also get a warning on almost every C++ function I've written,
because
I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of control to
the
catch block. I find that R itself is pretty good at complaining
about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false positives
(all
associated with use of an attribute of a protected SEXP), one
subtle
problem (a symbol from a PROTECT'ed package name space; the
symbol
could
in theory be an active binding and the value obtained not
PROTECTed by
the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); # <<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag, too
late!
I assume the bug refers to the un-PROTECT'd NEW_CHARACTER here - the R_alloc call looks okay to me...
yes, tag needs protection. I attributed the bug to R_alloc because I failed to reason that R_alloc (obviously) allocates and hence can trigger a garbage collection. Somehow it reflects my approach to PROTECTion, probably not shared by everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the expense of exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity and logical complexity. You have to think pretty hard to minimize use of the protect stack.
I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the code between the line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed. So a dangerous game for virtually zero benefits.
One recommendation might be to UNPROTECT() as soon as the pointer on the top is unneeded, rather than trying to figure out the number to pop just before returning to R.
If you PROTECT() in a loop, you definitely want to do that. Otherwise, does it make a big difference?
One thing that got me is that the order in which C evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.
I got hit by this too long time ago but with defineVar() instead of R_setAttrib(): https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e= H.
Btw, I think my package RGtk2 got the record: 1952 errors. Luckily almost all of them happened inside a few macros and autogenerated code.
Martin
Cheers, Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
This email message may contain legally privileged
and/or...{{dropped:2}}
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
On Fri, Apr 7, 2017 at 12:46 PM, <luke-tierney at uiowa.edu> wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote: On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/06/2017 03:29 AM, Michael Lawrence wrote:
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report
carefully.
I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.
So I suspect these warnings are false positives but I'm not 100%
sure. If you are not 100% sure then you should protect :-) There are some cases, in particular related to compact row names on data frames, where getAttrib will allocate.
Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op anymore? Should I worry that VECTOR_ELT() will also expand some sort of compact list element? Why not keep these things low-level getters/setters that return whatever the real thing is and use higher-level accessors for returning the expanded version of the thing?
Seriously: it's been that way since r37807 in 2006. If you want to read about some related future directions you can look at https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.
Indeed. I was wondering whether to bring this up here. In a (hopefully near future) version of R-devel, doing, e.g., INTEGER(x) could allocate. There is a way to ask it to give you NULL instead of allocating if it would need to, but the point being it's probably going to get much harder to safely be clever about avoiding PROTECT'ing. (Luke put in temporary suspension of GC in some places, but I don't recall the exact details of where that was used). As a side note to the above, you'll need to do INTEGER(x) less often than you did before. There will be a new INTEGER_ELT and INTEGER_GET_REGION macros which (I think) will be guaranteed to not cause SEXP allocation. In terms of why, at least in the ALTREP case, it's so that these things can be passed directly to the R internals and be treated like whatever lowl-level type of thing they are (e.g. numeric vector, string vector, list, etc). This seamless backwards compatiblity requires that code which doesn't use the INTEGER_ELT and INTEGER_GET_REGION (or analogues) macros needs to have INTEGER(X) work and give it the pointer it expects, which won't necessarily exist before the first time it is required. Best, ~G
luke
Thanks, H.
Best, luke
I also get a warning on almost every C++
function I've written,
because I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of
control to > > > > the
catch block. I find that R itself is pretty good at complaining about stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false positives (all associated with use of an attribute of a protected SEXP), one subtle problem (a symbol from a PROTECT'ed package name space; the >
symbol
could in theory be an active binding and the value obtained not PROTECTed by the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); # <<-
bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT tag,
too
late!
I assume the bug refers to the un-PROTECT'd
NEW_CHARACTER here - > > > > the
R_alloc call looks okay to me...
yes, tag needs protection.
I attributed the bug to R_alloc because I failed to reason
that > > > R_alloc
(obviously) allocates and hence can trigger a garbage collection.
Somehow it reflects my approach to PROTECTion, probably not
shared > > > by
everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the
expense > > > of
exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity and
logical
complexity. You have to think pretty hard to minimize use of the protect stack. I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the code
between
the line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed.
So a dangerous game for virtually zero benefits.
One recommendation might be to UNPROTECT() as soon as the
pointer on
the top is unneeded, rather than trying to figure out the number to pop just before returning to R. If you PROTECT() in a loop, you definitely want to do that.
Otherwise,
does it make a big difference?
One thing that got me is that the order in which C evaluates
function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.
I got hit by this too long time ago but with defineVar() instead of
R_setAttrib():
3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040. html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_ wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJO PdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=
H.
Btw, I think my package RGtk2 got the record: 1952 errors.
Luckily
almost all of them happened inside a few macros and autogenerated code.
Martin
Cheers,
Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.et
hz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt 84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su 20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BS jFuidxFiBTx43J7iEvZG4G0_0uU&e=
This email message may contain
legally privileged
and/or...{{dropped:2}}
_______________________________________________
Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.et
hz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt 84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su 20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BS jFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________
Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.et
hz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt 84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su 20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BS jFuidxFiBTx43J7iEvZG4G0_0uU&e=
-- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tierney at uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Gabriel Becker, Ph.D Associate Scientist Bioinformatics and Computational Biology Genentech Research [[alternative HTML version deleted]]
On 07/04/17 20:46, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/06/2017 03:29 AM, Michael Lawrence wrote:
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report
carefully.
I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.
So I suspect these warnings are false positives but I'm not 100%
sure. If you are not 100% sure then you should protect :-) There are some cases, in particular related to compact row names on data frames, where getAttrib will allocate.
Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op anymore? Should I worry that VECTOR_ELT() will also expand some sort of compact list element? Why not keep these things low-level getters/setters that return whatever the real thing is and use higher-level accessors for returning the expanded version of the thing?
Seriously: it's been that way since r37807 in 2006. If you want to read about some related future directions you can look at https://svn.r-project.org/R/branches/ALTREP/ALTREP.html. luke
I was curious about this so I checked out what R-exts had to say
involving set/getAttrib. Funnily enough, the example it gives in Section
5.9.4 seems to be incorrect in its UNPROTECTing.
#include <R.h>
#include <Rinternals.h>
SEXP out(SEXP x, SEXP y)
{
int nx = length(x), ny = length(y);
SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);
for(int i = 0; i < nx; i++) {
double tmp = rx[i];
for(int j = 0; j < ny; j++)
rans[i + nx*j] = tmp * ry[j];
}
SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
setAttrib(ans, R_DimNamesSymbol, dimnames);
UNPROTECT(3);
return ans;
}
There are two PROTECT calls but an UNPROTECT(3), which triggers a stack
imbalance warning upon trying to run .Call("out", ...) in R.
Anyway, getting back to the topic of this thread; if we were to pretend
that getAttrib() allocates in the above example, would that mean that
both getAttrib() calls now need to be PROTECTed by the developer? Or is
this handled somewhere internally?
Thanks, H.
Best, luke
I also get a warning on almost every C++
function I've written,
because I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of
control to > > > > the
catch block. I find that R itself is pretty good at
complaining > > > > about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false
positives > > > > > (all
associated with use of an attribute of a protected SEXP),
one > > > > > subtle
problem (a symbol from a PROTECT'ed package name space;
the > > > > > symbol
could in theory be an active binding and the value obtained not PROTECTed by the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); #
<<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT
tag, too
late!
I assume the bug refers to the un-PROTECT'd
NEW_CHARACTER here - > > > > the
R_alloc call looks okay to me...
yes, tag needs protection.
I attributed the bug to R_alloc because I failed to
reason that > > > R_alloc
(obviously) allocates and hence can trigger a garbage collection.
Somehow it reflects my approach to PROTECTion, probably
not shared > > > by
everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the
expense > > > of
exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity
and logical
complexity. You have to think pretty hard to minimize use of the protect stack. I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the code
between
the line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed.
So a dangerous game for virtually zero benefits.
One recommendation might be to UNPROTECT() as soon as the
pointer on
the top is unneeded, rather than trying to figure out the number to pop just before returning to R. If you PROTECT() in a loop, you definitely want to do that.
Otherwise,
does it make a big difference?
One thing that got me is that the order in which C
evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.
I got hit by this too long time ago but with defineVar() instead of
R_setAttrib():
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=
H.
Btw, I think my package RGtk2 got the record: 1952 errors.
Luckily
almost all of them happened inside a few macros and autogenerated code.
Martin
Cheers,
Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
This email message may contain
legally privileged
and/or...{{dropped:2}}
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
On 04/08/2017 08:16 AM, Aaron Lun wrote:
On 07/04/17 20:46, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/06/2017 03:29 AM, Michael Lawrence wrote:
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report
carefully.
I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.
So I suspect these warnings are false positives but I'm not 100%
sure. If you are not 100% sure then you should protect :-) There are some cases, in particular related to compact row names on data frames, where getAttrib will allocate.
Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op anymore? Should I worry that VECTOR_ELT() will also expand some sort of compact list element? Why not keep these things low-level getters/setters that return whatever the real thing is and use higher-level accessors for returning the expanded version of the thing?
Seriously: it's been that way since r37807 in 2006. If you want to read about some related future directions you can look at https://svn.r-project.org/R/branches/ALTREP/ALTREP.html. luke
I was curious about this so I checked out what R-exts had to say
involving set/getAttrib. Funnily enough, the example it gives in Section
5.9.4 seems to be incorrect in its UNPROTECTing.
#include <R.h>
#include <Rinternals.h>
SEXP out(SEXP x, SEXP y)
{
int nx = length(x), ny = length(y);
SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);
for(int i = 0; i < nx; i++) {
double tmp = rx[i];
for(int j = 0; j < ny; j++)
rans[i + nx*j] = tmp * ry[j];
}
SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
setAttrib(ans, R_DimNamesSymbol, dimnames);
UNPROTECT(3);
return ans;
}
There are two PROTECT calls but an UNPROTECT(3), which triggers a stack
imbalance warning upon trying to run .Call("out", ...) in R.
Yes, that should be UNPROTECT(2). svn blame says the error was
introduced when allocMatrix() was introduced; prior to that the code had
allocVector(), then set dim and dimnames.
As for whether to PROTECT or not, my analysis would be...
SET_VECTOR_ELT does not (currently) allocate (except on error) so there
is no opportunity for the garbage collector to run, hence no need to
PROTECT.
Further, getAttrib() (currently) allocates only if (1) the attribute is
R_RowNamesSymbol and the row names are stored in compact format
c(NA_integer_, nrow); or (2) the first argument is a classic pairlist or
language SEXP. None of these conditions apply, so the return value of
getAttrib() is PROTECTed anyway.
Luke's analysis would be more straight-forward: if in doubt, PROTECT.
I think Herve, Gabe, and perhaps Michael would take Luke's advice, and
maybe also note that my advice, in addition to being an analysis of some
surprisingly complicated code by a practitioner of dubious credibility,
involves the current state of affairs, and you never know (and
apparently ALTREP makes it less certain) what the future will hold. So
they'd probably say PROTECT.
One might be tempted to write
SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
but I'm not sure whether C guarantees that function arguments are fully
evaluated, maybe it's legal for a compiler to evaluate getAttrib(), then
do some more work with other arguments, then evaluate PROTECT(), so long
as the overall logic is not disrupted. So the 'if in doubt' argument
would make me write
SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 0, nms);
I think , in C is called a 'sequence point'. Google takes me to
https://msdn.microsoft.com/en-us/library/azk8zbxd.aspx
where it seems like the left operand of ',' are fully evaluated before
proceeding, and furthermore that arguments to functions are evaluated
before entering the function, implying that it is safe to use the one-liner
SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
At any rate I changed the example in R-exts to UNPROTECT(2), leaving the
nuances for further discussion.
Martin
Anyway, getting back to the topic of this thread; if we were to pretend that getAttrib() allocates in the above example, would that mean that both getAttrib() calls now need to be PROTECTed by the developer? Or is this handled somewhere internally?
Thanks, H.
Best, luke
I also get a warning on almost every C++
function I've written,
because I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of
control to > > > > the
catch block. I find that R itself is pretty good at
complaining > > > > about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false
positives > > > > > (all
associated with use of an attribute of a protected SEXP),
one > > > > > subtle
problem (a symbol from a PROTECT'ed package name space;
the > > > > > symbol
could in theory be an active binding and the value obtained not PROTECTed by the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); #
<<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT
tag, too
late!
I assume the bug refers to the un-PROTECT'd
NEW_CHARACTER here - > > > > the
R_alloc call looks okay to me...
yes, tag needs protection.
I attributed the bug to R_alloc because I failed to
reason that > > > R_alloc
(obviously) allocates and hence can trigger a garbage collection.
Somehow it reflects my approach to PROTECTion, probably
not shared > > > by
everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the
expense > > > of
exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity
and logical
complexity. You have to think pretty hard to minimize use of the protect stack. I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the code
between
the line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed.
So a dangerous game for virtually zero benefits.
One recommendation might be to UNPROTECT() as soon as the
pointer on
the top is unneeded, rather than trying to figure out the number to pop just before returning to R. If you PROTECT() in a loop, you definitely want to do that.
Otherwise,
does it make a big difference?
One thing that got me is that the order in which C
evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.
I got hit by this too long time ago but with defineVar() instead of
R_setAttrib():
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=
H.
Btw, I think my package RGtk2 got the record: 1952 errors.
Luckily
almost all of them happened inside a few macros and autogenerated code.
Martin
Cheers,
Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
This email message may contain
legally privileged
and/or...{{dropped:2}}
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or...{{dropped:2}}
Martin Morgan wrote:
On 04/08/2017 08:16 AM, Aaron Lun wrote:
On 07/04/17 20:46, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/06/2017 03:29 AM, Michael Lawrence wrote:
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report
carefully.
I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is
pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.
So I suspect these warnings are false positives but I'm not 100%
sure. If you are not 100% sure then you should protect :-) There are some cases, in particular related to compact row names on data frames, where getAttrib will allocate.
Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op anymore? Should I worry that VECTOR_ELT() will also expand some sort of compact list element? Why not keep these things low-level getters/setters that return whatever the real thing is and use higher-level accessors for returning the expanded version of the thing?
Seriously: it's been that way since r37807 in 2006. If you want to read about some related future directions you can look at https://svn.r-project.org/R/branches/ALTREP/ALTREP.html. luke
I was curious about this so I checked out what R-exts had to say
involving set/getAttrib. Funnily enough, the example it gives in Section
5.9.4 seems to be incorrect in its UNPROTECTing.
#include <R.h>
#include <Rinternals.h>
SEXP out(SEXP x, SEXP y)
{
int nx = length(x), ny = length(y);
SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);
for(int i = 0; i < nx; i++) {
double tmp = rx[i];
for(int j = 0; j < ny; j++)
rans[i + nx*j] = tmp * ry[j];
}
SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
setAttrib(ans, R_DimNamesSymbol, dimnames);
UNPROTECT(3);
return ans;
}
There are two PROTECT calls but an UNPROTECT(3), which triggers a stack
imbalance warning upon trying to run .Call("out", ...) in R.
Yes, that should be UNPROTECT(2). svn blame says the error was
introduced when allocMatrix() was introduced; prior to that the code
had allocVector(), then set dim and dimnames.
As for whether to PROTECT or not, my analysis would be...
SET_VECTOR_ELT does not (currently) allocate (except on error) so
there is no opportunity for the garbage collector to run, hence no
need to PROTECT.
Further, getAttrib() (currently) allocates only if (1) the attribute
is R_RowNamesSymbol and the row names are stored in compact format
c(NA_integer_, nrow); or (2) the first argument is a classic pairlist
or language SEXP. None of these conditions apply, so the return value
of getAttrib() is PROTECTed anyway.
Luke's analysis would be more straight-forward: if in doubt, PROTECT.
I think Herve, Gabe, and perhaps Michael would take Luke's advice, and
maybe also note that my advice, in addition to being an analysis of
some surprisingly complicated code by a practitioner of dubious
credibility, involves the current state of affairs, and you never know
(and apparently ALTREP makes it less certain) what the future will
hold. So they'd probably say PROTECT.
One might be tempted to write
SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
but I'm not sure whether C guarantees that function arguments are
fully evaluated, maybe it's legal for a compiler to evaluate
getAttrib(), then do some more work with other arguments, then
evaluate PROTECT(), so long as the overall logic is not disrupted. So
the 'if in doubt' argument would make me write
SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 0, nms);
I think , in C is called a 'sequence point'. Google takes me to
https://msdn.microsoft.com/en-us/library/azk8zbxd.aspx
where it seems like the left operand of ',' are fully evaluated before
proceeding, and furthermore that arguments to functions are evaluated
before entering the function, implying that it is safe to use the
one-liner
SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
At any rate I changed the example in R-exts to UNPROTECT(2), leaving
the nuances for further discussion.
Martin
I wonder if the following is a sensible idea:
int Rf_num_protected; // global variable
void Rf_start_protection() {
Rf_num_protected=0;
return;
}
SEXP Rf_add_protection(SEXP x) {
++Rf_num_protected;
return PROTECT(x);
}
void Rf_end_protection() {
UNPROTECT(Rf_num_protected);
return;
}
The idea would be to:
1. call Rf_start_protection() at the top of the native routine
2. replace all uses of PROTECT with Rf_add_protection
3. call Rf_end_protection() just before returning to R
This would avoid having to keep track of the number of PROTECTs
performed, which may not be trivial if the routine can return at
multiple points.
It might also useful for C++ native routine creating class instances
that need to do internal PROTECTs for the lifetime of the instance. As
long as those PROTECTs are done via Rf_add_protection(), a single
Rf_end_protection() call at the bottom of the top-level routine would be
sufficient to handle them all. In contrast, putting a matching UNPROTECT
in the class destructor is not safe, as it is possible to trigger the
destructor to UNPROTECT an unrelated SEXP:
SEXP blah(SEXP x) {
my_class* ptr=new my_class(x); // say this does an internal PROTECT
SEXP output=PROTECT(allocVector(INTSXP, 1));
// ... do something with output here...
delete ptr; // if UNPROTECT is in the destructor, it UNPROTECTs
output instead
// ... do some more stuff, possibly involving allocations ...
UNPROTECT(1); // this actually UNPROTECTs whatever was in my_class
return output;
}
Anyway, getting back to the topic of this thread; if we were to pretend that getAttrib() allocates in the above example, would that mean that both getAttrib() calls now need to be PROTECTed by the developer? Or is this handled somewhere internally?
Thanks, H.
Best, luke
I also get a warning on almost every C++
function I've written,
because I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of
control to > > > > the
catch block. I find that R itself is pretty good at
complaining > > > > about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false
positives > > > > > (all
associated with use of an attribute of a protected SEXP),
one > > > > > subtle
problem (a symbol from a PROTECT'ed package name space;
the > > > > > symbol
could in theory be an active binding and the value obtained not PROTECTed by the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); #
<<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT
tag, too
late!
I assume the bug refers to the un-PROTECT'd
NEW_CHARACTER here - > > > > the
R_alloc call looks okay to me...
yes, tag needs protection.
I attributed the bug to R_alloc because I failed to
reason that > > > R_alloc
(obviously) allocates and hence can trigger a garbage collection.
Somehow it reflects my approach to PROTECTion, probably
not shared > > > by
everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the
expense > > > of
exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity
and logical
complexity. You have to think pretty hard to minimize use of the protect stack. I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the code
between
the line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed.
So a dangerous game for virtually zero benefits.
One recommendation might be to UNPROTECT() as soon as the
pointer on
the top is unneeded, rather than trying to figure out the number to pop just before returning to R. If you PROTECT() in a loop, you definitely want to do that.
Otherwise,
does it make a big difference?
One thing that got me is that the order in which C
evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated
first.
I got hit by this too long time ago but with defineVar()
instead of
R_setAttrib():
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=
H.
Btw, I think my package RGtk2 got the record: 1952 errors.
Luckily
almost all of them happened inside a few macros and autogenerated code.
Martin
Cheers,
Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
This email message may contain
legally privileged
and/or...{{dropped:2}}
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
On 04/08/2017 06:50 AM, Martin Morgan wrote:
On 04/08/2017 08:16 AM, Aaron Lun wrote:
On 07/04/17 20:46, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/06/2017 03:29 AM, Michael Lawrence wrote:
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report
carefully.
I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is
pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.
So I suspect these warnings are false positives but I'm not 100%
sure. If you are not 100% sure then you should protect :-) There are some cases, in particular related to compact row names on data frames, where getAttrib will allocate.
Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op anymore? Should I worry that VECTOR_ELT() will also expand some sort of compact list element? Why not keep these things low-level getters/setters that return whatever the real thing is and use higher-level accessors for returning the expanded version of the thing?
Seriously: it's been that way since r37807 in 2006. If you want to read about some related future directions you can look at https://urldefense.proofpoint.com/v2/url?u=https-3A__svn.r-2Dproject.org_R_branches_ALTREP_ALTREP.html&d=DwIF-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=ECmvCrpSP99jd6o3J4LkX4nL1PKJiNM1Ky6_-c7ob5k&s=S-eq87dmcXe7_GR61c5ZPGzqT9V2booIH5P7G_Jch18&e= . luke
I was curious about this so I checked out what R-exts had to say
involving set/getAttrib. Funnily enough, the example it gives in Section
5.9.4 seems to be incorrect in its UNPROTECTing.
#include <R.h>
#include <Rinternals.h>
SEXP out(SEXP x, SEXP y)
{
int nx = length(x), ny = length(y);
SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);
for(int i = 0; i < nx; i++) {
double tmp = rx[i];
for(int j = 0; j < ny; j++)
rans[i + nx*j] = tmp * ry[j];
}
SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
setAttrib(ans, R_DimNamesSymbol, dimnames);
UNPROTECT(3);
return ans;
}
There are two PROTECT calls but an UNPROTECT(3), which triggers a stack
imbalance warning upon trying to run .Call("out", ...) in R.
Yes, that should be UNPROTECT(2). svn blame says the error was
introduced when allocMatrix() was introduced; prior to that the code had
allocVector(), then set dim and dimnames.
As for whether to PROTECT or not, my analysis would be...
SET_VECTOR_ELT does not (currently) allocate (except on error) so there
is no opportunity for the garbage collector to run, hence no need to
PROTECT.
Further, getAttrib() (currently) allocates only if (1) the attribute is
R_RowNamesSymbol and the row names are stored in compact format
c(NA_integer_, nrow); or (2) the first argument is a classic pairlist or
language SEXP. None of these conditions apply, so the return value of
getAttrib() is PROTECTed anyway.
Luke's analysis would be more straight-forward: if in doubt, PROTECT.
I think Herve, Gabe, and perhaps Michael would take Luke's advice, and
maybe also note that my advice, in addition to being an analysis of some
surprisingly complicated code by a practitioner of dubious credibility,
involves the current state of affairs, and you never know (and
apparently ALTREP makes it less certain) what the future will hold. So
they'd probably say PROTECT.
One might be tempted to write
SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
but I'm not sure whether C guarantees that function arguments are fully
evaluated, maybe it's legal for a compiler to evaluate getAttrib(), then
do some more work with other arguments, then evaluate PROTECT(), so long
as the overall logic is not disrupted. So the 'if in doubt' argument
would make me write
SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 0, nms);
I think , in C is called a 'sequence point'. Google takes me to
https://urldefense.proofpoint.com/v2/url?u=https-3A__msdn.microsoft.com_en-2Dus_library_azk8zbxd.aspx&d=DwIF-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=ECmvCrpSP99jd6o3J4LkX4nL1PKJiNM1Ky6_-c7ob5k&s=ekY5D7Za0NSpYmZxnR3ONu7u8f_qyDme47VeHsBWp6w&e=
where it seems like the left operand of ',' are fully evaluated before
proceeding, and furthermore that arguments to functions are evaluated
before entering the function,
Of course this only applies if SET_VECTOR_ELT() is a function, not a macro. With a macro (e.g. SET_ELEMENT) all bets are off. So I think the recommendation here is to use a logic that is valid whether we are in the presence of a function or a macro. Back to my earlier point, I see nothing in R-ext that suggests that getAttrib can return something that needs protection. We all want to do the right thing with PROTECTion, but that means TFM needs to tell us what the right thing to do is. Nobody should need to grep 20 years of svn history or NEWS files or the R base code to guess what the rules of the game are. H.
implying that it is safe to use the one-liner
SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
At any rate I changed the example in R-exts to UNPROTECT(2), leaving the
nuances for further discussion.
Martin
Anyway, getting back to the topic of this thread; if we were to pretend that getAttrib() allocates in the above example, would that mean that both getAttrib() calls now need to be PROTECTed by the developer? Or is this handled somewhere internally?
Thanks, H.
Best, luke
I also get a warning on almost every C++
function I've written,
because I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of
control to > > > > the
catch block. I find that R itself is pretty good at
complaining > > > > about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false
positives > > > > > (all
associated with use of an attribute of a protected SEXP),
one > > > > > subtle
problem (a symbol from a PROTECT'ed package name space;
the > > > > > symbol
could in theory be an active binding and the value obtained not PROTECTed by the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); #
<<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT
tag, too
late!
I assume the bug refers to the un-PROTECT'd
NEW_CHARACTER here - > > > > the
R_alloc call looks okay to me...
yes, tag needs protection.
I attributed the bug to R_alloc because I failed to
reason that > > > R_alloc
(obviously) allocates and hence can trigger a garbage collection.
Somehow it reflects my approach to PROTECTion, probably
not shared > > > by
everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the
expense > > > of
exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity
and logical
complexity. You have to think pretty hard to minimize use of the protect stack. I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the code
between
the line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed.
So a dangerous game for virtually zero benefits.
One recommendation might be to UNPROTECT() as soon as the
pointer on
the top is unneeded, rather than trying to figure out the number to pop just before returning to R. If you PROTECT() in a loop, you definitely want to do that.
Otherwise,
does it make a big difference?
One thing that got me is that the order in which C
evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated first.
I got hit by this too long time ago but with defineVar() instead of
R_setAttrib():
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=
H.
Btw, I think my package RGtk2 got the record: 1952 errors.
Luckily
almost all of them happened inside a few macros and autogenerated code.
Martin
Cheers,
Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
This email message may contain
legally privileged
and/or...{{dropped:2}}
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIF-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=ECmvCrpSP99jd6o3J4LkX4nL1PKJiNM1Ky6_-c7ob5k&s=qexOFt0J-LtnpWFEQJlsD_zVgDZ3xFQUXOBR7c2PIo8&e=
This email message may contain legally privileged and/or...{{dropped:2}}
_______________________________________________ Bioc-devel at r-project.org mailing list https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIF-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=ECmvCrpSP99jd6o3J4LkX4nL1PKJiNM1Ky6_-c7ob5k&s=qexOFt0J-LtnpWFEQJlsD_zVgDZ3xFQUXOBR7c2PIo8&e=
Herv? Pag?s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fredhutch.org Phone: (206) 667-5791 Fax: (206) 667-1319
On 04/08/2017 12:29 PM, Aaron Lun wrote:
Martin Morgan wrote:
On 04/08/2017 08:16 AM, Aaron Lun wrote:
On 07/04/17 20:46, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/06/2017 03:29 AM, Michael Lawrence wrote:
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report
carefully.
I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is
pointing
to a part of a pre-existing object ('x') and not to a newly created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.
So I suspect these warnings are false positives but I'm not 100%
sure. If you are not 100% sure then you should protect :-) There are some cases, in particular related to compact row names on data frames, where getAttrib will allocate.
Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op anymore? Should I worry that VECTOR_ELT() will also expand some sort of compact list element? Why not keep these things low-level getters/setters that return whatever the real thing is and use higher-level accessors for returning the expanded version of the thing?
Seriously: it's been that way since r37807 in 2006. If you want to read about some related future directions you can look at https://svn.r-project.org/R/branches/ALTREP/ALTREP.html. luke
I was curious about this so I checked out what R-exts had to say
involving set/getAttrib. Funnily enough, the example it gives in Section
5.9.4 seems to be incorrect in its UNPROTECTing.
#include <R.h>
#include <Rinternals.h>
SEXP out(SEXP x, SEXP y)
{
int nx = length(x), ny = length(y);
SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);
for(int i = 0; i < nx; i++) {
double tmp = rx[i];
for(int j = 0; j < ny; j++)
rans[i + nx*j] = tmp * ry[j];
}
SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
setAttrib(ans, R_DimNamesSymbol, dimnames);
UNPROTECT(3);
return ans;
}
There are two PROTECT calls but an UNPROTECT(3), which triggers a stack
imbalance warning upon trying to run .Call("out", ...) in R.
Yes, that should be UNPROTECT(2). svn blame says the error was
introduced when allocMatrix() was introduced; prior to that the code
had allocVector(), then set dim and dimnames.
As for whether to PROTECT or not, my analysis would be...
SET_VECTOR_ELT does not (currently) allocate (except on error) so
there is no opportunity for the garbage collector to run, hence no
need to PROTECT.
Further, getAttrib() (currently) allocates only if (1) the attribute
is R_RowNamesSymbol and the row names are stored in compact format
c(NA_integer_, nrow); or (2) the first argument is a classic pairlist
or language SEXP. None of these conditions apply, so the return value
of getAttrib() is PROTECTed anyway.
Luke's analysis would be more straight-forward: if in doubt, PROTECT.
I think Herve, Gabe, and perhaps Michael would take Luke's advice, and
maybe also note that my advice, in addition to being an analysis of
some surprisingly complicated code by a practitioner of dubious
credibility, involves the current state of affairs, and you never know
(and apparently ALTREP makes it less certain) what the future will
hold. So they'd probably say PROTECT.
One might be tempted to write
SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
but I'm not sure whether C guarantees that function arguments are
fully evaluated, maybe it's legal for a compiler to evaluate
getAttrib(), then do some more work with other arguments, then
evaluate PROTECT(), so long as the overall logic is not disrupted. So
the 'if in doubt' argument would make me write
SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 0, nms);
I think , in C is called a 'sequence point'. Google takes me to
https://msdn.microsoft.com/en-us/library/azk8zbxd.aspx
where it seems like the left operand of ',' are fully evaluated before
proceeding, and furthermore that arguments to functions are evaluated
before entering the function, implying that it is safe to use the
one-liner
SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
At any rate I changed the example in R-exts to UNPROTECT(2), leaving
the nuances for further discussion.
Martin
I wonder if the following is a sensible idea:
int Rf_num_protected; // global variable
void Rf_start_protection() {
Rf_num_protected=0;
return;
}
SEXP Rf_add_protection(SEXP x) {
++Rf_num_protected;
return PROTECT(x);
}
void Rf_end_protection() {
UNPROTECT(Rf_num_protected);
return;
}
The idea would be to:
1. call Rf_start_protection() at the top of the native routine
2. replace all uses of PROTECT with Rf_add_protection
3. call Rf_end_protection() just before returning to R
This would avoid having to keep track of the number of PROTECTs
performed, which may not be trivial if the routine can return at
multiple points.
It might also useful for C++ native routine creating class instances
that need to do internal PROTECTs for the lifetime of the instance. As
long as those PROTECTs are done via Rf_add_protection(), a single
Rf_end_protection() call at the bottom of the top-level routine would be
sufficient to handle them all. In contrast, putting a matching UNPROTECT
in the class destructor is not safe, as it is possible to trigger the
destructor to UNPROTECT an unrelated SEXP:
SEXP blah(SEXP x) {
my_class* ptr=new my_class(x); // say this does an internal PROTECT
SEXP output=PROTECT(allocVector(INTSXP, 1));
// ... do something with output here...
delete ptr; // if UNPROTECT is in the destructor, it UNPROTECTs
output instead
// ... do some more stuff, possibly involving allocations ...
UNPROTECT(1); // this actually UNPROTECTs whatever was in my_class
return output;
}
Global variables are problematic to reason about, e.g., in nested calls or parallel code sections. 'ad hoc' (no offense intended) solutions often increase rather than reduce cognitive burden, because someone new to the code (including one's future self) has to parse the intention and validate use. Rcpp seems like the right approach for C++ code; it largely removes the need for explicit PROTECTion management, and is widely used and responsibly maintained so the edge cases / tricky problems get discovered and addressed. Martin
Anyway, getting back to the topic of this thread; if we were to pretend that getAttrib() allocates in the above example, would that mean that both getAttrib() calls now need to be PROTECTed by the developer? Or is this handled somewhere internally?
Thanks, H.
Best, luke
I also get a warning on almost every C++
function I've written,
because I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of
control to > > > > the
catch block. I find that R itself is pretty good at
complaining > > > > about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false
positives > > > > > (all
associated with use of an attribute of a protected SEXP),
one > > > > > subtle
problem (a symbol from a PROTECT'ed package name space;
the > > > > > symbol
could in theory be an active binding and the value obtained not PROTECTed by the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); #
<<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT
tag, too
late!
I assume the bug refers to the un-PROTECT'd
NEW_CHARACTER here - > > > > the
R_alloc call looks okay to me...
yes, tag needs protection.
I attributed the bug to R_alloc because I failed to
reason that > > > R_alloc
(obviously) allocates and hence can trigger a garbage collection.
Somehow it reflects my approach to PROTECTion, probably
not shared > > > by
everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the
expense > > > of
exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity
and logical
complexity. You have to think pretty hard to minimize use of the protect stack. I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the code
between
the line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed.
So a dangerous game for virtually zero benefits.
One recommendation might be to UNPROTECT() as soon as the
pointer on
the top is unneeded, rather than trying to figure out the number to pop just before returning to R. If you PROTECT() in a loop, you definitely want to do that.
Otherwise,
does it make a big difference?
One thing that got me is that the order in which C
evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need protection,
since it happened last. Unfortunately, it might be evaluated
first.
I got hit by this too long time ago but with defineVar()
instead of
R_setAttrib():
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=
H.
Btw, I think my package RGtk2 got the record: 1952 errors.
Luckily
almost all of them happened inside a few macros and autogenerated code.
Martin
Cheers,
Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
This email message may contain
legally privileged
and/or...{{dropped:2}}
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or...{{dropped:2}}
Martin Morgan wrote:
On 04/08/2017 12:29 PM, Aaron Lun wrote:
Martin Morgan wrote:
On 04/08/2017 08:16 AM, Aaron Lun wrote:
On 07/04/17 20:46, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/07/2017 05:37 AM, luke-tierney at uiowa.edu wrote:
On Fri, 7 Apr 2017, Herv? Pag?s wrote:
On 04/06/2017 03:29 AM, Michael Lawrence wrote:
On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
On 04/06/2017 05:33 AM, Aaron Lun wrote:
The tool is not perfect, so assess each report
carefully.
I get a lot of warnings because the tool seems to consider that
extracting an attribute (with getAttrib(x, ...)) or extracting
the
slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...))
returns an SEXP that needs protection. I always assumed that it
didn't because my understanding is that the returned SEXP is
pointing
to a part of a pre-existing object ('x') and not to a newly
created
one. So I decided I could treat it like the SEXP returned by
VECTOR_ELT(), which, AFAIK, doesn't need protection.
So I suspect these warnings are false positives but I'm not 100%
sure. If you are not 100% sure then you should protect :-) There are some cases, in particular related to compact row names on data frames, where getAttrib will allocate.
Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op anymore? Should I worry that VECTOR_ELT() will also expand some sort of compact list element? Why not keep these things low-level getters/setters that return whatever the real thing is and use higher-level accessors for returning the expanded version of the thing?
Seriously: it's been that way since r37807 in 2006. If you want to read about some related future directions you can look at https://svn.r-project.org/R/branches/ALTREP/ALTREP.html. luke
I was curious about this so I checked out what R-exts had to say
involving set/getAttrib. Funnily enough, the example it gives in
Section
5.9.4 seems to be incorrect in its UNPROTECTing.
#include <R.h>
#include <Rinternals.h>
SEXP out(SEXP x, SEXP y)
{
int nx = length(x), ny = length(y);
SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny));
double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans);
for(int i = 0; i < nx; i++) {
double tmp = rx[i];
for(int j = 0; j < ny; j++)
rans[i + nx*j] = tmp * ry[j];
}
SEXP dimnames = PROTECT(allocVector(VECSXP, 2));
SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol));
setAttrib(ans, R_DimNamesSymbol, dimnames);
UNPROTECT(3);
return ans;
}
There are two PROTECT calls but an UNPROTECT(3), which triggers a
stack
imbalance warning upon trying to run .Call("out", ...) in R.
Yes, that should be UNPROTECT(2). svn blame says the error was
introduced when allocMatrix() was introduced; prior to that the code
had allocVector(), then set dim and dimnames.
As for whether to PROTECT or not, my analysis would be...
SET_VECTOR_ELT does not (currently) allocate (except on error) so
there is no opportunity for the garbage collector to run, hence no
need to PROTECT.
Further, getAttrib() (currently) allocates only if (1) the attribute
is R_RowNamesSymbol and the row names are stored in compact format
c(NA_integer_, nrow); or (2) the first argument is a classic pairlist
or language SEXP. None of these conditions apply, so the return value
of getAttrib() is PROTECTed anyway.
Luke's analysis would be more straight-forward: if in doubt, PROTECT.
I think Herve, Gabe, and perhaps Michael would take Luke's advice, and
maybe also note that my advice, in addition to being an analysis of
some surprisingly complicated code by a practitioner of dubious
credibility, involves the current state of affairs, and you never know
(and apparently ALTREP makes it less certain) what the future will
hold. So they'd probably say PROTECT.
One might be tempted to write
SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
but I'm not sure whether C guarantees that function arguments are
fully evaluated, maybe it's legal for a compiler to evaluate
getAttrib(), then do some more work with other arguments, then
evaluate PROTECT(), so long as the overall logic is not disrupted. So
the 'if in doubt' argument would make me write
SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol));
SET_VECTOR_ELT(dimnames, 0, nms);
I think , in C is called a 'sequence point'. Google takes me to
https://msdn.microsoft.com/en-us/library/azk8zbxd.aspx
where it seems like the left operand of ',' are fully evaluated before
proceeding, and furthermore that arguments to functions are evaluated
before entering the function, implying that it is safe to use the
one-liner
SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol)));
At any rate I changed the example in R-exts to UNPROTECT(2), leaving
the nuances for further discussion.
Martin
I wonder if the following is a sensible idea:
int Rf_num_protected; // global variable
void Rf_start_protection() {
Rf_num_protected=0;
return;
}
SEXP Rf_add_protection(SEXP x) {
++Rf_num_protected;
return PROTECT(x);
}
void Rf_end_protection() {
UNPROTECT(Rf_num_protected);
return;
}
The idea would be to:
1. call Rf_start_protection() at the top of the native routine
2. replace all uses of PROTECT with Rf_add_protection
3. call Rf_end_protection() just before returning to R
This would avoid having to keep track of the number of PROTECTs
performed, which may not be trivial if the routine can return at
multiple points.
It might also useful for C++ native routine creating class instances
that need to do internal PROTECTs for the lifetime of the instance. As
long as those PROTECTs are done via Rf_add_protection(), a single
Rf_end_protection() call at the bottom of the top-level routine would be
sufficient to handle them all. In contrast, putting a matching UNPROTECT
in the class destructor is not safe, as it is possible to trigger the
destructor to UNPROTECT an unrelated SEXP:
SEXP blah(SEXP x) {
my_class* ptr=new my_class(x); // say this does an internal PROTECT
SEXP output=PROTECT(allocVector(INTSXP, 1));
// ... do something with output here...
delete ptr; // if UNPROTECT is in the destructor, it UNPROTECTs
output instead
// ... do some more stuff, possibly involving allocations ...
UNPROTECT(1); // this actually UNPROTECTs whatever was in my_class
return output;
}
Global variables are problematic to reason about, e.g., in nested calls or parallel code sections. 'ad hoc' (no offense intended) solutions often increase rather than reduce cognitive burden, because someone new to the code (including one's future self) has to parse the intention and validate use. Rcpp seems like the right approach for C++ code; it largely removes the need for explicit PROTECTion management, and is widely used and responsibly maintained so the edge cases / tricky problems get discovered and addressed. Martin
Yes, I was wondering whether I should transition all of my C++ code to Rcpp. It would require a decent amount of effort involving ~7 packages, so I've held off from doing it... but if I have to go through the code to fix all of the PROTECTs anyway, I might as well go all the way and switch to using Rcpp. Sounds like a small project for my Easter break. -Aaron
Anyway, getting back to the topic of this thread; if we were to pretend that getAttrib() allocates in the above example, would that mean that both getAttrib() calls now need to be PROTECTed by the developer? Or is this handled somewhere internally?
Thanks, H.
Best, luke
I also get a warning on almost every C++
function I've written,
because I use the following code to handle exceptions:
SEXP output=PROTECT(allocVector(...));
try {
// do something that might raise an exception
} catch (std::exception& e) {
UNPROTECT(1);
throw; // break out of this part of the function
}
UNPROTECT(1);
return output;
Presumably the check doesn't account for transfer of
control to > > > > the
catch block. I find that R itself is pretty good at
complaining > > > > about
stack imbalances during execution of tests, examples, etc.
'My' packages
(Rsamtools, DirichletMultinomial) had several false
positives > > > > > (all
associated with use of an attribute of a protected SEXP),
one > > > > > subtle
problem (a symbol from a PROTECT'ed package name space;
the > > > > > symbol
could in theory be an active binding and the value obtained not PROTECTed by the name space), and a genuine bug
tag = NEW_CHARACTER(n);
for (int j = 0; j < n; ++j)
SET_STRING_ELT(tag, j, NA_STRING);
if ('A' == aux[0]) {
buf_A = R_alloc(2, sizeof(char)); #
<<- bug
buf_A[1] = '\0';
}
...
SET_VECTOR_ELT(tags, i, tag); # PROTECT
tag, too
late!
I assume the bug refers to the un-PROTECT'd
NEW_CHARACTER here - > > > > the
R_alloc call looks okay to me...
yes, tag needs protection.
I attributed the bug to R_alloc because I failed to
reason that > > > R_alloc
(obviously) allocates and hence can trigger a garbage collection.
Somehow it reflects my approach to PROTECTion, probably
not shared > > > by
everyone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at the
expense > > > of
exposing me to bugs like this.
I guess it's a tradeoff between syntactic complexity
and logical
complexity. You have to think pretty hard to minimize use of the protect stack. I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the code
between
the line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed.
So a dangerous game for virtually zero benefits.
One recommendation might be to UNPROTECT() as soon as the
pointer on
the top is unneeded, rather than trying to figure out the number to pop just before returning to R. If you PROTECT() in a loop, you definitely want to do that.
Otherwise,
does it make a big difference?
One thing that got me is that the order in which C
evaluates function
call arguments is undefined. I did a lot of R_setAttrib(x,
install("foo"), allocBar()), thinking that the symbol would be
automatically protected, and allocBar() would not need
protection,
since it happened last. Unfortunately, it might be evaluated
first.
I got hit by this too long time ago but with defineVar()
instead of
R_setAttrib():
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=
H.
Btw, I think my package RGtk2 got the record: 1952 errors.
Luckily
almost all of them happened inside a few macros and autogenerated code.
Martin
Cheers,
Aaron
_______________________________________________ Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
This email message may contain
legally privileged
and/or...{{dropped:2}}
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________
Bioc-devel at r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.