In character.c within R 2.6.1, the function do_gsub contains the following
code:
1199 if (STRING_ELT(pat, 0) == NA_STRING) {
1200 PROTECT(ans = allocVector(STRSXP, n));
1201 for(i = 0; i < n; i++) SET_STRING_ELT(ans, i, NA_STRING);
1202 UNPROTECT(1);
1203 return ans;
1204 }
1205
1206 if (length(pat) < 1 || length(rep) < 1) error(R_MSG_IA);
Line 1206 checks that pat contains at least one element. However, line 1199
has already attempted to access the first element. The check should surely
come first.
Andrew Runnalls
do_gsub (PR#10540)
3 messages · A.R.Runnalls at kent.ac.uk, Hin-Tak Leung, Andrew Runnalls
8 days later
I believe STRING_ELT() works even when pat is R_NilValue, so the first line "does the right thing" even if the first element does not exist. I think this is intentional and not a bug. Other more knowledgeable people please correct me if I am wrong.
A.R.Runnalls at kent.ac.uk wrote:
In character.c within R 2.6.1, the function do_gsub contains the following
code:
1199 if (STRING_ELT(pat, 0) == NA_STRING) {
1200 PROTECT(ans = allocVector(STRSXP, n));
1201 for(i = 0; i < n; i++) SET_STRING_ELT(ans, i, NA_STRING);
1202 UNPROTECT(1);
1203 return ans;
1204 }
1205
1206 if (length(pat) < 1 || length(rep) < 1) error(R_MSG_IA);
Line 1206 checks that pat contains at least one element. However, line 1199
has already attempted to access the first element. The check should surely
come first.
Andrew Runnalls
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
13 days later
Hin-Tak, Apologies for the delay in responding.
On 2008/01/11 Hin-Tak Leung wrote:
I believe STRING_ELT() works even when pat is R_NilValue, so the first line "does the right thing" even if the first element does not exist. I think this is intentional and not a bug. Other more knowledgeable people please correct me if I am wrong. A.R.Runnalls at kent.ac.uk wrote:
In character.c within R 2.6.1, the function do_gsub contains the following code:
1199 if (STRING_ELT(pat, 0) == NA_STRING) {
1200 PROTECT(ans = allocVector(STRSXP, n));
1201 for(i = 0; i < n; i++) SET_STRING_ELT(ans, i, NA_STRING);
1202 UNPROTECT(1);
1203 return ans;
1204 }
1205
1206 if (length(pat) < 1 || length(rep) < 1) error(R_MSG_IA);
Line 1206 checks that pat contains at least one element. However,
line 1199 has already attempted to access the first element. The check
should surely come first.
Generally speaking, in the interests of software reliability and maintainability, it is a good idea for part of a program to 'do the right thing' itself - in the present case, to check that an array is non-empty before assuming that it is - rather than relying on what is in fact rather unusual behaviour of another part of the program. The case I am concerned about is not in fact where 'pat' is R_NilValue, but when it is a vector of length zero. When asked to create such a vector, memory.c will in fact allocate some element space in the vector, which is why the above problem doesn't manifest itself as a crash (in the no-segfault.R test, for example, which explores this case); however, it seems unwise to rely on this. And what would happen if that (redundantly) allocated space just happened already to contain NA_STRING? Best regards, Andrew