Using the IRanges package from Bioconductor and a fairly recent R-2.9.1:
library(IRanges)
ov = IRanges(1:3, 4:6)
length(ov) # 3
seq(along = ov) # 1 2 3 as wanted
seq_along(ov) # 1!
I had expected that the last line would yield 1:3. My guess is that
somehow seq_along doesn't take into account that ov is an S4 object with
a length method.
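The report can be reproduced without IRanges. This is a minimal sketch, assuming a hypothetical S4 class `Ranges3` that stands in for an IRanges object; only its `length` method matters. The value of `seq_along(ov)` is not asserted here, because it depends on whether your R predates the fix mentioned at the end of this thread.

```r
library(methods)  # S4 machinery (attached by default in interactive R)

# Hypothetical stand-in for an IRanges object: just start/end slots.
setClass("Ranges3", slots = c(start = "integer", end = "integer"))

# R-level length() dispatches to this S4 method.
setMethod("length", "Ranges3", function(x) length(x@start))

ov <- new("Ranges3", start = 1:3, end = 4:6)

length(ov)             # 3, via the S4 method
seq(along.with = ov)   # 1 2 3: seq.default calls length() at R level
seq_len(length(ov))    # 1 2 3
seq_along(ov)          # reported as 1 on R-2.9.1; check your own R version
```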
The last line of the *Details* section of ?seq has a typo. Currently
it is
'seq.int', 'seq_along' and 'seq.int' are primitive: the latter two
ignore any argument name.
I would guess it ought to be
'seq.int', 'seq_along' and 'seq_len' are primitive: the latter two
ignore any argument name.
Kasper
bug in seq_along
3 messages · Hervé Pagès, Kasper Daniel Hansen
4 days later
Hi Kasper and R developers,
Kasper Daniel Hansen wrote:
ov = IRanges(1:3, 4:6)
seq_along(ov) # 1!
I had expected that the last line would yield 1:3.
I agree, this is not good. seq_along() has always been broken on S4 objects:
  https://stat.ethz.ch/pipermail/r-devel/2007-July/046337.html
so I prefer not to use it, ever, even when I deal with S3 objects. The day I need to extend my code to deal with S4 objects, it's too easy to forget to replace 'seq_along(x)' with 'seq_len(length(x))'. So I'd rather use the latter all the time and from the very beginning (hopefully there is no serious performance penalty for doing this).

Surprisingly, seq_along() deserves its own C implementation (why wouldn't seq_along <- function(x) seq_len(length(x)) be good enough?). It calls length() at the C level, which is an inline function defined as:

  INLINE_FUN R_len_t length(SEXP s)
  {
      int i;
      switch (TYPEOF(s)) {
      case NILSXP:
          return 0;
      case LGLSXP:
      case INTSXP:
      case REALSXP:
      case CPLXSXP:
      case STRSXP:
      case CHARSXP:
      case VECSXP:
      case EXPRSXP:
      case RAWSXP:
          return LENGTH(s);
      case LISTSXP:
      case LANGSXP:
      case DOTSXP:
          i = 0;
          while (s != NULL && s != R_NilValue) {
              i++;
              s = CDR(s);
          }
          return i;
      case ENVSXP:
          return Rf_envlength(s);
      default:
          return 1;
      }
  }

Hence it will return 1 when 's' is an S4SXP. If, for whatever reason, seq_along() is not able to figure out the *real* length of an S4 object, wouldn't it be better to make it return an error? Or at least to put a big warning in its man page saying: DON'T TRUST ME ON YOUR S4 OBJECTS, I'M BROKEN!

Cheers,
H.
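The seq_len(length(x)) workaround can be wrapped in a small helper. This is a sketch, assuming the hypothetical name `seq_along2` and two hypothetical classes (`bag` for S3, `Box` for S4); because `length()` is called from R rather than from C, both kinds of `length` method are dispatched.

```r
library(methods)

# Hypothetical helper implementing the workaround: seq_len(length(x)).
# length() is called at R level, so S3 and S4 length methods are honoured.
seq_along2 <- function(x) seq_len(length(x))

# Hypothetical S3 class with its own length method.
x3 <- structure(list(data = 1:10), class = "bag")
length.bag <- function(x) length(x$data)

# Hypothetical S4 class with its own length method.
setClass("Box", slots = c(items = "character"))
setMethod("length", "Box", function(x) length(x@items))
x4 <- new("Box", items = c("a", "b", "c", "d"))

seq_along2(letters[1:5])  # 1 2 3 4 5
seq_along2(NULL)          # integer(0)
seq_along2(x3)            # 1:10, via length.bag
seq_along2(x4)            # 1 2 3 4, via the S4 method
```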
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
6 days later
This has now been fixed in R-2.9 and R-devel by Martin Maechler.
Thanks,
Kasper
On Jul 13, 2009, at 15:43, Hervé Pagès wrote: