[Bioc-devel] BStringSet Documentation
Hi,
On 09/01/2016 12:00 AM, Dario Strbenac wrote:
Good day,

According to the documentation, I wouldn't think that substr or strsplit would work on a BStringSet, but substr does.
IDs
A BStringSet instance of length 5
width seq
[1] 61 D00626:168:C9CWMANXX:1:1105:1816:1998 1:N:0:TCCGGAGA+ATAGAGGC
[2] 61 D00626:168:C9CWMANXX:1:1105:2113:1989 1:N:0:TCCGGAGA+ATAGAGGC
[3] 61 D00626:168:C9CWMANXX:1:1105:2703:1986 1:N:0:TCCGGAGA+ATAGAGGC
[4] 61 D00626:168:C9CWMANXX:1:1105:3255:1979 1:N:0:TCCGGAGA+ATAGAGGC
[5] 61 D00626:168:C9CWMANXX:1:1105:4525:1995 1:N:0:TCCGGAGA+ATAGAGGC
substr(IDs, 1, 37)
[1] "D00626:168:C9CWMANXX:1:1105:1816:1998"
[2] "D00626:168:C9CWMANXX:1:1105:2113:1989"
[3] "D00626:168:C9CWMANXX:1:1105:2703:1986"
[4] "D00626:168:C9CWMANXX:1:1105:3255:1979"
[5] "D00626:168:C9CWMANXX:1:1105:4525:1995"
strsplit(IDs, ' ')
Error in strsplit(IDs, " ") : non-character argument

I think that both of these functions shouldn't work or both should work, to be consistent.
Why? Because they both have "str" in their name?
It sounds like you are expecting every string manipulation function
defined in base R to work on a BStringSet object. Well, that's not
the case and I don't think that's ever going to happen. Some of them
work and some of them don't. We can add more if needed (e.g. strsplit)
but there are things like the grep family that BStringSet objects will
probably never support.
If you need to strsplit() an XStringSet object, you can use this:
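    library(Biostrings)  # provides vmatchPattern() and extractAt()

    strsplitXStringSet <- function(x, split)
    {
        ## Locate every occurrence of 'split' in each sequence of 'x'.
        m <- vmatchPattern(split, x)
        ## The pieces to keep are the gaps between the matches.
        at <- gaps(IRangesList(start=start(m), end=end(m)),
                   start=1L, end=width(x))
        extractAt(x, at)
    }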
It's going to behave like strsplit(x, split, fixed=TRUE) except when
there is a match at the beginning or end of one of the sequences (in
which case strsplit() has a questionable behavior). Also, unlike
strsplit(), strsplitXStringSet() doesn't support an empty split
pattern.
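
As a rough illustration of the behavior described above, here is a small sketch. It assumes Biostrings is installed, repeats the function so that it runs on its own, and uses made-up read IDs (the second one deliberately starts with the separator):

```r
library(Biostrings)

# Same function as given earlier in this message, repeated so the
# example is self-contained.
strsplitXStringSet <- function(x, split)
{
    m <- vmatchPattern(split, x)
    at <- gaps(IRangesList(start=start(m), end=end(m)),
               start=1L, end=width(x))
    extractAt(x, at)
}

# Made-up IDs for illustration only; the second has a leading space.
ids <- BStringSet(c("read1 1:N:0", " read2 2:N:0"))

pieces <- strsplitXStringSet(ids, " ")
pieces_chr <- lapply(pieces, as.character)

# What base R does with the same strings.
base_pieces <- strsplit(as.character(ids), " ", fixed=TRUE)

print(pieces_chr)
print(base_pieces)
```

For the first ID the two approaches should agree. For the second, base strsplit() returns an empty string as the first piece, whereas the gaps-based approach returns no piece for the leading match.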
Note that BStringSet objects have supported the reverse operation
for a while. See ?unstrsplit
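
A minimal sketch of that reverse operation, assuming Biostrings is installed and that unstrsplit() accepts a plain character separator for a BStringSetList (check ?unstrsplit for the exact interface). The pieces are made-up values:

```r
library(Biostrings)

# Made-up pieces, as they might look after splitting two read IDs on " ".
pieces <- BStringSetList(c("read1", "1:N:0"), c("read2", "2:N:0"))

# Paste the pieces of each list element back together, separated by " ".
joined <- unstrsplit(pieces, sep=" ")
print(joined)
```

The result should be a BStringSet with one sequence per list element.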
I'll add strsplitXStringSet() to Biostrings, as the "strsplit" method
for XStringSet objects.
H.
--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319