[Bioc-devel] phred qualities
On Wed, Jun 27, 2012 at 2:26 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
On 06/27/2012 11:22 AM, Martin Morgan wrote:
On 06/27/2012 08:02 AM, Kasper Daniel Hansen wrote:
Phred qualities are usually presented as ASCII-encoded numbers with an offset of either 33 or 64. Some packages return this as a BStringSet. I can convert a character vector "charvec" to a list of integers using code like sapply(charvec, function(xx) as.integer(charToRaw(xx)) - 33L) Do we have fast(er) ways of doing this when charvec is really long and the strings do not necessarily all have the same number of characters? I am thinking of implementing the sapply() above in C (directly vectorizing it), but surely someone has done something like that somewhere.
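For illustration, the per-string sapply() can be avoided in base R by converting all characters in one pass and then cutting the result back into per-string pieces. This is only a sketch, not anything taken from Biostrings: the name phred_to_int and its offset argument are made up for this example, and it assumes plain single-byte ASCII quality strings with no NA entries.

```r
# Hypothetical helper (not part of any package): decode Phred quality
# strings of varying width into a list of integer score vectors.
phred_to_int <- function(charvec, offset = 33L) {
  # Number of bytes in each string; equals the number of scores for ASCII input
  widths <- nchar(charvec, type = "bytes")
  # Convert every character in a single call instead of once per string
  scores <- as.integer(charToRaw(paste(charvec, collapse = ""))) - offset
  # Explicit factor levels keep empty strings as integer(0) entries
  grp <- factor(rep.int(seq_along(charvec), widths),
                levels = seq_along(charvec))
  unname(split(scores, grp))
}

quals <- phred_to_int(c("HH", "III"))
```

The paste() call builds one long string, so this only works while the total number of characters stays within R's limit for a single string; beyond that the input would have to be processed in chunks.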
I think you get this with XStringSet, e.g., PhredQuality, with
x = PhredQuality(c("HH", "III"))
y = as.numeric(unlist(x)) - 33L
?as.integer
z = relist(y, x)
or for a simple list split(y, rep(seq_along(x), elementLengths(x))). I have a recollection that there is something built-in...
It would also be nice if as.integer(unlist(x)) knew that x is a PhredQuality and therefore knew to subtract 33. From the PhredQuality docs it seems that this has already happened in the underlying raw vector, and that unlist(x) converts it back into a BString.

Looking in Biostrings, there is .XStringQualityToIntegerMatrix, which is used by as(x, "matrix") and does what I want, but it assumes that all strings have equal width. So I guess I should write something like an as(x, "list") method, which I can do using x@ranges. But would that conflict with the unlist(x) command above? Or should it have another name?

Kasper