Pack and Unpack Strings in R

Gundala --
Gundala Viswanath wrote:
All of your questions relate to DNA strings. The R/Bioconductor package 
Biostrings is designed to manipulate such objects. It does not 
necessarily address this particular problem: in general DNA strings can 
contain any of the 16 IUPAC symbols, so compression becomes less 
compelling, and, as you indicate, even with compression the size of the 
data means that one might often need to process parts of the data at a 
time. It may nonetheless provide useful containers and methods that make 
such issues less important.
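
To make the compression point concrete, here is a minimal base-R sketch of 
packing a string over A, C, G, T at 2 bits per base. The helpers pack_dna 
and unpack_dna are hypothetical names invented for this illustration; they 
are not part of Biostrings, and they assume the input contains only the 
four plain bases.

```r
# Hypothetical helpers (not part of Biostrings): pack a string over
# A/C/G/T into raw bytes at 2 bits per base, and unpack it again.
pack_dna <- function(s) {
  codes <- match(strsplit(s, "")[[1]], c("A", "C", "G", "T")) - 1L
  stopifnot(!anyNA(codes))          # assumption: only plain A, C, G, T
  pad <- (-length(codes)) %% 4L     # pad up to a multiple of 4 bases
  codes <- matrix(c(codes, integer(pad)), nrow = 4L)
  # each column holds 4 bases; weight the rows to build one byte per column
  bytes <- as.raw(colSums(codes * c(64L, 16L, 4L, 1L)))
  list(bytes = bytes, n = nchar(s)) # keep the true length to drop padding
}

unpack_dna <- function(p) {
  b <- as.integer(p$bytes)
  # recover the four 2-bit codes stored in each byte, in order
  codes <- rbind(b %/% 64L, (b %/% 16L) %% 4L, (b %/% 4L) %% 4L, b %% 4L)
  keep <- as.vector(codes)[seq_len(p$n)]
  paste(c("A", "C", "G", "T")[keep + 1L], collapse = "")
}

p <- pack_dna("ACGTAC")
length(p$bytes)   # 6 bases fit in 2 bytes
unpack_dna(p)     # round-trips to the original string
```

With four symbols, 4 bases fit in each byte. An alphabet of 16 symbols 
needs 4 bits per symbol, so only 2 fit per byte and the saving over one 
byte per character is halved.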

 > if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')
 > BiocManager::install('Biostrings')
 > library('Biostrings')

See also the package vignettes, available within R or, for example, at

http://bioconductor.org/packages/release/bioc/html/Biostrings.html

It seems that you have data suitable for representation as a DNAStringSet.
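
As a rough sketch of what that could look like (the reads below are made 
up for illustration, and Biostrings must already be installed):

```r
library(Biostrings)

# Made-up example reads; substitute your own character vector of sequences
reads <- c("ACGTACGT", "TTGACA", "GGGCCCAAATTT")
dna <- DNAStringSet(reads)

width(dna)                               # length of each sequence
subseq(dna, start = 2, end = 5)          # the same window from every sequence
reverseComplement(dna)                   # reverse complement of each sequence
alphabetFrequency(dna, baseOnly = TRUE)  # per-sequence counts of A, C, G, T, other
as.character(dna)                        # back to an ordinary character vector
```

The container keeps all sequences in one object, so operations like these 
apply to the whole set without an explicit loop over strings.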

The package is actively developed, and using the 'devel' version of R 
(and hence the 'devel' version of Biostrings) might provide additional 
important facilities. If this proves useful, please send follow-up 
questions to the Bioconductor mailing lists:

http://bioconductor.org/docs/mailList.html

Martin