Compressing String in R

4 messages · Gundala Viswanath, jim holtman, Stavros Macrakis

#
Dear all,

What's the R way to compress a string into a shorter one, 2~3 characters/digits long?
In particular, I want to compress strings of length >= 30 characters,
e.g. ACGATACGGCGACCACCGAGATCTACACTCTTCC

The reason is that there are billions
of such strings I want to print out, and I need to save disk space.

- Gundala Viswanath
Jakarta - Indonesia
#
Since you only have 4 distinct characters, you can create a table of all
256 combinations of 4 of them, which reduces every 4 characters to one
byte.  This is fine if you just want to store them.
+     c("A", "C", "G", "T"),
+     c("A", "C", "G", "T"),
+     c("A", "C", "G", "T"))
[1] "ACGATACGGCGACCACCGAGATCTACACTCTTCCCC"
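
For readers following along, here is a minimal sketch of the same 4-characters-per-byte idea, written with arithmetic on 2-bit codes instead of a lookup table. It is not the code from this message: the names base_codes, pack_dna and unpack_dna are hypothetical, and it assumes the input contains only A, C, G and T.

```r
# Hypothetical helpers (not from the thread): 2 bits per base, 4 bases per byte.
base_codes <- c(A = 0L, C = 1L, G = 2L, T = 3L)

pack_dna <- function(s) {
  bases <- strsplit(s, "", fixed = TRUE)[[1]]
  codes <- unname(base_codes[bases])
  if (anyNA(codes)) stop("only A, C, G and T are supported")
  # Pad with code 0 ("A") so the length is a multiple of 4.
  pad <- (4L - length(codes) %% 4L) %% 4L
  m <- matrix(c(codes, integer(pad)), nrow = 4L)
  # Each column holds 4 bases; weight them to get one value in 0..255.
  as.raw(colSums(m * c(64L, 16L, 4L, 1L)))
}

unpack_dna <- function(bytes, n) {
  v <- as.integer(bytes)
  # Recover the four 2-bit codes of every byte, most significant first.
  codes <- rbind(v %/% 64L, (v %/% 16L) %% 4L, (v %/% 4L) %% 4L, v %% 4L)
  # Keep only the first n codes; the rest is padding.
  paste(names(base_codes)[codes[seq_len(n)] + 1L], collapse = "")
}

s <- "ACGATACGGCGACCACCGAGATCTACACTCTTCC"
packed <- pack_dna(s)
restored <- unpack_dna(packed, nchar(s))
```

The packed value is a raw vector, so it could be written to a binary connection with writeBin(). The original length has to be stored alongside it, because the padding is indistinguishable from trailing "A"s.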

        
On Wed, Dec 24, 2008 at 10:26 AM, Gundala Viswanath <gundalav at gmail.com> wrote: