'Record' row values every time the binary value in a column changes

Hi:

Here are a couple more options using packages plyr and data.table. The
labels in the second part are changed because they didn't make sense
in a 2M line file (well, mine may not either, but it's a start). You
can always change them to something more pertinent.

# Question 1:
Table <- data.frame(binary, chromosome = Chromosome, start)

library(plyr)
(df <- ddply(Table, .(chromosome, binary), summarise,
             position_start = min(start),
             position_end = max(start)))
  chromosome binary position_start position_end
1          1      0             20           36
2          1      1             12           18
3          2      0             17           19
4          2      1             12           16

library(data.table)
dTable <- data.table(Table, key = 'chromosome,binary')
(dt <- dTable[, list(position_start = min(start),
                     position_end = max(start)), by = 'chromosome,binary'])
     chromosome binary position_start position_end
[1,]          1      0             20           36
[2,]          1      1             12           18
[3,]          2      0             17           19
[4,]          2      1             12           16
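
One caveat on the grouping above: summarising by (chromosome, binary) gives a single start/end pair per value, so if the same binary value occurs in several separate stretches on one chromosome, those stretches would be merged. A minimal base R sketch of a run-aware variant follows. The input rows are made up for illustration (the original vectors are not shown in this message), the `run` column and the `runs` object are illustrative names, and the rows are assumed to be sorted by chromosome and then by start.

```r
# Hypothetical input; the real data from the original question is not shown here.
Table <- data.frame(
  chromosome = c(1, 1, 1, 1, 1, 2, 2, 2),
  start      = c(12, 14, 18, 20, 36, 12, 16, 17),
  binary     = c(1, 1, 1, 0, 0, 1, 1, 0)
)

# A new run starts on the first row and whenever binary or chromosome
# differs from the previous row.
change <- c(TRUE, diff(Table$binary) != 0 | diff(Table$chromosome) != 0)
Table$run <- cumsum(change)

# One output row per run, with its first and last start position.
runs <- do.call(rbind, lapply(split(Table, Table$run), function(d) {
  data.frame(chromosome     = d$chromosome[1],
             binary         = d$binary[1],
             position_start = min(d$start),
             position_end   = max(d$start))
}))
rownames(runs) <- NULL
runs
```

With this toy input there is only one run per (chromosome, binary) pair, so the result has the same four groups as the summaries above; the two approaches would differ only when a value recurs after a change.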

# Question 2:

For plyr, it's easy to write a function that takes a generic input data frame
(in this case, a single row of the summary from Question 1) and returns a
data frame with positions and labels.

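tfun <- function(df) {
     # number of positions in the run, endpoints included
     diff <- with(df, position_end - position_start + 1)
     # every position from start to end
     position <- with(df, seq(position_start, position_end))
     # label: chromosome.binary.letter (letters only covers runs of up to 26)
     value <- paste(df$chromosome, df$binary, letters[1:diff], sep = '.')
     data.frame(chromosome = df$chromosome, position, value, binary = df$binary)
}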

# Then:
   chromosome position value binary
1           1       20 1.0.a      0
2           1       21 1.0.b      0
3           1       22 1.0.c      0
4           1       23 1.0.d      0
5           1       24 1.0.e      0
6           1       25 1.0.f      0
7           1       26 1.0.g      0
8           1       27 1.0.h      0
9           1       28 1.0.i      0
10          1       29 1.0.j      0
11          1       30 1.0.k      0
12          1       31 1.0.l      0
13          1       32 1.0.m      0
14          1       33 1.0.n      0
15          1       34 1.0.o      0
16          1       35 1.0.p      0
17          1       36 1.0.q      0
18          1       12 1.1.a      1
19          1       13 1.1.b      1
20          1       14 1.1.c      1
21          1       15 1.1.d      1
22          1       16 1.1.e      1
23          1       17 1.1.f      1
24          1       18 1.1.g      1
25          2       17 2.0.a      0
26          2       18 2.0.b      0
27          2       19 2.0.c      0
28          2       12 2.1.a      1
29          2       13 2.1.b      1
30          2       14 2.1.c      1
31          2       15 2.1.d      1
32          2       16 2.1.e      1
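
The call that produced the output above is not shown in this message. With plyr loaded it was presumably something along the lines of ddply(df, .(chromosome, binary), tfun), but that is an assumption. As a sketch that runs without any packages, the same function can be applied to each row of the summary in base R and the pieces stacked; the `expanded` name is illustrative, and the summary values are copied from the Question 1 output.

```r
# Summary table, values copied from the Question 1 output.
df <- data.frame(chromosome     = c(1, 1, 2, 2),
                 binary         = c(0, 1, 0, 1),
                 position_start = c(20, 12, 17, 12),
                 position_end   = c(36, 18, 19, 16))

# tfun as defined in the message.
tfun <- function(df) {
  diff <- with(df, position_end - position_start + 1)
  position <- with(df, seq(position_start, position_end))
  value <- paste(df$chromosome, df$binary, letters[1:diff], sep = '.')
  data.frame(chromosome = df$chromosome, position, value, binary = df$binary)
}

# Apply tfun to one row at a time, then stack the resulting data frames.
expanded <- do.call(rbind,
                    lapply(seq_len(nrow(df)), function(i) tfun(df[i, ])))
rownames(expanded) <- NULL

# Presumed plyr equivalent (assumption, not taken from the message):
# expanded <- ddply(df, .(chromosome, binary), tfun)
head(expanded)
```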

# For data.table, one can apply the internals of tfun directly:

dt[, list(chromosome = chromosome, position = seq(position_start, position_end),
          value = paste(chromosome, binary,
                        letters[1:(position_end - position_start + 1)],
                        sep = '.'),
          binary = binary), by = 'chromosome,binary']
   chromosome binary chromosome.1 position value binary.1
            1      0            1       20 1.0.a        0
            1      0            1       21 1.0.b        0
            1      0            1       22 1.0.c        0
            1      0            1       23 1.0.d        0
            1      0            1       24 1.0.e        0
            1      0            1       25 1.0.f        0
            1      0            1       26 1.0.g        0
            1      0            1       27 1.0.h        0
            1      0            1       28 1.0.i        0
            1      0            1       29 1.0.j        0
            1      0            1       30 1.0.k        0
            1      0            1       31 1.0.l        0
            1      0            1       32 1.0.m        0
            1      0            1       33 1.0.n        0
            1      0            1       34 1.0.o        0
            1      0            1       35 1.0.p        0
            1      0            1       36 1.0.q        0
            1      1            1       12 1.1.a        1
            1      1            1       13 1.1.b        1
            1      1            1       14 1.1.c        1
            1      1            1       15 1.1.d        1
            1      1            1       16 1.1.e        1
            1      1            1       17 1.1.f        1
            1      1            1       18 1.1.g        1
            2      0            2       17 2.0.a        0
            2      0            2       18 2.0.b        0
            2      0            2       19 2.0.c        0
            2      1            2       12 2.1.a        1
            2      1            2       13 2.1.b        1
            2      1            2       14 2.1.c        1
            2      1            2       15 2.1.d        1
            2      1            2       16 2.1.e        1
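
A note on the labels: letters holds only 26 values, so in a run longer than 26 positions every label past the 26th would end in NA. One possible alternative, sketched below, is to number the positions within a run instead. The helper name label_run is hypothetical and not part of the code above.

```r
# Hypothetical helper: label the positions of a run by running index,
# so that runs of any length get distinct labels.
label_run <- function(chromosome, binary, n) {
  paste(chromosome, binary, seq_len(n), sep = ".")
}

label_run(1, 0, 3)   # "1.0.1" "1.0.2" "1.0.3"

# For comparison: indexing letters past its length gives NA.
length(letters)      # 26
letters[27]          # NA
```

Inside tfun this would replace the letters[1:diff] term; the same paste() call could be used in the data.table version.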

HTH,
Dennis
On Wed, Apr 20, 2011 at 2:01 AM, baboon2010 <nielsvanderaa at live.be> wrote: