My question is twofold.
Part 1:
My data looks like this:
(example set, real data has 2*10^6 rows)
binary<-c(1,1,1,0,0,0,1,1,1,0,0)
Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
start<-c(12,17,18,20,25,36,12,15,16,17,19)
Table<-cbind(Chromosome,start,binary)
Chromosome start binary
[1,] 1 12 1
[2,] 1 17 1
[3,] 1 18 1
[4,] 1 20 0
[5,] 1 25 0
[6,] 1 36 0
[7,] 2 12 1
[8,] 2 15 1
[9,] 2 16 1
[10,] 2 17 0
[11,] 2 19 0
As output I need a shortlist with one row per binary block, giving the starting
and ending position of each block.
For this example it would look like this:
Chromosome2 position_start position_end binary2
[1,] 1 12 18 1
[2,] 1 20 36 0
[3,] 2 12 16 1
[4,] 2 17 19 0
Part 2:
Based on the output of part 1, I need to assign the binary value to the rows of
another data set. If the position value in a row of this second data set falls in one
of the blocks defined in the shortlist from part 1, the binary value of that
block should be assigned to an extra column for this row. This would
look something like this:
Chromosome3 position Value binary3
[1,] "1" "12" "a" "1"
[2,] "1" "13" "b" "1"
[3,] "1" "14" "c" "1"
[4,] "1" "15" "d" "1"
[5,] "1" "16" "e" "1"
[6,] "1" "18" "f" "1"
[7,] "1" "20" "g" "0"
[8,] "1" "21" "h" "0"
[9,] "1" "22" "i" "0"
[10,] "1" "23" "j" "0"
[11,] "1" "25" "k" "0"
[12,] "1" "35" "l" "0"
[13,] "2" "12" "m" "1"
[14,] "2" "13" "n" "1"
[15,] "2" "14" "o" "1"
[16,] "2" "15" "p" "1"
[17,] "2" "16" "q" "1"
[18,] "2" "17" "s" "0"
[19,] "2" "18" "d" "0"
[20,] "2" "19" "f" "0"
Many thanks in advance,
Niels
--
View this message in context: http://r.789695.n4.nabble.com/Record-row-values-every-time-the-binary-value-in-a-collumn-changes-tp3462496p3462496.html
Sent from the R help mailing list archive at Nabble.com.
'Record' row values every time the binary value in a collumn changes
5 messages · baboon2010, jim holtman, William Dunlap +2 more
Here is an answer to part 1:
binary<-c(1,1,1,0,0,0,1,1,1,0,0)
Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
start<-c(12,17,18,20,25,36,12,15,16,17,19)
Table<-cbind(Chromosome,start,binary)
# determine where the start/end of each group is
# use indices since the size is large
startEnd <- lapply(split(seq(nrow(Table))
                         , list(Table[, "Chromosome"], Table[, 'binary'])
                         , drop = TRUE
                         )
                   , function(.indx){
                       se <- range(.indx)
                       c(Chromosome2 = unname(Table[se[1L], "Chromosome"])
                         , position_start = unname(Table[se[1L], 'start'])
                         , position_end = unname(Table[se[2L], 'start'])
                         , binary2 = unname(Table[se[1L], 'binary'])
                         )
                   })
do.call(rbind, startEnd)
    Chromosome2 position_start position_end binary2
1.0           1             20           36       0
2.0           2             17           19       0
1.1           1             12           18       1
2.1           2             12           16       1
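One caveat worth checking on the real data: split() here groups rows by the (Chromosome, binary) pair, not by consecutive runs, so two separate blocks with the same binary value on one chromosome would be merged into a single row. The sketch below uses a small hypothetical table (`toy`, made up for illustration) to show the difference, and one way to number runs instead: a new run starts whenever the chromosome or the binary value differs from the previous row.

```r
# Hypothetical toy data: chromosome 1 has two separate runs of 1s
toy <- cbind(Chromosome = c(1, 1, 1, 1, 1),
             start      = c(10, 11, 20, 21, 30),
             binary     = c(1, 1, 0, 0, 1))
n <- nrow(toy)

# Grouping by the (Chromosome, binary) pair, as the split() above does
byValue <- split(seq_len(n), list(toy[, "Chromosome"], toy[, "binary"]),
                 drop = TRUE)
byValue[["1.1"]]   # rows 1 2 5: both runs of 1s land in one group

# Grouping by consecutive runs: a new run starts whenever the chromosome
# or the binary value differs from the previous row
newRun <- c(TRUE, toy[-1, "Chromosome"] != toy[-n, "Chromosome"] |
                  toy[-1, "binary"]     != toy[-n, "binary"])
runId <- cumsum(newRun)
byRun <- split(seq_len(n), runId)
length(byRun)      # 3 runs; each can then be summarised with range()
```

Whether this matters depends on whether a binary value can recur in separate blocks on the same chromosome in the full data set.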
On Wed, Apr 20, 2011 at 5:01 AM, baboon2010 <nielsvanderaa at live.be> wrote:
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of jim holtman
Sent: Wednesday, April 20, 2011 9:59 AM
To: baboon2010
Cc: r-help at r-project.org
Subject: Re: [R] 'Record' row values every time the binary value in acollumn changes

Here is an answer to part 1:
The following will likely be a quicker way than that lapply() to find where
a column changes value when there are lots of rows:
f1 <- function (Table) {
    # TRUE for the first element of each run of equal values
    isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
    # TRUE for the last element of each run of equal values
    isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
    with(data.frame(Table), {
        first <- isFirstInRun(binary)
        last <- isLastInRun(binary)
        cbind(Chromosome2 = Chromosome[first], position_start = start[first],
              position_end = start[last], binary2 = binary[first])
    })
}
E.g.,
> f1(Table)
Chromosome2 position_start position_end binary2
[1,] 1 12 18 1
[2,] 1 20 36 0
[3,] 2 12 16 1
[4,] 2 17 19 0
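f1() looks only at the binary column, so if the binary value happens to stay the same across a chromosome boundary, the two blocks would be reported as one (with a position_end taken from the next chromosome). In the example data chromosome 1 ends with 0 and chromosome 2 starts with 1, so this does not show up. A minimal sketch of a variant (`f2` and `T2` are hypothetical names) that also breaks a run when Chromosome changes:

```r
# Example data from the question
Table <- cbind(Chromosome = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
               start      = c(12, 17, 18, 20, 25, 36, 12, 15, 16, 17, 19),
               binary     = c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0))

# Hypothetical variant of f1: a run also ends where Chromosome changes
f2 <- function(Table) {
  with(data.frame(Table), {
    n <- length(binary)
    # TRUE where either key differs from the previous row
    changed <- binary[-1] != binary[-n] | Chromosome[-1] != Chromosome[-n]
    first <- c(TRUE, changed)   # first row of each run
    last  <- c(changed, TRUE)   # last row of each run
    cbind(Chromosome2 = Chromosome[first], position_start = start[first],
          position_end = start[last], binary2 = binary[first])
  })
}

f2(Table)   # same four blocks as f1(Table)

# Hypothetical data where binary stays 1 across the chromosome boundary
T2 <- cbind(Chromosome = c(1, 1, 2, 2),
            start      = c(12, 17, 12, 15),
            binary     = c(0, 1, 1, 0))
f2(T2)      # four blocks; f1(T2) would merge rows 2 and 3 into one
```

The extra comparison keeps everything vectorized, so it should cost about the same as f1() on 2 million rows.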
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
Hi:
Here are a couple more options using packages plyr and data.table. The
labels in the second part are changed because they didn't make sense
in a 2M line file (well, mine may not either, but it's a start). You
can always change them to something more pertinent.
# Question 1:
Table <- data.frame(binary, chromosome = Chromosome, start)
library(plyr)
(df <- ddply(Table, .(chromosome, binary), summarise,
             position_start = min(start),
             position_end = max(start)))
chromosome binary position_start position_end
1 1 0 20 36
2 1 1 12 18
3 2 0 17 19
4 2 1 12 16
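library(data.table)
# pass the key and grouping columns as character vectors; in a single
# comma-separated string, current data.table treats spaces as part of the names
dTable <- data.table(Table, key = c('chromosome', 'binary'))
(dt <- dTable[, list(position_start = min(start),
                     position_end = max(start)), by = c('chromosome', 'binary')])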
chromosome binary position_start position_end
[1,] 1 0 20 36
[2,] 1 1 12 18
[3,] 2 0 17 19
[4,] 2 1 12 16
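If adding a package dependency is undesirable, the same per-group summary can be sketched in base R with aggregate(). This is an alternative, not part of the plyr/data.table answer, and like the versions above it groups by the (chromosome, binary) value rather than by consecutive run.

```r
# Example data from the question, as a data frame
Table <- data.frame(binary     = c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0),
                    chromosome = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                    start      = c(12, 17, 18, 20, 25, 36, 12, 15, 16, 17, 19))

# One row per (chromosome, binary) group; both calls order groups the same way
lo <- aggregate(start ~ chromosome + binary, data = Table, FUN = min)
hi <- aggregate(start ~ chromosome + binary, data = Table, FUN = max)
df <- data.frame(lo[c("chromosome", "binary")],
                 position_start = lo$start,
                 position_end   = hi$start)
df
```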
## Question 2:
For plyr, it's easy to write a function that takes a generic input data frame
(in this case, a single row of df) and then outputs a data frame with
positions and labels.
tfun <- function(df) {
    # number of positions spanned by this block
    n <- with(df, position_end - position_start + 1)
    position <- with(df, seq(position_start, position_end))
    # labels such as "1.0.a"; note that letters has only 26 elements
    value <- paste(df$chromosome, df$binary, letters[1:n], sep = '.')
    data.frame(chromosome = df$chromosome, position, value, binary = df$binary)
}
# Then:
ddply(df, .(chromosome, binary), tfun)
chromosome position value binary
1 1 20 1.0.a 0
2 1 21 1.0.b 0
3 1 22 1.0.c 0
4 1 23 1.0.d 0
5 1 24 1.0.e 0
6 1 25 1.0.f 0
7 1 26 1.0.g 0
8 1 27 1.0.h 0
9 1 28 1.0.i 0
10 1 29 1.0.j 0
11 1 30 1.0.k 0
12 1 31 1.0.l 0
13 1 32 1.0.m 0
14 1 33 1.0.n 0
15 1 34 1.0.o 0
16 1 35 1.0.p 0
17 1 36 1.0.q 0
18 1 12 1.1.a 1
19 1 13 1.1.b 1
20 1 14 1.1.c 1
21 1 15 1.1.d 1
22 1 16 1.1.e 1
23 1 17 1.1.f 1
24 1 18 1.1.g 1
25 2 17 2.0.a 0
26 2 18 2.0.b 0
27 2 19 2.0.c 0
28 2 12 2.1.a 1
29 2 13 2.1.b 1
30 2 14 2.1.c 1
31 2 15 2.1.d 1
32 2 16 2.1.e 1
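One thing to watch with tfun on real data: letters has only 26 elements, so the letter labels become NA for any block spanning more than 26 positions. A minimal sketch of a variant (`tfun2` and `block` are hypothetical names) that uses numeric suffixes instead:

```r
# Hypothetical variant of tfun: numeric suffixes instead of letters,
# so labels never run out for long blocks
tfun2 <- function(df) {
  n <- with(df, position_end - position_start + 1)   # positions in block
  position <- with(df, seq(position_start, position_end))
  value <- paste(df$chromosome, df$binary, seq_len(n), sep = ".")
  data.frame(chromosome = df$chromosome, position, value,
             binary = df$binary, stringsAsFactors = FALSE)
}

# A single hypothetical block spanning 30 positions
block <- data.frame(chromosome = 1, binary = 0,
                    position_start = 1, position_end = 30)
sum(is.na(letters[1:30]))   # 4: letter labels 27 to 30 would be NA
out <- tfun2(block)
tail(out$value, 1)          # "1.0.30"
```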
# For data.table, one can apply the internals of tfun directly:
dt[, list(chromosome = chromosome, position = seq(position_start, position_end),
          value = paste(chromosome, binary,
                        letters[1:(position_end - position_start + 1)],
                        sep = '.'),
          binary = binary), by = c('chromosome', 'binary')]
chromosome binary chromosome.1 position value binary.1
1 0 1 20 1.0.a 0
1 0 1 21 1.0.b 0
1 0 1 22 1.0.c 0
1 0 1 23 1.0.d 0
1 0 1 24 1.0.e 0
1 0 1 25 1.0.f 0
1 0 1 26 1.0.g 0
1 0 1 27 1.0.h 0
1 0 1 28 1.0.i 0
1 0 1 29 1.0.j 0
1 0 1 30 1.0.k 0
1 0 1 31 1.0.l 0
1 0 1 32 1.0.m 0
1 0 1 33 1.0.n 0
1 0 1 34 1.0.o 0
1 0 1 35 1.0.p 0
1 0 1 36 1.0.q 0
1 1 1 12 1.1.a 1
1 1 1 13 1.1.b 1
1 1 1 14 1.1.c 1
1 1 1 15 1.1.d 1
1 1 1 16 1.1.e 1
1 1 1 17 1.1.f 1
1 1 1 18 1.1.g 1
2 0 2 17 2.0.a 0
2 0 2 18 2.0.b 0
2 0 2 19 2.0.c 0
2 1 2 12 2.1.a 1
2 1 2 13 2.1.b 1
2 1 2 14 2.1.c 1
2 1 2 15 2.1.d 1
2 1 2 16 2.1.e 1
HTH,
Dennis
Here's one way to do part 1:
rr = rle(Table[,'binary'])
cc = cumsum(rr$lengths)+1
thestarts = c(1,cc[cc<=nrow(Table)])
theends = cc-1
answer = cbind(Table[thestarts,'Chromosome'],Table[thestarts,'start'],
               Table[theends,'start'],rr$values)
answer
     [,1] [,2] [,3] [,4]
[1,]    1   12   18    1
[2,]    1   20   36    0
[3,]    2   12   16    1
[4,]    2   17   19    0

If I understand you correctly, here's a way to do part 2:

Next = matrix(c(rep(1,12),rep(2,8),
                c(12,13,14,15,16,18,20,21,22,23,25,35,12,13,14,15,16,17,18,19)),ncol=2)
apply(Next,1,function(x)answer[answer[,1]==x[1] & x[2] >= answer[,2] & x[2] <= answer[,3],4])
 [1] 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
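The apply() call above evaluates an R function once per row, which may be slow for 2 million rows, and it returns a list rather than a vector if some position falls in no block (that row yields a zero-length result). A vectorized sketch using base R's findInterval(), one chromosome at a time, assuming the blocks on each chromosome do not overlap. The column names given to `answer` and `Next` are assumptions (the matrices above have none), and `lookupBinary` is a hypothetical helper name.

```r
# Part 1 blocks (same values as `answer` above, with column names added)
answer <- cbind(Chromosome2    = c(1, 1, 2, 2),
                position_start = c(12, 20, 12, 17),
                position_end   = c(18, 36, 16, 19),
                binary2        = c(1, 0, 1, 0))
# Second data set (same values as `Next` above; column names assumed)
Next <- cbind(Chromosome = c(rep(1, 12), rep(2, 8)),
              position   = c(12, 13, 14, 15, 16, 18, 20, 21, 22, 23, 25, 35,
                             12, 13, 14, 15, 16, 17, 18, 19))

# Hypothetical helper: binary value of the block containing each position.
# Assumes blocks on one chromosome do not overlap; positions outside
# every block get NA.
lookupBinary <- function(answer, Next) {
  out <- rep(NA_real_, nrow(Next))
  for (chr in unique(Next[, "Chromosome"])) {
    rows   <- which(Next[, "Chromosome"] == chr)
    blocks <- answer[answer[, "Chromosome2"] == chr, , drop = FALSE]
    blocks <- blocks[order(blocks[, "position_start"]), , drop = FALSE]
    pos    <- Next[rows, "position"]
    # index of the last block starting at or before pos (0 if none)
    idx    <- findInterval(pos, blocks[, "position_start"])
    inside <- idx > 0
    # keep only positions that also lie at or before that block's end
    inside[inside] <- pos[inside] <= blocks[idx[inside], "position_end"]
    out[rows[inside]] <- blocks[idx[inside], "binary2"]
  }
  out
}

binary3 <- lookupBinary(answer, Next)
binary3   # same values as the apply() result above
```

The loop runs once per chromosome rather than once per row, and findInterval() does the per-position search, so the cost should grow roughly with the number of rows times the log of the number of blocks.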