two difficult loop

2 messages · greg holly, Jim Lemon

#
Hi Bert;

I really appreciate this. I will need to check your code for task 2 on the
real data tomorrow at my office, as a technical issue is preventing me from
connecting remotely. I am sure it will work well.

I am sorry that I was not able to explain my first question clearly.
Basically:

The values in the ref data represent regions of a chromosome. I need to
select these regions in map (all region values in ref exist in the first
column of map, map$reg), then sum the column map$rate and count the number
of rows needed for the sum to exceed 0.85. For example, consider the first
row of ref: 29220 and 63933. After sorting map by its first column, I sum
map$rate only between 29220 and 63933 in the sorted map and cut off at
>0.85. Then I count how many rows of the sorted map that took: if summing
only the first 12 of them gives >0.85, then my answer is 12 for
29220 - 63933 in ref.

Thanks a lot for your patience.

Cheers,
Greg
On Sun, Jun 12, 2016 at 10:35 PM, greg holly <mak.hholly at gmail.com> wrote:

            

  
  
#
Hi Greg,
Okay, I have a better idea now of what you want. The problem of
multiple matches is still there, but here is a start:

# this data frame actually contains all the values in ref in the "reg" field
map<-read.table(text="reg p rate
 10276 0.700  3.867e-18
 71608 0.830  4.542e-16
 29220 0.430  1.948e-15
 99542 0.220  1.084e-15
 26441 0.880  9.675e-14
 95082 0.090  7.349e-13
 36169 0.480  9.715e-13
 55572 0.500  9.071e-12
 65255 0.300  1.688e-11
 51960 0.970  1.163e-10
 55652 0.388  3.750e-10
 63933 0.250  9.128e-10
 35170 0.720  7.355e-09
 06491 0.370  1.634e-08
 85508 0.470  1.057e-07
 86666 0.580  7.862e-07
 04758 0.810  9.501e-07
 06169 0.440  1.104e-06
 63933 0.750  2.624e-06
 41838 0.960  8.119e-06
 74806 0.810  9.501e-07
 92643 0.470  1.057e-07
 73732 0.090  7.349e-13
 82451 0.960  8.119e-06
 86042 0.480  9.715e-13
 93502 0.500  9.071e-12
 85508 0.370  1.634e-08
 95082 0.830  4.542e-16",
 header=TRUE)
# same as in your example
ref<-read.table(text="reg1 reg2
 29220     63933
 26441     41838
 06169     10276
 74806     92643
 73732     82451
 86042     93502
 85508     95082",
 header=TRUE)
# sort the "map" data frame
map2<-map[order(map$reg),]
# get a field for the counts
ref$n<-NA
# and a field for the minimum p values
ref$min_p<-NA
# get the number of rows in "ref"
nref<-nrow(ref)
# seq_len() is safe even if "ref" has zero rows
for(i in seq_len(nref)) {
 start<-which(map2$reg==ref$reg1[i])
 end<-which(map2$reg==ref$reg2[i])
 cat("start",start,"end",end,"\n")
 # get the range of matches
 regrange<-range(c(start,end))
 # convert this to a sequence spanning all matches
 allreg<-regrange[1]:regrange[2]
 ref$n[i]<-sum(map2$p[allreg] > 0.85)
 ref$min_p[i]<-min(map2$p[allreg])
}

This example uses the span from the first match of "reg1" to the last
match of "reg2". This may not be what you want, so let me know if
there are further constraints.
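
In case the cumulative-sum reading of the question is the intended one (add up "rate" row by row within the span of the sorted map and count the rows needed for the running total to exceed 0.85), here is a minimal sketch of that idea. It rests on several assumptions: the toy data frames below are made up for illustration (the "rate" values in the sample data above are far too small to ever reach 0.85), the helper name count_to_cutoff is hypothetical, both "reg1" and "reg2" are assumed to occur in map, and the span is taken from the first to the last match, as in the loop above.

```r
# Hypothetical toy data, not taken from the thread
map <- data.frame(
  reg  = c(50, 10, 40, 20, 30, 60),
  rate = c(0.35, 0.10, 0.25, 0.40, 0.30, 0.20)
)
ref <- data.frame(reg1 = c(10, 30, 50), reg2 = c(50, 60, 60))

# Sort map by region, as in the loop above
map2 <- map[order(map$reg), ]

# count_to_cutoff (made-up name): number of rows, counted from the start of
# the span, needed for the running sum of rate to exceed the cutoff.
# Returns NA if the total over the whole span never exceeds it.
count_to_cutoff <- function(reg1, reg2, sorted_map, cutoff = 0.85) {
  # positions of every match of either boundary value
  hits <- which(sorted_map$reg %in% c(reg1, reg2))
  # span from the first match to the last match
  span <- min(hits):max(hits)
  running <- cumsum(sorted_map$rate[span])
  which(running > cutoff)[1]
}

ref$n <- mapply(count_to_cutoff, ref$reg1, ref$reg2,
                MoreArgs = list(sorted_map = map2))
print(ref)
```

With this toy data the first two rows of ref need 4 and 3 rows respectively, and the third gets NA because its span sums to only 0.55. Returning NA rather than the span length is a design choice here; it keeps "never reached the cutoff" distinguishable from a real count.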

Jim
On Mon, Jun 13, 2016 at 12:35 PM, greg holly <mak.hholly at gmail.com> wrote: