two difficult loop
Hi Greg,
You've got a problem that you don't seem to have identified. Your "reg" field in the "map" data frame can define at most 100000 unique values. With more than 27 million rows, this means that each value will be repeated about 270 times. Unless there are constraints you haven't mentioned, we would expect that in about 135 of those cases for each value, the values in each "ref" row will be in the reverse order, and the spans may overlap.

I notice that you may have tried to get around this by sorting the "map" data frame, but then the order of the rows is different, and the number of rows "between" any two values changes. Apart from this, it is almost certain that the number of values of "p > 0.85" in the multiple runs between each set of "ref" values will be different.

It is possible to perform both tasks that you mention, but only the second will yield a unique or tied value for all of the cases. So for the first task, your result data frame will have an unspecified number of values for each row in "ref".

Jim
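To make the duplication point concrete, here is a minimal sketch that uses only the 20 sample "map" rows from the original post quoted below. It reads "reg" as character so that leading zeros survive; the names `starts`, `ends` and `spans` are purely illustrative and not from the post.

```r
# Sample "map" rows copied from the original post; reg is read as character
# so that values such as "06491" keep their leading zero.
map <- read.table(text = "
reg p rate
10276 0.700 3.867e-18
71608 0.830 4.542e-16
29220 0.430 1.948e-15
99542 0.220 1.084e-15
26441 0.880 9.675e-14
95082 0.090 7.349e-13
36169 0.480 9.715e-13
55572 0.500 9.071e-12
65255 0.300 1.688e-11
51960 0.970 1.163e-10
55652 0.388 3.750e-10
63933 0.250 9.128e-10
35170 0.720 7.355e-09
06491 0.370 1.634e-08
85508 0.470 1.057e-07
86666 0.580 7.862e-07
04758 0.810 9.501e-07
06169 0.440 1.104e-06
63933 0.750 2.624e-06
41838 0.960 8.119e-06
", header = TRUE, colClasses = c("character", "numeric", "numeric"))

# Average number of repeats per value in the full data:
# 27 million rows over at most 100000 distinct five-digit values.
27e6 / 1e5                            # 270

# A repeated value has more than one position, so "the rows between
# 29220 and 63933" is not a single well-defined span.
starts <- which(map$reg == "29220")   # 3
ends   <- which(map$reg == "63933")   # 12 19

# Every start/end pairing is a candidate span, each with its own length.
spans <- expand.grid(start = starts, end = ends)
spans$n_rows <- abs(spans$end - spans$start) + 1
spans
```

Even in this small sample the value 63933 occurs twice, giving candidate spans of 10 and 17 rows for the first "ref" pair; in the full data there would be many more such pairings per row of "ref".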
On Mon, Jun 13, 2016 at 6:14 AM, greg holly <mak.hholly at gmail.com> wrote:
Dear all;
I have two data sets, map and ref. A small part of each data set
is given below. The map data has more than 27 million rows and the ref data has about
560 rows. Basically I need to run two different tasks. My R code for these
tasks is given below, but it does not work properly.
I sincerely appreciate your help.
Regards,
Greg
Task 1)
For example, the first and second columns for row 1 in the ref data are 29220 and
63933. So I need to write R code that first looks at the first row in ref
(29220 and 63933), then sums the "map$rate" column and
gives the number of rows that are > 0.85. Then it should do the same for the second,
third, ... rows in ref. At the end I would like a table like the one given below (the results I
need). Please note that all values specified in the ref data file exist in the
map$reg column.
Task 2)
Again, for example, the first and second columns for row 1 in the ref data are 29220 and
63933. So I need to write R code that gives the minimum map$p for the
29220-63933 interval in the map file, and then
does the same for the second, third, ... rows in ref.
#my attempt for the first question
temp<-map[order(map$reg, map$p),]
count<-1
temp<-unique(temp$reg
for(i in 1:length(ref) {
for(j in 1:length(ref)
{
temp1<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,]
& temp[cumsum(temp$rate) > 0.70,])
count=count+1
}
}
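One possible way to approach Task 1 is sketched below. It is not the poster's method and rests on assumptions that the post leaves open: each "reg" value is located by its first occurrence in the unsorted "map" (via `match()`), the span includes both endpoints, and the quantity counted is the number of rows with p > 0.85, as Jim's reply reads it. The function name `count_between` and its `threshold` argument are hypothetical.

```r
# Sample data copied from the post; reg columns are read as character
# so that leading zeros are preserved.
map <- read.table(text = "
reg p rate
10276 0.700 3.867e-18
71608 0.830 4.542e-16
29220 0.430 1.948e-15
99542 0.220 1.084e-15
26441 0.880 9.675e-14
95082 0.090 7.349e-13
36169 0.480 9.715e-13
55572 0.500 9.071e-12
65255 0.300 1.688e-11
51960 0.970 1.163e-10
55652 0.388 3.750e-10
63933 0.250 9.128e-10
35170 0.720 7.355e-09
06491 0.370 1.634e-08
85508 0.470 1.057e-07
86666 0.580 7.862e-07
04758 0.810 9.501e-07
06169 0.440 1.104e-06
63933 0.750 2.624e-06
41838 0.960 8.119e-06
", header = TRUE, colClasses = c("character", "numeric", "numeric"))

ref <- read.table(text = "
reg1 reg2
29220 63933
26441 41838
06169 10276
74806 92643
73732 82451
86042 93502
85508 95082
", header = TRUE, colClasses = c("character", "character"))

# Hypothetical helper: count rows with p > threshold between the FIRST
# occurrence of reg1 and the FIRST occurrence of reg2 (endpoints included).
count_between <- function(map, ref, threshold = 0.85) {
  pos1 <- match(ref$reg1, map$reg)   # NA if the value is not in map
  pos2 <- match(ref$reg2, map$reg)
  lo <- pmin(pos1, pos2)             # handles pairs in reverse order
  hi <- pmax(pos1, pos2)
  # Running count with a leading 0, so csum[k + 1] is the count in rows 1..k.
  csum <- c(0, cumsum(map$p > threshold))
  data.frame(ref, n = csum[hi + 1] - csum[lo])
}

res <- count_between(map, ref)
res
```

On the 20 sample rows this gives n = 2, 3, 2, NA, NA, NA, 1; the NA rows are "ref" values that do not appear in the small excerpt. The running count avoids slicing the 27-million-row data frame once per "ref" row. Because of the repeated "reg" values, a different choice of occurrence would generally give different counts.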
#my attempt for the second question
temp<-map[order(map$reg, map$p),]
count<-1
temp<-unique(temp$reg
for(i in 1:length(ref) {
for(j in 1:length(ref)
{
temp2<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,])
output<-temp2[temp2$p==min(temp2$p),]
}
}
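Task 2 can be sketched in a similar way. Again this is only an illustration under stated assumptions, not the poster's method: endpoints are taken at their first occurrence in the unsorted "map", the interval includes both endpoints, and pairs whose values are missing from "map" give NA. The function name `min_p_between` is hypothetical.

```r
# Sample data copied from the post; reg columns are read as character
# so that leading zeros are preserved.
map <- read.table(text = "
reg p rate
10276 0.700 3.867e-18
71608 0.830 4.542e-16
29220 0.430 1.948e-15
99542 0.220 1.084e-15
26441 0.880 9.675e-14
95082 0.090 7.349e-13
36169 0.480 9.715e-13
55572 0.500 9.071e-12
65255 0.300 1.688e-11
51960 0.970 1.163e-10
55652 0.388 3.750e-10
63933 0.250 9.128e-10
35170 0.720 7.355e-09
06491 0.370 1.634e-08
85508 0.470 1.057e-07
86666 0.580 7.862e-07
04758 0.810 9.501e-07
06169 0.440 1.104e-06
63933 0.750 2.624e-06
41838 0.960 8.119e-06
", header = TRUE, colClasses = c("character", "numeric", "numeric"))

ref <- read.table(text = "
reg1 reg2
29220 63933
26441 41838
06169 10276
74806 92643
73732 82451
86042 93502
85508 95082
", header = TRUE, colClasses = c("character", "character"))

# Hypothetical helper: minimum p between the FIRST occurrence of reg1 and
# the FIRST occurrence of reg2 (endpoints included, either order).
min_p_between <- function(map, ref) {
  pos1 <- match(ref$reg1, map$reg)   # NA if the value is not in map
  pos2 <- match(ref$reg2, map$reg)
  min_p <- vapply(seq_len(nrow(ref)), function(i) {
    if (is.na(pos1[i]) || is.na(pos2[i])) return(NA_real_)
    min(map$p[pos1[i]:pos2[i]])      # a:b also works when a > b
  }, numeric(1))
  data.frame(ref, min_p = min_p)
}

res2 <- min_p_between(map, ref)
res2
```

In the 20 sample rows every interval that can be located happens to contain the row with p = 0.09, so all non-missing results are 0.09. With about 560 rows in "ref", one `min()` per row over a slice of the p column should be manageable even for the full "map".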
Data sets
Data= map
reg p rate
10276 0.700 3.867e-18
71608 0.830 4.542e-16
29220 0.430 1.948e-15
99542 0.220 1.084e-15
26441 0.880 9.675e-14
95082 0.090 7.349e-13
36169 0.480 9.715e-13
55572 0.500 9.071e-12
65255 0.300 1.688e-11
51960 0.970 1.163e-10
55652 0.388 3.750e-10
63933 0.250 9.128e-10
35170 0.720 7.355e-09
06491 0.370 1.634e-08
85508 0.470 1.057e-07
86666 0.580 7.862e-07
04758 0.810 9.501e-07
06169 0.440 1.104e-06
63933 0.750 2.624e-06
41838 0.960 8.119e-06
data=ref
reg1 reg2
29220 63933
26441 41838
06169 10276
74806 92643
73732 82451
86042 93502
85508 95082
the results I need
reg1 reg2 n
29220 63933 12
26441 41838 78
06169 10276 125
74806 92643 11
73732 82451 47
86042 93502 98
85508 95082 219
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.