Hi,
I have two large data sets.
data_1:
   Chromosome  Start     End   Feature GroupA_3
1:       chr1 521369  750000 chr1-0001    0.170
2:       chr1 750001  800000 chr1-0002   -0.086
3:       chr1 800001  850000 chr1-0003    0.006
4:       chr1 850001  900000 chr1-0004    0.050
5:       chr1 900001  950000 chr1-0005    0.062
6:       chr1 950001 1000000 chr1-0006   -0.016
data_2:
   Chromosome Start   End    Feature GroupA_3
1:       chr1 15864 15865 cg13869341    0.207
2:       chr1 18826 18827 cg14008030   -0.288
3:       chr1 29406 29407 cg12045430   -0.331
4:       chr1 29424 29425 cg20826792   -0.074
5:       chr1 29434 29435 cg00381604    0.141
6:       chr1 68848 68849 cg20253340   -0.458
What I want to do:
Based on the columns "Chromosome", "Start" and "End" of the two data sets, I
want to find which rows (more precisely, which "Feature" values) of data_2
fall within each range (between "Start" and "End") of data_1. The
"Chromosome" values must also match between the two data sets.
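To make the intended matching concrete, here is a minimal base R sketch on made-up toy data (the values below are not from the real data sets). The names bins, probes and match_bins are hypothetical, and the sketch assumes that the ranges in the first table do not overlap within a chromosome, as in the data_1 excerpt above. It has not been tried on the full data.

```r
# Toy stand-ins for data_1 (bins) and data_2 (probes); values are invented.
bins <- data.frame(
  Chromosome = c("chr1", "chr1", "chr2"),
  Start      = c(1, 101, 1),
  End        = c(100, 200, 100),
  Feature    = c("chr1-0001", "chr1-0002", "chr2-0001"),
  stringsAsFactors = FALSE
)
probes <- data.frame(
  Chromosome = c("chr1", "chr1", "chr2", "chr2"),
  Start      = c(50, 150, 10, 500),
  End        = c(51, 151, 11, 501),
  Feature    = c("cgA", "cgB", "cgC", "cgD"),
  stringsAsFactors = FALSE
)

# For each probe, return the Feature of the bin on the same chromosome that
# fully contains it, or NA if there is none.
# Assumption: bins on one chromosome do not overlap each other.
match_bins <- function(probes, bins) {
  out <- rep(NA_character_, nrow(probes))
  for (chr in intersect(unique(probes$Chromosome), unique(bins$Chromosome))) {
    b <- bins[bins$Chromosome == chr, ]
    b <- b[order(b$Start), ]                 # findInterval needs sorted breaks
    p <- which(probes$Chromosome == chr)     # row indices of probes on this chromosome
    # Index of the last bin whose Start is <= the probe's Start (0 if none).
    idx <- findInterval(probes$Start[p], b$Start)
    ok <- idx > 0
    # Keep only probes whose End also lies inside that bin.
    ok[ok] <- probes$End[p][ok] <= b$End[idx[ok]]
    out[p[ok]] <- b$Feature[idx[ok]]
  }
  out
}

probes$Bin <- match_bins(probes, bins)
print(probes)
```

The desired output is a table like probes with the matching data_1 "Feature" attached to each data_2 row. findInterval uses a binary search, so this approach should scale reasonably to large tables.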
I have tried the "GenomicRanges" package as described in the post
https://stackoverflow.com/questions/11892241/merge-by-range-in-r-applying-loops
but I was not successful. Can anyone please help me do this quickly, as
the data is very large?
Thanks in advance.
Regards,
Tanvir Ahamed Stockholm, Sweden | mashranga at yahoo.com