
Merge by Range in R

2 messages · Mohammad Tanvir Ahamed, jim holtman

#
Hi,
I have two big data sets.

data_1:
[1] 15820 5
   Chromosome  Start     End   Feature GroupA_3
1:       chr1 521369  750000 chr1-0001    0.170
2:       chr1 750001  800000 chr1-0002   -0.086
3:       chr1 800001  850000 chr1-0003    0.006
4:       chr1 850001  900000 chr1-0004    0.050
5:       chr1 900001  950000 chr1-0005    0.062
6:       chr1 950001 1000000 chr1-0006   -0.016

data_2:
[1] 470870 5
   Chromosome Start   End    Feature GroupA_3
1:       chr1 15864 15865 cg13869341    0.207
2:       chr1 18826 18827 cg14008030   -0.288
3:       chr1 29406 29407 cg12045430   -0.331
4:       chr1 29424 29425 cg20826792   -0.074
5:       chr1 29434 29435 cg00381604    0.141
6:       chr1 68848 68849 cg20253340   -0.458


What I want to do:
Based on the "Chromosome", "Start" and "End" columns of the two data sets, I want to find which rows (more precisely, which "Feature" values) of data_2 fall within each range (between "Start" and "End") of data_1. The "Chromosome" values must also match between the two data sets.

I have tried the "GenomicRanges" package described in the post
https://stackoverflow.com/questions/11892241/merge-by-range-in-r-applying-loops
but I was not successful. Can anyone please help me do this quickly, as the data are very big?
Thanks in advance.


Regards.............
Tanvir Ahamed, Stockholm, Sweden | mashranga at yahoo.com
#
Have you tried 'foverlaps' in the data.table package?
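
For readers following along, here is a minimal sketch of how 'foverlaps' might be applied to this kind of problem. It assumes the data.table package is installed and uses small toy tables standing in for data_1 and data_2; the "cg_example_*" feature names and their positions are hypothetical, made up so that some rows match. type = "within" is one reading of "is in every range"; type = "any" would also keep partial overlaps.

```r
library(data.table)

# Toy stand-ins for the two tables in the question.
# The first three data_1 ranges come from the post; the data_2 rows named
# "cg_example_*" are hypothetical and exist only to produce some matches.
data_1 <- data.table(
  Chromosome = c("chr1", "chr1", "chr1"),
  Start      = c(521369L, 750001L, 800001L),
  End        = c(750000L, 800000L, 850000L),
  Feature    = c("chr1-0001", "chr1-0002", "chr1-0003")
)
data_2 <- data.table(
  Chromosome = c("chr1", "chr1", "chr1", "chr2"),
  Start      = c(15864L, 600000L, 760000L, 760000L),
  End        = c(15865L, 600001L, 760001L, 760001L),
  Feature    = c("cg13869341", "cg_example_a", "cg_example_b", "cg_example_c")
)

# foverlaps() needs the second table keyed, with the two interval
# columns as the last two key columns.
setkey(data_1, Chromosome, Start, End)

# Keep only data_2 rows whose interval lies inside a data_1 range
# on the same chromosome; nomatch = 0L drops rows without a match.
hits <- foverlaps(data_2, data_1,
                  by.x = c("Chromosome", "Start", "End"),
                  type = "within",
                  nomatch = 0L)
print(hits)
```

In the result, the data_1 columns keep their names and the clashing data_2 columns appear with an "i." prefix (for example i.Start and i.Feature), so each row pairs one data_2 feature with the data_1 range that contains it.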


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Sep 4, 2017 at 8:31 AM, Mohammad Tanvir Ahamed via R-help <
r-help at r-project.org> wrote: