Merge by Range in R

Hi,
I have two large data sets.

data_1:
dim(data_1)
[1] 15820 5
head(data_1)
   Chromosome  Start     End   Feature GroupA_3
1:       chr1 521369  750000 chr1-0001    0.170
2:       chr1 750001  800000 chr1-0002   -0.086
3:       chr1 800001  850000 chr1-0003    0.006
4:       chr1 850001  900000 chr1-0004    0.050
5:       chr1 900001  950000 chr1-0005    0.062
6:       chr1 950001 1000000 chr1-0006   -0.016

data_2:
dim(data_2)
[1] 470870 5
head(data_2)
   Chromosome Start   End    Feature GroupA_3
1:       chr1 15864 15865 cg13869341    0.207
2:       chr1 18826 18827 cg14008030   -0.288
3:       chr1 29406 29407 cg12045430   -0.331
4:       chr1 29424 29425 cg20826792   -0.074
5:       chr1 29434 29435 cg00381604    0.141
6:       chr1 68848 68849 cg20253340   -0.458

What I want to do:
Using the columns "Chromosome", "Start" and "End" of the two data sets, I want to find which rows (precisely, which "Feature" values) of data_2 fall within each range (between "Start" and "End") of data_1. The "Chromosome" values must also match between the two data sets.

I have tried the "GenomicRanges" package as described in the post
https://stackoverflow.com/questions/11892241/merge-by-range-in-r-applying-loops
but I was not successful. Can anyone please help me do this quickly, as the data is very large?
Thanks in advance.

Regards,
Tanvir Ahamed, Stockholm, Sweden | mashranga at yahoo.com
Have you tried 'foverlaps' in the data.table package?
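
A minimal sketch of how 'foverlaps' might be applied to this problem, assuming both tables are data.tables with the column names shown in the question. The toy rows and the "cg_a" to "cg_d" feature names below are made up for illustration and are not taken from the real data; type = "within" is one reasonable reading of "is in the range", not the only one.

```r
library(data.table)

# Toy stand-ins for data_1 (ranges) and data_2 (probes); all values are illustrative.
data_1 <- data.table(
  Chromosome = c("chr1", "chr1", "chr2"),
  Start      = c(521369L, 750001L, 521369L),
  End        = c(750000L, 800000L, 750000L),
  Feature    = c("chr1-0001", "chr1-0002", "chr2-0001")
)
data_2 <- data.table(
  Chromosome = c("chr1", "chr1", "chr1", "chr2"),
  Start      = c(15864L, 600000L, 760000L, 600000L),
  End        = c(15865L, 600001L, 760001L, 600001L),
  Feature    = c("cg_a", "cg_b", "cg_c", "cg_d")
)

# foverlaps() needs the second table keyed, with the interval
# columns (Start, End) as the last two key columns.
setkey(data_1, Chromosome, Start, End)

# type = "within": keep data_2 intervals lying entirely inside a data_1 range
# on the same chromosome; nomatch = 0L drops data_2 rows with no match.
res <- foverlaps(data_2, data_1,
                 by.x = c("Chromosome", "Start", "End"),
                 type = "within", nomatch = 0L)

print(res)
```

In the result, Start, End and Feature come from data_1, while the matching data_2 columns carry an "i." prefix (i.Start, i.End, i.Feature). If partial overlaps should also count, type = "any" could be used instead.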

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

(In reply to the message from Mohammad Tanvir Ahamed via R-help, Mon, Sep 4, 2017 at 8:31 AM.)
