Dear all, I have 2 data frames, both with 14 columns of data and differing numbers of rows. The first two columns are 'Latitude' and 'Longitude'. I want to find the pairs of Latitude and Longitude coordinates which are common to both datasets, and output a new data frame which is composed of these coincident rows. I tried using the 'unique' command, but had difficulties interpreting the help file. Many thanks for any help offered, Steve
Finding rows common to two datasets
6 messages · Steve Murray, Umesh Srinivasan, jim holtman +1 more
Thanks for the reply. However, when I run the following command, I receive the message 'data frame with 0 columns and 0 rows'. I've checked again, though, and there should be several thousand rows where the Latitude and Longitude pairs are the same.
common <- intersect(data_frame_x[c("Latitude", "Longitude")], data_frame_y[c("Latitude","Longitude")])
common
data frame with 0 columns and 0 rows

Is there an obvious solution to this? Should I be using 'unique' instead, and if so, how would I get the above to correspond to this command?

Thanks,
Steve
Date: Tue, 28 Apr 2009 13:36:51 +0530
Subject: Re: [R] Finding rows common to two datasets
From: umesh.srinivasan at gmail.com
To: smurray444 at hotmail.com
CC: r-help at r-project.org

Dear Steve,

Try ?intersect and see if that might help.

Cheers,
Umesh
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
You are missing a comma:
common <- intersect(data_frame_x[,c("Latitude", "Longitude")],
data_frame_y[,c("Latitude","Longitude")])
-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
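For reference, the two indexing forms discussed above can be compared directly. The following is only a small check on a made-up data frame (the name `df` and its values are assumptions, not the poster's data). It suggests that when two or more columns are selected, both forms return the same two-column data frame, so the comma on its own may not account for the empty result:

```r
# Hypothetical three-column data frame standing in for the real data.
df <- data.frame(Latitude  = c(10.5, 20.25),
                 Longitude = c(-1.5, 2.25),
                 other     = c("A", "B"))

# List-style column selection (no comma) ...
without_comma <- df[c("Latitude", "Longitude")]

# ... and matrix-style column selection (with comma).
with_comma <- df[, c("Latitude", "Longitude")]

# Compare the two results; all.equal() returns TRUE when they match.
same <- isTRUE(all.equal(without_comma, with_comma))
print(same)
```

The two forms do differ when a single column is selected: `df[, "Latitude"]` drops to a plain vector unless `drop = FALSE` is given, while `df["Latitude"]` stays a data frame.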
I think merge() can do what's wanted, but you do have to be careful that values match exactly. Here's an example where two data frames print identically in one row for columns 'a' and 'b' but are not exactly the same, so merge() returns zero rows. In this case the problem can be fixed by rounding, but that's not a good general solution, because very close numbers can round to different values, e.g., 1.499 and 1.501. Here are examples:
x <- data.frame(a=c(1.0000001,2), b=c(3,4), c=LETTERS[1:2])
y <- data.frame(a=c(1,2), b=c(3,5), c=LETTERS[3:4])
x
  a b c
1 1 3 A
2 2 4 B
y
  a b c
1 1 3 C
2 2 5 D

# x[1,"a"] and y[1,"a"] look the same, but are very slightly different
merge(x, y, by=c("a", "b"))
[1] a   b   c.x c.y
<0 rows> (or 0-length row.names)

# make x1 a version of x where the values are rounded to whole numbers
x1 <- x
x1$a <- round(x1$a)
merge(x1, y, by=c("a", "b"))
  a b c.x c.y
1 1 3   A   C

# intersect() returns columns that are the same in each dataframe, not rows
intersect(x, y)
  c
1 C
2 D
intersect(x1, y)
  a c
1 1 C
2 2 D
-- Tony Plate
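
Applied to the original question, the merge() approach might look like the sketch below. The two small data frames are hypothetical stand-ins for the real 14-column ones, and `value_x`, `value_y`, `tol`, `is_close`, `idx`, and `near` are illustrative names, not anything from the thread. The second half shows one possible alternative to rounding: comparing coordinates within an assumed tolerance. It builds an n-by-m comparison matrix, so it is only practical for modestly sized data.

```r
# Hypothetical stand-ins for the two data frames in the question
# (one extra column each instead of twelve).
data_frame_x <- data.frame(Latitude  = c(10.5, 20.25, 30.75),
                           Longitude = c(-1.5, 2.25, 3.5),
                           value_x   = c("A", "B", "C"))
data_frame_y <- data.frame(Latitude  = c(20.25, 30.75, 40),
                           Longitude = c(2.25, 3.5, 5),
                           value_y   = c("D", "E", "F"))

# Inner join on the coordinate pair: keeps only rows whose Latitude AND
# Longitude occur in both data frames. Values must be exactly equal.
common <- merge(data_frame_x, data_frame_y, by = c("Latitude", "Longitude"))
print(common)

# Tolerance-based matching on the x and y example from the message above.
x <- data.frame(a = c(1.0000001, 2), b = c(3, 4), c = LETTERS[1:2])
y <- data.frame(a = c(1, 2), b = c(3, 5), c = LETTERS[3:4])
tol <- 1e-6  # assumed tolerance; choose one that suits the data

# is_close[i, j] is TRUE when row i of x and row j of y agree on both
# 'a' and 'b' to within tol.
is_close <- outer(x$a, y$a, function(p, q) abs(p - q) < tol) &
            outer(x$b, y$b, function(p, q) abs(p - q) < tol)
idx <- which(is_close, arr.ind = TRUE)

# Pair each matching row of x with the 'c' value of its partner row in y.
near <- data.frame(x[idx[, "row"], , drop = FALSE],
                   c.y = y$c[idx[, "col"]])
print(near)
```

By default merge() returns only the matching rows (an inner join) and sorts the result by the key columns; `unique(common[, c("Latitude", "Longitude")])` would reduce it to the distinct coordinate pairs.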