Message-ID: <CAM_vjumF6mKOoxyOZfNosq4jWrgDY0ZN3+V2ocCvs4BFMceypA@mail.gmail.com>
Date: 2011-11-11T23:05:27Z
From: Sarah Goslee
Subject: Combining Overlapping Data
In-Reply-To: <1321045623076-4032719.post@n4.nabble.com>
What about merge() with all=FALSE?
> x <- data.frame(a=letters[1:6], b=1:6)
> y <- data.frame(a=letters[4:9], b=11:16)
> x
  a b
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
> y
  a  b
1 d 11
2 e 12
3 f 13
4 g 14
5 h 15
6 i 16
> merge(x, y, by="a", all=FALSE)
  a b.x b.y
1 d   4  11
2 e   5  12
3 f   6  13
>
If that doesn't work, some sample data would be useful.
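One caveat: when a key is repeated, merge() pairs every matching row of x with every matching row of y, so a strain with many rows in each data set multiplies rather than adds up. If the goal is instead to keep every row for the shared strains, stacked rather than side by side, a minimal sketch might look like the following. The data and the "value" column are invented for illustration; only the column name strain comes from the question, and this assumes both data frames have identical columns.

```r
# Hypothetical stand-ins for the two data sets; "value" is an invented column.
x <- data.frame(strain = c("Chab1405", "Chab1405", "Chab1999"), value = 1:3)
y <- data.frame(strain = c("Chab1405", "Chab2000"), value = 4:5)

# Strains that appear in both data sets
common <- intersect(x$strain, y$strain)

# Keep only the rows for those strains from each data set, then stack them
both <- rbind(x[x$strain %in% common, ], y[y$strain %in% common, ])
```

Here both should hold the two "Chab1405" rows from x plus the one from y, and nothing for strains found in only one data set.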
Sarah
On Fri, Nov 11, 2011 at 4:07 PM, kickout <kyle.kocak at gmail.com> wrote:
> I've scoured the archives but have found no concrete answer to my question.
>
> Problem: Two data sets
>
> 1st data set(x) = 20,000 rows
> 2nd data set(y) = 5,000 rows
>
> Both have the same column names; the column of interest to me is a variable
> called strain.
>
> For example, a strain named "Chab1405" appears in x 150 times and in y 25
> times.
> Strain "Chab1999" appears 200 times in x and not at all in y (so I don't want
> that retained).
>
>
> I want to create a new data frame that has all 175 measurements for
> "Chab1405" and for any other strain that appears in both data sets,
> but not strains that appear in only one data set. So I want the
> intersection of the two data sets (maybe?).
>
> I've tried x %in% y, but that only gives TRUE/FALSE
>
--
Sarah Goslee
http://www.functionaldiversity.org