Message-ID: <CAM_vjumF6mKOoxyOZfNosq4jWrgDY0ZN3+V2ocCvs4BFMceypA@mail.gmail.com>
Date: 2011-11-11T23:05:27Z
From: Sarah Goslee
Subject: Combining Overlapping Data
In-Reply-To: <1321045623076-4032719.post@n4.nabble.com>

What about merge() with all=FALSE?

> x <- data.frame(a=letters[1:6], b=1:6)
> y <- data.frame(a=letters[4:9], b=11:16)
> x
  a b
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
> y
  a  b
1 d 11
2 e 12
3 f 13
4 g 14
5 h 15
6 i 16
> merge(x, y, by="a", all=FALSE)
  a b.x b.y
1 d   4  11
2 e   5  12
3 f   6  13
>

If that doesn't work, some sample data would be useful.
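
If the goal is to keep every row for the shared strains (stacked, rather
than paired side by side as merge() does), here is a minimal sketch of
another approach. It uses intersect() to find the strains present in both
data sets and %in% as a row index. The sample data and the column name
"value" are made up for illustration; only "strain" comes from your
description.

```r
# Hypothetical sample data: 'value' is an assumed column name
x <- data.frame(strain = c("Chab1405", "Chab1405", "Chab1999"),
                value = c(1.2, 3.4, 5.6),
                stringsAsFactors = FALSE)
y <- data.frame(strain = c("Chab1405", "Chab2000"),
                value = c(7.8, 9.0),
                stringsAsFactors = FALSE)

# Strains that occur in both data sets
common <- intersect(x$strain, y$strain)

# %in% returns TRUE/FALSE per row; use that vector to subset each data
# frame, then stack the matching rows from both
both <- rbind(x[x$strain %in% common, ], y[y$strain %in% common, ])
```

Here "both" holds only the Chab1405 rows from x and y. This assumes the two
data frames really do have identical column names, as you describe.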

Sarah

On Fri, Nov 11, 2011 at 4:07 PM, kickout <kyle.kocak at gmail.com> wrote:
> I've scoured the archives but have found no concrete answer to my question.
>
> Problem: Two data sets
>
> 1st data set(x) = 20,000 rows
> 2nd data set(y) = 5,000 rows
>
> Both have the same column names, the column of interest to me is a variable
> called strain.
>
> For example, a strain named "Chab1405" appears 150 times in x and 25 times
> in y. Strain "Chab1999" appears 200 times in x and never in y (so I don't
> want that retained).
>
>
> I want to create a new data frame that has all 175 measurements for
> "Chab1405" and for any other strain that appears in both data sets, but
> not strains that appear in only one data set. So I want the intersection
> of the two data sets (maybe?).
>
> I've tried x %in% y, but that only gives TRUE/FALSE.
>

-- 
Sarah Goslee
http://www.functionaldiversity.org