Message-ID: <AANLkTikyET6Vx=QeYiJ9rhvjhVj2RuWvYDJTzu_zXwH=@mail.gmail.com>
Date: 2010-12-07T17:11:47Z
From: Gabor Grothendieck
Subject: Creating binary variable depending on strings of two dataframes
In-Reply-To: <1291739453670-3076724.post@n4.nabble.com>
On Tue, Dec 7, 2010 at 11:30 AM, Pete Pete <noxyport at gmail.com> wrote:
>
> Hi,
> consider the following two dataframes:
> x1=c("232","3454","3455","342","13")
> x2=c("1","1","1","0","0")
> data1=data.frame(x1,x2)
>
> y1=c("232","232","3454","3454","3455","342","13","13","13","13")
> y2=c("E1","F3","F5","E1","E2","H4","F8","G3","E1","H2")
> data2=data.frame(y1,y2)
>
> I need a new column in dataframe data1 (x3) that is 1 if data2 contains a
> row with y1 equal to x1 and y2 equal to "E1", and 0 otherwise. The result
> for data1 should look like this:
>     x1 x2 x3
> 1  232  1  1
> 2 3454  1  1
> 3 3455  1  0
> 4  342  0  0
> 5   13  0  1
>
> I think a SQL command could help me but I am too inexperienced with it to
> get there.
>
Try this:
> library(sqldf)
> sqldf("select x1, x2, max(y2 = 'E1') x3 from data1 d1 left join data2 d2 on (x1 = y1) group by x1, x2 order by d1.rowid")
    x1 x2 x3
1  232  1  1
2 3454  1  1
3 3455  1  0
4  342  0  0
5   13  0  1
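
The query works because SQLite evaluates the comparison y2 = 'E1' to 1 or 0, so max() over each group is 1 as soon as any joined row has "E1". An id in data1 with no matching row in data2 would come back as NA rather than 0, since the comparison is then made against NULL. For comparison, here is a rough base R sketch of the same idea without SQL. The name e1_ids is only an illustrative helper, not something from the original post, and this version gives 0 rather than NA for ids missing from data2.

```r
# Rebuild the example data from the question
data1 <- data.frame(x1 = c("232", "3454", "3455", "342", "13"),
                    x2 = c("1", "1", "1", "0", "0"))
data2 <- data.frame(y1 = c("232", "232", "3454", "3454", "3455",
                           "342", "13", "13", "13", "13"),
                    y2 = c("E1", "F3", "F5", "E1", "E2",
                           "H4", "F8", "G3", "E1", "H2"))

# Ids in data2 that have at least one "E1" row (hypothetical helper name)
e1_ids <- data2$y1[data2$y2 == "E1"]

# 1 if the id in data1 is among them, 0 otherwise
data1$x3 <- as.integer(data1$x1 %in% e1_ids)

print(data1)
```

Unlike the grouped query, this keeps data1 in its original row order without needing an explicit order by.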
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com