Combining data.frames

13 messages · Jeff Reichman, Tom Woolman, Jeff Newmiller +2 more

#
R-Help Community

I'm trying to combine two data.frames, each containing 10 columns, which
share two common fields. Here are two small test datasets.

df1 <- data.frame(date = c("2021-1-1","2021-1-1","2021-1-1","2021-1-1","2021-1-1",
                           "2021-1-2","2021-1-2","2021-1-3","2021-1-3","2021-1-3"),
                  geo_hash = c("abc123","abc123","abc456","abc789","abc246","abc123",
                               "asd123","abc789","abc890","abc123"),
                  ad_id = c("a12345","b12345","a12345","a12345","c12345",
                            "b12345","b12345","a12345","b12345","a12345"))
df2 <- data.frame(date = c("2021-1-1","2021-1-1","2021-1-2","2021-1-3","2021-1-3"),
                  geo_hash = c("abc123","abc456","abc123","abc789","abc890"),
                  event = c("shoting","ied","protest","riot","protest"))

I'm trying to combine them so that I get a single data.frame such as

date		geo_hash	ad_id		event
1/1/2021	abc123		a12345		shoting
1/1/2021	abc123		b12345	
1/1/2021	abc456		a12345		ied
1/1/2021	abc789		a12345	
1/1/2021	abc246		c12345	

Jeff
#
Have you looked at the merge function in base R?

https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/merge
On 2022-03-19 21:15, Jeff Reichman wrote:
#
Evening Tom

Yes, I've been playing with the merge function, but haven't been able to
achieve what I need. It may well be the way to go, and it might just be my syntax.

#
You can also do "SQL-like" joins in the tidyverse with dplyr.
On 2022-03-19 21:23, Jeff Reichman wrote:
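
For readers who want to try the dplyr route, here is a minimal sketch using the test data from the first post. It assumes the dplyr package is installed (the code is skipped otherwise) and that the goal is to keep every row of df1, which is what a left join does; the object name joined is only illustrative.

```r
# Test data as posted at the top of the thread.
df1 <- data.frame(
  date = c("2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1",
           "2021-1-2", "2021-1-2", "2021-1-3", "2021-1-3", "2021-1-3"),
  geo_hash = c("abc123", "abc123", "abc456", "abc789", "abc246",
               "abc123", "asd123", "abc789", "abc890", "abc123"),
  ad_id = c("a12345", "b12345", "a12345", "a12345", "c12345",
            "b12345", "b12345", "a12345", "b12345", "a12345")
)
df2 <- data.frame(
  date = c("2021-1-1", "2021-1-1", "2021-1-2", "2021-1-3", "2021-1-3"),
  geo_hash = c("abc123", "abc456", "abc123", "abc789", "abc890"),
  event = c("shoting", "ied", "protest", "riot", "protest")
)

# Assumption: dplyr is available; skip quietly if it is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Keep all rows of df1 and attach event where date and geo_hash both match.
  joined <- dplyr::left_join(df1, df2, by = c("date", "geo_hash"))
  print(joined)
}
```

Rows of df1 with no matching date and geo_hash pair in df2 get NA in the event column.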
#
Yes I'm reading that presently

The closest I've gotten has been

df3 <- merge(df1, df2, all = TRUE)

#
Tom

Looks like I figured it out. It was a syntax issue: the wrong "all" argument (I think).

#
I'm trying hard to take tonight off and avoid booting up the laptop and 
launching R... :)   but you need to merge by the primary key(s), i.e. 
the common columns (common IVs) shared between the two dataframes.
On 2022-03-19 21:38, Jeff Reichman wrote:
#
Then show your code so we can focus on what you haven't yet figured out. Have you read the examples in the merge help page?
On March 19, 2022 6:23:02 PM PDT, Jeff Reichman <reichmanj at sbcglobal.net> wrote:

  
    
#
by = c("date", "geo_hash" )
On March 19, 2022 6:31:19 PM PDT, Jeff Reichman <reichmanj at sbcglobal.net> wrote:
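
Spelled out on the test data from the first post, the suggestion might look like the sketch below. Adding all.x = TRUE is an assumption about the desired result (keep every row of df1, with an event only where one exists); the object name left is illustrative.

```r
# Test data as posted at the top of the thread.
df1 <- data.frame(
  date = c("2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1",
           "2021-1-2", "2021-1-2", "2021-1-3", "2021-1-3", "2021-1-3"),
  geo_hash = c("abc123", "abc123", "abc456", "abc789", "abc246",
               "abc123", "asd123", "abc789", "abc890", "abc123"),
  ad_id = c("a12345", "b12345", "a12345", "a12345", "c12345",
            "b12345", "b12345", "a12345", "b12345", "a12345")
)
df2 <- data.frame(
  date = c("2021-1-1", "2021-1-1", "2021-1-2", "2021-1-3", "2021-1-3"),
  geo_hash = c("abc123", "abc456", "abc123", "abc789", "abc890"),
  event = c("shoting", "ied", "protest", "riot", "protest")
)

# Join on both key columns; all.x = TRUE keeps df1 rows that have no event.
# merge() sorts the result on the key columns by default (sort = TRUE),
# so the row order can differ from df1.
left <- merge(df1, df2, by = c("date", "geo_hash"), all.x = TRUE)
print(left)
```

Unmatched df1 rows show NA in event rather than being dropped.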

  
    
#
Jeff

This seems to work 

df3 <- merge(df1, df2, all = TRUE)

When I use either the by.x, by.y or all.x, all.y arguments I get really weird results.  Simply using the code above appears to work thus far.
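
The thread does not show the calls that gave the odd results, so the following is only a guess at one common cause: joining on just one of the two shared columns. In this sketch (data from the first post; the name wide is illustrative) the other shared column is no longer a key, so merge() keeps both copies with suffixes and pairs every df1 row with every df2 row of the same date.

```r
# Test data as posted at the top of the thread.
df1 <- data.frame(
  date = c("2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1",
           "2021-1-2", "2021-1-2", "2021-1-3", "2021-1-3", "2021-1-3"),
  geo_hash = c("abc123", "abc123", "abc456", "abc789", "abc246",
               "abc123", "asd123", "abc789", "abc890", "abc123"),
  ad_id = c("a12345", "b12345", "a12345", "a12345", "c12345",
            "b12345", "b12345", "a12345", "b12345", "a12345")
)
df2 <- data.frame(
  date = c("2021-1-1", "2021-1-1", "2021-1-2", "2021-1-3", "2021-1-3"),
  geo_hash = c("abc123", "abc456", "abc123", "abc789", "abc890"),
  event = c("shoting", "ied", "protest", "riot", "protest")
)

# Hypothetical mistake: only "date" is used as the key.
wide <- merge(df1, df2, by = "date")

# geo_hash now appears twice (geo_hash.x from df1, geo_hash.y from df2),
# and the row count exceeds nrow(df1) because dates match many-to-many.
print(names(wide))
print(nrow(wide))
```

If by.x and by.y are used, they should list the same number of key columns in matching order, for example by.x = c("date", "geo_hash") and by.y = c("date", "geo_hash").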

#
Ok this seems to work correctly

df1 <- data.frame(date = as.factor(c("2021-1-1","2021-1-1","2021-1-1","2021-1-1","2021-1-1",
                           "2021-1-2","2021-1-2","2021-1-3","2021-1-3","2021-1-3",
                           "2021-1-4")),
                  geo_hash = as.factor(c("abc123","abc123","abc456","abc789","abc246","abc123",
                               "asd123","abc789","abc890","abc123","z12345")),
                  ad_id = as.factor(c("a12345","b12345","a12345","a12345","c12345",
                            "b12345","b12345","a12345","b12345","a12345","a12345")))
df2 <- data.frame(date = as.factor(c("2021-1-1","2021-1-1","2021-1-2","2021-1-3","2021-1-3","2021-1-4")),
                  geo_hash = as.factor(c("abc123","abc456","abc123","abc789","abc890","w12345")),
                  event = as.factor(c("shoting","ied","protest","riot","protest","killing")))

df1
df2

#df3 <- merge(df1, df2, all = TRUE)
df3 <- merge(df1, df2, by = c("date", "geo_hash" ), all = TRUE)
df3

#
Merging by the common keys/column names is the default. The question is likely
what to do with rows that don't match. That's determined by the 'all'
settings, which the OP may already have figured out.
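
To make the effect of the 'all' settings concrete, this sketch counts the rows each variant returns on the revised test data from the thread. The vectors are kept as plain character here for brevity (an assumption; the post wraps them in as.factor), and the name counts is illustrative.

```r
# Revised test data from the thread: each data.frame has one extra
# 2021-1-4 row that has no partner in the other.
df1 <- data.frame(
  date = c("2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1",
           "2021-1-2", "2021-1-2", "2021-1-3", "2021-1-3", "2021-1-3",
           "2021-1-4"),
  geo_hash = c("abc123", "abc123", "abc456", "abc789", "abc246", "abc123",
               "asd123", "abc789", "abc890", "abc123", "z12345"),
  ad_id = c("a12345", "b12345", "a12345", "a12345", "c12345", "b12345",
            "b12345", "a12345", "b12345", "a12345", "a12345")
)
df2 <- data.frame(
  date = c("2021-1-1", "2021-1-1", "2021-1-2", "2021-1-3", "2021-1-3",
           "2021-1-4"),
  geo_hash = c("abc123", "abc456", "abc123", "abc789", "abc890", "w12345"),
  event = c("shoting", "ied", "protest", "riot", "protest", "killing")
)

# Same keys every time (the default: all shared column names);
# only the treatment of unmatched rows changes.
counts <- c(
  inner = nrow(merge(df1, df2)),                # matched rows only
  left  = nrow(merge(df1, df2, all.x = TRUE)),  # plus unmatched df1 rows
  right = nrow(merge(df1, df2, all.y = TRUE)),  # plus unmatched df2 rows
  full  = nrow(merge(df1, df2, all = TRUE))     # plus unmatched rows of both
)
print(counts)
```

The default, all = FALSE, corresponds to the inner variant.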
On Sat, Mar 19, 2022, 7:16 PM Tom Woolman <twoolman at ontargettek.com> wrote:

            

  
  
#
Hello,

The two merge calls below give identical results.
Maybe there was something in your R session?


df3 <- merge(df1, df2, by = c("date", "geo_hash" ), all = TRUE)
df3b <- merge(df1, df2, all = TRUE)
identical(df3, df3b)
#[1] TRUE

Hope this helps,

Rui Barradas
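
The reason the two calls agree is that merge() for data frames uses by = intersect(names(x), names(y)) when by is not supplied. A small check, again on the revised test data kept as plain character vectors (an assumption; the post uses as.factor), with illustrative object names:

```r
# Revised test data from the thread.
df1 <- data.frame(
  date = c("2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1", "2021-1-1",
           "2021-1-2", "2021-1-2", "2021-1-3", "2021-1-3", "2021-1-3",
           "2021-1-4"),
  geo_hash = c("abc123", "abc123", "abc456", "abc789", "abc246", "abc123",
               "asd123", "abc789", "abc890", "abc123", "z12345"),
  ad_id = c("a12345", "b12345", "a12345", "a12345", "c12345", "b12345",
            "b12345", "a12345", "b12345", "a12345", "a12345")
)
df2 <- data.frame(
  date = c("2021-1-1", "2021-1-1", "2021-1-2", "2021-1-3", "2021-1-3",
           "2021-1-4"),
  geo_hash = c("abc123", "abc456", "abc123", "abc789", "abc890", "w12345"),
  event = c("shoting", "ied", "protest", "riot", "protest", "killing")
)

# The key columns merge() picks when 'by' is omitted.
default_keys <- intersect(names(df1), names(df2))
print(default_keys)

# Passing those keys explicitly gives the same result as omitting 'by'.
explicit <- merge(df1, df2, by = default_keys, all = TRUE)
implicit <- merge(df1, df2, all = TRUE)
print(identical(explicit, implicit))
```

Naming the keys explicitly is still a reasonable habit, since it protects against an unrelated column that happens to share a name in both data frames.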

At 02:05 on 20/03/2022, Jeff Reichman wrote: