Skip to content

Merging dataframes

3 messages · Chintanu, Dr. Robin Haunschild, Thierry Onkelinx

#
Thanks - Peter, Eivind, Rui


Sorry, I perhaps could not explain it properly in the first go.

Trying to simplify it here with an example - Say I have two dataframes as
below that are NOT equally-sized data frames (i.e., number of columns are
different in each table):



Table_A:

Email             Name                   Phone

abc at gmail.com   John Chan         0909

bcd at yahoo.com   Tim Ma                    89089

......



Table_B:

Email                     Name                 Sex        Phone

abc at gmail.com    John Chan            M                 0909

khn at hotmail.com           Rosy  Kim               F                   7779

.....



Now, I have used -

merge (Table_A, Table_B, by="Email", all = FALSE))



- to find only the rows that match from these data frames - based on Email
as primary key.



Further, I am also interested (using "Email" as the common key) which rows
from Table_A did not match with Table_B.

I am not sure how to do this here.

Thanks and regards,
Chintanu
On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:

            

  
  
#
Hi,

I'll coded your example into R code:

Table_A <- c('abc at gmail.com', 'John Chan', '0909')
Table_A <- rbind(Table_A, c('bcd at yahoo.com', 'Tim Ma', '89089'))
colnames(Table_A) <- c('Email', 'Name', 'Phone')
Table_A

Table_B <- c('abc at gmail.com', 'John Chan', 'M', '0909')
Table_B <- rbind(Table_B, c('khn at hotmail.com', 'Rosy  Kim', 'F', '7779'))
colnames(Table_B) <- c('Email', 'Name', 'Sex', 'Phone')
Table_B

Did you have a look at this one?
Table_C <- merge (Table_A, Table_B, by="Email", all = TRUE)
Table_C[is.na(Table_C$Name.y),]
Table_C[is.na(Table_C$Name.x),]

Table_C contains all data from Table_A and Table_B. The key.x is NA if
the row comes from Table_B and key.y is NA if the row comes from Table_A.

Best, Robin
On 05/02/2018 11:38 AM, Chintanu wrote:

  
    
#
Have a look at anti_join() from the dplyr package. It does exactly
what you want. Here is an example based on the code of Robin

Table_A <- as.data.frame(Table_A, stringsAsFactors = FALSE)That is
Table_B <- as.data.frame(Table_B, stringsAsFactors = FALSE)
library(dplyr)
anti_join(Table_A, Table_B, by = "Email")
anti_join(Table_B, Table_A, by = "Email")

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////




2018-05-02 13:23 GMT+02:00 Dr. Robin Haunschild <R.Haunschild at fkf.mpg.de>: