challenging data merging/joining problem
On 2020-07-06 12:03 +0300, Eric Berger wrote:
On Mon, Jul 6, 2020 at 2:07 AM Richard M. Heiberger <rmh at temple.edu> wrote:
On Sun, Jul 5, 2020 at 2:51 PM Christopher W. Ryan <cryan at binghamton.edu> wrote:
I've been conducting relatively simple COVID-19 surveillance for our jurisdiction.
Have you talked directly to the designers of the new database?
Hi Christopher,

This seems pretty standard and straightforward, unless I am missing something. You can do the "full join" without changing variable names. Here's a small code example with two tibbles, a and b, where the column 'x' in a corresponds to the column 'u' in b:

library(tibble)  # provides tibble()
a <- tibble(x = 1:15, y = 21:35)
b <- tibble(u = c(1:10, 51:55), z = 31:45)
# full outer join: match a$x against b$u and keep unmatched rows from both sides
foo <- merge(a, b, by.x = "x", by.y = "u", all.x = TRUE, all.y = TRUE)
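For comparison, here is a base-R sketch of the same idea. It assumes plain data frames instead of tibbles (so no package is needed) and contrasts what the all.x/all.y flags do; the object names inner, left and full are only illustrative.

```r
# Two toy tables: key column 'x' in a corresponds to key column 'u' in b
a <- data.frame(x = 1:15, y = 21:35)
b <- data.frame(u = c(1:10, 51:55), z = 31:45)

# Inner join (merge's default): only keys present in both tables (1:10)
inner <- merge(a, b, by.x = "x", by.y = "u")
# Left join: every row of a; z is NA where b has no matching key (11:15)
left <- merge(a, b, by.x = "x", by.y = "u", all.x = TRUE)
# Full join: all = TRUE sets both all.x and all.y; NA-filled on either side
full <- merge(a, b, by.x = "x", by.y = "u", all = TRUE)

nrow(inner)  # 10
nrow(left)   # 15
nrow(full)   # 20
```

Note that the key column in the result keeps the by.x name ("x"), so nothing has to be renamed beforehand.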
Perhaps something like
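# map system A's column names (the names) to system B's (the values)
new_names <-
  c("dob" = "birthdate",
    "lastName" = "last_name",
    "firstName" = "first_name")
# rename A's key columns so they match B's
idx <- match(x = names(new_names),
             table = colnames(dataSystemA))
colnames(dataSystemA)[idx] <- new_names
# full outer join on the three shared key columns
merge(
  x = dataSystemA,
  y = dataSystemB,
  by = new_names,
  all = TRUE)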
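The renaming step can probably be skipped by handing both sides of new_names to by.x and by.y, which is the route Eric's example takes with a single key. The sketch below assumes two small hypothetical data frames, sysA and sysB, with made-up rows standing in for dataSystemA and dataSystemB (whose contents were not posted), and checks that both routes give the same table.

```r
# Hypothetical stand-ins for the two systems' extracts (made-up rows)
sysA <- data.frame(
  dob = as.Date(c("2011-07-14", "2011-10-04")),
  lastName = c("SMITH", "JONES"),
  firstName = c("ann", "bob"),
  onsetDate = as.Date(c("2020-07-08", "2020-07-07")),
  symptomatic = c(FALSE, TRUE),
  stringsAsFactors = FALSE)
sysB <- data.frame(
  birthdate = as.Date(c("2011-10-04", "2012-02-13")),
  last_name = c("JONES", "BROWN"),
  first_name = c("bob", "cal"),
  date_of_onset = as.Date(c("2020-07-07", "2020-07-13")),
  symptoms_present = c(TRUE, TRUE),
  stringsAsFactors = FALSE)

# System A key names (names) mapped to system B key names (values)
new_names <- c(dob = "birthdate", lastName = "last_name", firstName = "first_name")

# Route 1: rename A's key columns, then merge on the shared names
renamed <- sysA
idx <- match(names(new_names), colnames(renamed))
colnames(renamed)[idx] <- new_names
full1 <- merge(renamed, sysB, by = unname(new_names), all = TRUE)

# Route 2: leave A untouched and give merge() both sets of key names
full2 <- merge(sysA, sysB, by.x = names(new_names),
               by.y = unname(new_names), all = TRUE)

# Route 2's key columns keep A's names; align them before comparing
names(full2)[seq_along(new_names)] <- unname(new_names)
all.equal(full1, full2)
```

Both routes should report TRUE here and return three rows: JONES/bob matched in both systems, SMITH/ann only in A, and BROWN/cal only in B. The by.x/by.y route avoids modifying dataSystemA in place, at the cost of the key columns carrying A's names in the result.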
which yields
   birthdate  last_name first_name  onsetDate symptomatic date_of_onset symptoms_present
1 2010-10-11   LOVEGOOD       luna       <NA>          NA    2020-07-12            FALSE
2 2010-12-06   GRAINGER   hermione 2020-07-09          NA    2020-07-09             TRUE
3 2011-01-25 LONGBOTTOM    neville 2020-07-10          NA    2020-07-10             TRUE
4 2011-07-03     MALFOY      draco       <NA>          NA    2020-07-11            FALSE
5 2011-07-14    WEASLEY        ron 2020-07-08       FALSE          <NA>               NA
6 2011-10-04     POTTER      harry 2020-07-07        TRUE          <NA>               NA
7 2012-02-13    DIGGORY     cedric       <NA>          NA    2020-07-13             TRUE
?
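The merged output above still carries two onset-date columns (onsetDate and date_of_onset) and two symptom flags (symptomatic and symptoms_present). One possible follow-up, sketched here on a small hypothetical frame with the same column names and made-up values, is to coalesce each pair, preferring the first column when both are present (an arbitrary choice for the example). The helper coalesce2 is hypothetical; dplyr offers coalesce(), but base R has no built-in equivalent.

```r
# Hypothetical helper: take a where present, otherwise fall back to b.
# Index assignment (rather than ifelse()) keeps the Date class intact.
coalesce2 <- function(a, b) {
  out <- a
  miss <- is.na(out)
  out[miss] <- b[miss]
  out
}

# Toy frame with the same column layout as the merged output (made-up values)
merged <- data.frame(
  onsetDate = as.Date(c(NA, "2020-07-09", "2020-07-08")),
  symptomatic = c(NA, NA, FALSE),
  date_of_onset = as.Date(c("2020-07-12", "2020-07-09", NA)),
  symptoms_present = c(FALSE, TRUE, NA))

merged$onset <- coalesce2(merged$onsetDate, merged$date_of_onset)
merged$any_symptoms <- coalesce2(merged$symptomatic, merged$symptoms_present)

# Rows where both columns hold an onset date but disagree deserve a manual look
conflict <- with(merged, !is.na(onsetDate) & !is.na(date_of_onset) &
                   onsetDate != date_of_onset)
```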