Hello
I am a new user of R and I need help to merge two large datasets about
stocks with different number of rows and columns.
Both have 2 variable(column) that are have same values ("name" and "date1),
but they are not in same order and "data3" contains much more observations.
In the "data" have dividend information about each stock, and in the second
there are information about the stock prices.
I tried to use this function:
data4<-merge(data,data3, by="name","date1")
But it does not work.
"data 3" has more than 800.000 observations of daily stock prices for each
stock, and "data" has only 14.000, so i want to make a function that merge
the two datasets to a new dataset with the same length as "data", and also
include all the variables in "data3" where "name" (example: Statoil) and
"date1"(example:31.12.2000) is the same in both sets.
Can someone please help me?
Regards
--
View this message in context: http://r.789695.n4.nabble.com/merging-two-dataframes-tp3932869p3932869.html
Sent from the R help mailing list archive at Nabble.com.
merging two dataframes
12 messages · dividend, Jeff Newmiller, Timothy Bates +1 more
here is some more info: http://r.789695.n4.nabble.com/file/n3932898/help.jpg this is what`s happend when I run the function in R I appreciate if someone can give me some inputs here Thanks in advance -- View this message in context: http://r.789695.n4.nabble.com/merging-two-dataframes-tp3932869p3932898.html Sent from the R help mailing list archive at Nabble.com.
Hi
Hello
I am a new user of R and I need help to merge two large datasets about
stocks with different number of rows and columns.
Both have 2 variable(column) that are have same values ("name" and
"date1),
but they are not in same order and "data3" contains much more
observations.
In the "data" have dividend information about each stock, and in the
second
there are information about the stock prices. I tried to use this function: data4<-merge(data,data3, by="name","date1")
Does
data4<-merge(data,data3, by=c("name","date1"), all=T)
do what you want?
I think merge help page is quite explanatory and you shall look at it.
Regards
Petr
But it does not work. "data 3" has more than 800.000 observations of daily stock prices for
each
stock, and "data" has only 14.000, so i want to make a function that
merge
the two datasets to a new dataset with the same length as "data", and
also
include all the variables in "data3" where "name" (example: Statoil) and "date1"(example:31.12.2000) is the same in both sets. Can someone please help me? Regards -- View this message in context: http://r.789695.n4.nabble.com/merging-two- dataframes-tp3932869p3932869.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thank you Petr! I have read on the merge help page, but I cant figure out how to write this function. When I use your function it includes all data from "data3", but all columns in "data" has "NA"(without "name" and "date". I hoped to keep these values to. I try to explain it more precise: In "data" with 14000 observations about company name, date, size of dividend etc. In "data3" there are 800000 daily observations of the stockprices for each company listed: name date div statoil 17.05.2000 5 statoil 18.05.2001 6 ........ .............. ... Yara 17.05.2000 10 etc I want to get the stockprice for statoil, yara etc from "data3", and merge it into a new data like this: name date div price statoil 17.05.2000 5 120 statoil 18.05.2001 6 130 ........ .............. ... Yara 17.05.2000 10 200 etc And also keep the rest of the columns from both datasets. name date div price industry secid etc statoil 17.05.2000 5 120 ....... ..... ... statoil 18.05.2001 6 130 ....... ...... .... ........ .............. ... Yara 17.05.2000 10 200 ..... ...... ... etc Using the first function i get this result: name date div price industry secid etc statoil 17.05.2000 NA 120 ....... ..... ... statoil 18.05.2001 NA 130 ....... ...... .... ........ .............. ... Yara 17.05.2000 NA 200 ..... ...... ... Hope you understand what i want to do. Thanks again, I really appreciate it! etc -- View this message in context: http://r.789695.n4.nabble.com/merging-two-dataframes-tp3932869p3933101.html Sent from the R help mailing list archive at Nabble.com.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111024/4aaefa64/attachment.pl>
Thank you Petr! I have read on the merge help page, but I cant figure out how to write
this
function. When I use your function it includes all data from "data3", but all
columns
in "data" has "NA"(without "name" and "date". I hoped to keep these
values Because name and date common columns and you do not need them twice. Merge shall work as you expected provided input data are OK. The result of command I suggested shall be all values from both data frames aligned by name and date. It is impossible to say what is wrong without reproducible data. Did you check size of your data frames after merge? I suspect that date in data and date in data3 is somewhat different. You shall check it with ?str function. As Posting guide says you shall provide some suitable data so that we can look at it and work with it. Something like dput(data[1:20,]) could be enough. Regards Petr
to. I try to explain it more precise: In "data" with 14000 observations about company name, date, size of
dividend
etc. In "data3" there are 800000 daily observations of the stockprices for
each
company listed: name date div statoil 17.05.2000 5 statoil 18.05.2001 6 ........ .............. ... Yara 17.05.2000 10 etc I want to get the stockprice for statoil, yara etc from "data3", and
merge
it into a new data like this: name date div price statoil 17.05.2000 5 120 statoil 18.05.2001 6 130 ........ .............. ... Yara 17.05.2000 10 200 etc And also keep the rest of the columns from both datasets. name date div price industry secid etc statoil 17.05.2000 5 120 ....... ..... ... statoil 18.05.2001 6 130 ....... ...... .... ........ .............. ... Yara 17.05.2000 10 200 ..... ...... ... etc Using the first function i get this result: name date div price industry secid etc statoil 17.05.2000 NA 120 ....... ..... ... statoil 18.05.2001 NA 130 ....... ...... .... ........ .............. ... Yara 17.05.2000 NA 200 ..... ...... ... Hope you understand what i want to do. Thanks again, I really appreciate it! etc -- View this message in context: http://r.789695.n4.nabble.com/merging-two- dataframes-tp3932869p3933101.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
1 day later
Hello. Now i tried to do what you told me. I used the str(fuction), and data$date1 and data3$date1 where both listed "character". I changed "name" to character but it did not work either. I also changed all variables to character, with no positive result. str(data) 'data.frame': 14446 obs. of 15 variables: $ id : chr "1" "1" "1" "1" ... $ compid : chr "2514" "2514" "2514" "2514" ... $ secid : chr "15856" "15856" "15856" "15856" ... $ name : chr "A-pressen" "A-pressen" "A-pressen" "A-pressen" ... $ period : chr "1" "2" "3" "4" ... $ date : chr "17.05.1980" "17.05.1981" "17.05.1982" "17.05.1983" ... $ enddate: chr "17.05.1981" "17.05.1982" "17.05.1983" "17.05.1984" ... $ div : chr NA NA NA NA ... $ ndivs : chr NA NA NA NA ... $ posdiv : chr NA NA NA NA ... $ ddiv2 : chr NA NA NA NA ... $ ddiv3 : chr NA NA NA NA ... $ ddiv4 : chr NA NA NA NA ... $ ddiv5 : chr NA NA NA NA ... $ ddiv6 : chr NA NA NA NA ... str(data3) 'data.frame': 812354 obs. of 9 variables: $ date : chr "02.01.1996" "03.01.1996" "04.01.1996" "05.01.1996" ... $ Securityid : chr "6001" "6001" "6001" "6001" ... $ Symbol : chr "AAV" "AAV" "AAV" "AAV" ... $ name : chr "Adresseavisen" "Adresseavisen" "Adresseavisen" "Adresseavisen" ... $ Securitytype : chr "Ordinary Shares" "Ordinary Shares" "Ordinary Shares" "Ordinary Shares" ... $ Unadjusted : chr "200" "200" "200" "200" ... $ Event.adjusted : chr "200" "200" "200" "200" ... $ Div.and.Event.adjusted: chr "109,7595375" "109,7595375" "109,7595375" "109,7595375" ... $ Sharesissued : chr "1901646" "1901646" "1901646" "1901646" ... Here is some suitable data for "data"
dput(data[1:20,])
structure(list(id = c("1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
compid = c("2514", "2514", "2514", "2514", "2514", "2514",
"2514", "2514", "2514", "2514", "2514", "2514", "2514", "2514",
"2514", "2514", "2514", "2514", "2514", "2514"), secid = c("15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856"), name = c("A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen"), period = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"13", "14", "15", "16", "17", "18", "19", "20"), date = c("17.05.1980",
"17.05.1981", "17.05.1982", "17.05.1983", "17.05.1984", "17.05.1985",
"17.05.1986", "17.05.1987", "17.05.1988", "17.05.1989", "17.05.1990",
"17.05.1991", "17.05.1992", "17.05.1993", "17.05.1994", "17.05.1995",
"17.05.1996", "17.05.1997", "17.05.1998", "17.05.1999"),
enddate = c("17.05.1981", "17.05.1982", "17.05.1983", "17.05.1984",
"17.05.1985", "17.05.1986", "17.05.1987", "17.05.1988", "17.05.1989",
"17.05.1990", "17.05.1991", "17.05.1992", "17.05.1993", "17.05.1994",
"17.05.1995", "17.05.1996", "17.05.1997", "17.05.1998", "17.05.1999",
"17.05.2000"), div = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
"0", "0", "0", "0", "0", "5", "0", "1.1", "1.2", "1", "0"
), ndivs = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, "0", "0",
"0", "0", "0", "1", "0", "1", "1", "1", "0"), posdiv = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "1",
NA, "1", "1", "1", NA), ddiv2 = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, "0", "0", "0", "0", "0", NA, "0", "1", NA,
NA), ddiv3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
"0", "0", "0", "0", "0", "0", "0", "0", "-1"), ddiv4 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "0", "0", "0",
"0", "0", "0", "0", "0"), ddiv5 = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, "0", "0", "0", "0", "0", "0",
"0"), ddiv6 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, "0", "0", "0", "0", "0", "0")), .Names = c("id",
"compid", "secid", "name", "period", "date", "enddate", "div",
"ndivs", "posdiv", "ddiv2", "ddiv3", "ddiv4", "ddiv5", "ddiv6"
), row.names = c(NA, 20L), class = "data.frame")
Here is some suitable data for "data3":
dput(data3[1:20,])
structure(list(date = c("02.01.1996", "03.01.1996", "04.01.1996",
"05.01.1996", "08.01.1996", "09.01.1996", "10.01.1996", "11.01.1996",
"12.01.1996", "15.01.1996", "16.01.1996", "17.01.1996", "18.01.1996",
"19.01.1996", "22.01.1996", "23.01.1996", "24.01.1996", "25.01.1996",
"26.01.1996", "29.01.1996"), Securityid = c("6001", "6001", "6001",
"6001", "6001", "6001", "6001", "6001", "6001", "6001", "6001",
"6001", "6001", "6001", "6001", "6001", "6001", "6001", "6001",
"6001"), Symbol = c("AAV", "AAV", "AAV", "AAV", "AAV", "AAV",
"AAV", "AAV", "AAV", "AAV", "AAV", "AAV", "AAV", "AAV", "AAV",
"AAV", "AAV", "AAV", "AAV", "AAV"), name = c("Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen"), Securitytype =
c("Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares"), Unadjusted =
c("200",
"200", "200", "200", "200", "200", "200", "200", "200", "200",
"200", "200", "200", "200", "200", "200", "200", "200", "200",
"200"), Event.adjusted = c("200", "200", "200", "200", "200",
"200", "200", "200", "200", "200", "200", "200", "200", "200",
"200", "200", "200", "200", "200", "200"), Div.and.Event.adjusted =
c("109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375", "109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375", "109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375", "109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375"),
Sharesissued = c("1901646", "1901646", "1901646", "1901646",
"1901646", "1901646", "1901646", "1901646", "1901646", "1901646",
"1901646", "1901646", "1901646", "1901646", "1901646", "1901646",
"1901646", "1901646", "1901646", "1901646")), .Names = c("date",
"Securityid", "Symbol", "name", "Securitytype", "Unadjusted",
"Event.adjusted", "Div.and.Event.adjusted", "Sharesissued"), row.names =
c(NA,
20L), class = "data.frame")
When I run the function:
data4<-merge(data,data3, by=c("name","date1"), all=T)
dput(data4[1:20,])
structure(list(name = c("A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen"), date = c("01.02.1999", "01.02.2000",
"01.02.2001", "01.02.2002", "01.03.1999", "01.03.2000", "01.03.2001",
"01.03.2002", "01.04.2003", "01.06.1999", "01.06.2001", "01.07.1999",
"01.07.2002", "01.07.2003", "01.08.2000", "01.08.2001", "01.08.2002",
"01.08.2003", "01.09.1999", "01.09.2000"), id = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
compid = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), secid = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), period = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), enddate = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), div = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ndivs = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), posdiv = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv2 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv3 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv4 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv5 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv6 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), Securityid = c("15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856"), Symbol = c("APR",
"APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR",
"APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR",
"APR"), Securitytype = c("Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares"),
Unadjusted = c("120", "140", "160", "105", "110", "190",
"160", "112", "115", "120", "150", "127", "106,5", "154",
"155", "160", "111", "155", "127", "205"), Event.adjusted = c("120",
"140", "160", "105", "110", "190", "160", "112", "115", "120",
"150", "127", "106,5", "154", "155", "160", "111", "155",
"127", "205"), Div.and.Event.adjusted = c("111,4092308",
"129,9774359", "148,545641", "100,1538462", "102,1251282",
"176,3979487", "148,545641", "106,8307692", "109,6923077",
"111,4092308", "143,0769231", "117,9081026", "101,5846154",
"154", "143,9035897", "152,6153846", "105,8769231", "155",
"117,9081026", "190,3241026"), Sharesissued = c("8839643",
"8839643", "8854307", "8866191", "8839643", "8839643", "8854307",
"8867791", "8885537", "8839643", "8866191", "8839643", "8885537",
"8903842", "8854307", "8866191", "8885537", "8903842", "8839643",
"8854307")), .Names = c("name", "date", "id", "compid", "secid",
"period", "enddate", "div", "ndivs", "posdiv", "ddiv2", "ddiv3",
"ddiv4", "ddiv5", "ddiv6", "Securityid", "Symbol", "Securitytype",
"Unadjusted", "Event.adjusted", "Div.and.Event.adjusted", "Sharesissued"
), row.names = c(NA, 20L), class = "data.frame")
If I want to just keep the observations from "data" (14446 observations)
merged with "data3". Is it correct to use:
data4<-merge(data,data3, by=c("name","date1"), all=T)
or should I use:
data4<-merge(data,data3, by=c("name","date1"), all.x=F)?
(Then I get ca 14000 obs, but "NA" in all variables from "data3".
In advance, thank you.
--
View this message in context: http://r.789695.n4.nabble.com/merging-two-dataframes-tp3932869p3940157.html
Sent from the R help mailing list archive at Nabble.com.
I think you want something like this (I like to be explicit about what you are merging) df3 = merge(df1, df2, by = "date", all=T) You can be explicit about what you are merging on in each file: df3 = merge(df1,df2, by.x = "date?, by.y="date", all=T) You were trying to merge on ?date1? but it looks to me like your data frames actually contains columns called ?date? not ?date1" As Petr says, in the vanilla situation where there is no overlap of data and the ID column has the same name in both frames, then merge(frame1, frame2) works by itself. tip: don?t use words like ?data? as variable names, as that is also a function
On 26 Oct 2011, at 11:59 AM, dividend wrote:
Hello. Now i tried to do what you told me. I used the str(fuction), and data$date1 and data3$date1 where both listed "character". I changed "name" to character but it did not work either. I also changed all variables to character, with no positive result. str(data) 'data.frame': 14446 obs. of 15 variables: $ id : chr "1" "1" "1" "1" ... $ compid : chr "2514" "2514" "2514" "2514" ... $ secid : chr "15856" "15856" "15856" "15856" ... $ name : chr "A-pressen" "A-pressen" "A-pressen" "A-pressen" ... $ period : chr "1" "2" "3" "4" ... $ date : chr "17.05.1980" "17.05.1981" "17.05.1982" "17.05.1983" ... $ enddate: chr "17.05.1981" "17.05.1982" "17.05.1983" "17.05.1984" ... $ div : chr NA NA NA NA ... $ ndivs : chr NA NA NA NA ... $ posdiv : chr NA NA NA NA ... $ ddiv2 : chr NA NA NA NA ... $ ddiv3 : chr NA NA NA NA ... $ ddiv4 : chr NA NA NA NA ... $ ddiv5 : chr NA NA NA NA ... $ ddiv6 : chr NA NA NA NA ... str(data3) 'data.frame': 812354 obs. of 9 variables: $ date : chr "02.01.1996" "03.01.1996" "04.01.1996" "05.01.1996" ... $ Securityid : chr "6001" "6001" "6001" "6001" ... $ Symbol : chr "AAV" "AAV" "AAV" "AAV" ... $ name : chr "Adresseavisen" "Adresseavisen" "Adresseavisen" "Adresseavisen" ... $ Securitytype : chr "Ordinary Shares" "Ordinary Shares" "Ordinary Shares" "Ordinary Shares" ... $ Unadjusted : chr "200" "200" "200" "200" ... $ Event.adjusted : chr "200" "200" "200" "200" ... $ Div.and.Event.adjusted: chr "109,7595375" "109,7595375" "109,7595375" "109,7595375" ... $ Sharesissued : chr "1901646" "1901646" "1901646" "1901646" ... Here is some suitable data for "data"
dput(data[1:20,])
structure(list(id = c("1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
compid = c("2514", "2514", "2514", "2514", "2514", "2514",
"2514", "2514", "2514", "2514", "2514", "2514", "2514", "2514",
"2514", "2514", "2514", "2514", "2514", "2514"), secid = c("15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856"), name = c("A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen"), period = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"13", "14", "15", "16", "17", "18", "19", "20"), date = c("17.05.1980",
"17.05.1981", "17.05.1982", "17.05.1983", "17.05.1984", "17.05.1985",
"17.05.1986", "17.05.1987", "17.05.1988", "17.05.1989", "17.05.1990",
"17.05.1991", "17.05.1992", "17.05.1993", "17.05.1994", "17.05.1995",
"17.05.1996", "17.05.1997", "17.05.1998", "17.05.1999"),
enddate = c("17.05.1981", "17.05.1982", "17.05.1983", "17.05.1984",
"17.05.1985", "17.05.1986", "17.05.1987", "17.05.1988", "17.05.1989",
"17.05.1990", "17.05.1991", "17.05.1992", "17.05.1993", "17.05.1994",
"17.05.1995", "17.05.1996", "17.05.1997", "17.05.1998", "17.05.1999",
"17.05.2000"), div = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
"0", "0", "0", "0", "0", "5", "0", "1.1", "1.2", "1", "0"
), ndivs = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, "0", "0",
"0", "0", "0", "1", "0", "1", "1", "1", "0"), posdiv = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "1",
NA, "1", "1", "1", NA), ddiv2 = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, "0", "0", "0", "0", "0", NA, "0", "1", NA,
NA), ddiv3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
"0", "0", "0", "0", "0", "0", "0", "0", "-1"), ddiv4 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "0", "0", "0",
"0", "0", "0", "0", "0"), ddiv5 = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, "0", "0", "0", "0", "0", "0",
"0"), ddiv6 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, "0", "0", "0", "0", "0", "0")), .Names = c("id",
"compid", "secid", "name", "period", "date", "enddate", "div",
"ndivs", "posdiv", "ddiv2", "ddiv3", "ddiv4", "ddiv5", "ddiv6"
), row.names = c(NA, 20L), class = "data.frame")
Here is some suitable data for "data3":
dput(data3[1:20,])
structure(list(date = c("02.01.1996", "03.01.1996", "04.01.1996",
"05.01.1996", "08.01.1996", "09.01.1996", "10.01.1996", "11.01.1996",
"12.01.1996", "15.01.1996", "16.01.1996", "17.01.1996", "18.01.1996",
"19.01.1996", "22.01.1996", "23.01.1996", "24.01.1996", "25.01.1996",
"26.01.1996", "29.01.1996"), Securityid = c("6001", "6001", "6001",
"6001", "6001", "6001", "6001", "6001", "6001", "6001", "6001",
"6001", "6001", "6001", "6001", "6001", "6001", "6001", "6001",
"6001"), Symbol = c("AAV", "AAV", "AAV", "AAV", "AAV", "AAV",
"AAV", "AAV", "AAV", "AAV", "AAV", "AAV", "AAV", "AAV", "AAV",
"AAV", "AAV", "AAV", "AAV", "AAV"), name = c("Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen"), Securitytype =
c("Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares"), Unadjusted =
c("200",
"200", "200", "200", "200", "200", "200", "200", "200", "200",
"200", "200", "200", "200", "200", "200", "200", "200", "200",
"200"), Event.adjusted = c("200", "200", "200", "200", "200",
"200", "200", "200", "200", "200", "200", "200", "200", "200",
"200", "200", "200", "200", "200", "200"), Div.and.Event.adjusted =
c("109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375", "109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375", "109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375", "109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375"),
Sharesissued = c("1901646", "1901646", "1901646", "1901646",
"1901646", "1901646", "1901646", "1901646", "1901646", "1901646",
"1901646", "1901646", "1901646", "1901646", "1901646", "1901646",
"1901646", "1901646", "1901646", "1901646")), .Names = c("date",
"Securityid", "Symbol", "name", "Securitytype", "Unadjusted",
"Event.adjusted", "Div.and.Event.adjusted", "Sharesissued"), row.names =
c(NA,
20L), class = "data.frame")
When I run the function:
data4<-merge(data,data3, by=c("name","date1"), all=T)
dput(data4[1:20,])
structure(list(name = c("A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen"), date = c("01.02.1999", "01.02.2000",
"01.02.2001", "01.02.2002", "01.03.1999", "01.03.2000", "01.03.2001",
"01.03.2002", "01.04.2003", "01.06.1999", "01.06.2001", "01.07.1999",
"01.07.2002", "01.07.2003", "01.08.2000", "01.08.2001", "01.08.2002",
"01.08.2003", "01.09.1999", "01.09.2000"), id = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
compid = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), secid = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), period = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), enddate = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), div = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ndivs = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), posdiv = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv2 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv3 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv4 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv5 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv6 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), Securityid = c("15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856"), Symbol = c("APR",
"APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR",
"APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR",
"APR"), Securitytype = c("Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares"),
Unadjusted = c("120", "140", "160", "105", "110", "190",
"160", "112", "115", "120", "150", "127", "106,5", "154",
"155", "160", "111", "155", "127", "205"), Event.adjusted = c("120",
"140", "160", "105", "110", "190", "160", "112", "115", "120",
"150", "127", "106,5", "154", "155", "160", "111", "155",
"127", "205"), Div.and.Event.adjusted = c("111,4092308",
"129,9774359", "148,545641", "100,1538462", "102,1251282",
"176,3979487", "148,545641", "106,8307692", "109,6923077",
"111,4092308", "143,0769231", "117,9081026", "101,5846154",
"154", "143,9035897", "152,6153846", "105,8769231", "155",
"117,9081026", "190,3241026"), Sharesissued = c("8839643",
"8839643", "8854307", "8866191", "8839643", "8839643", "8854307",
"8867791", "8885537", "8839643", "8866191", "8839643", "8885537",
"8903842", "8854307", "8866191", "8885537", "8903842", "8839643",
"8854307")), .Names = c("name", "date", "id", "compid", "secid",
"period", "enddate", "div", "ndivs", "posdiv", "ddiv2", "ddiv3",
"ddiv4", "ddiv5", "ddiv6", "Securityid", "Symbol", "Securitytype",
"Unadjusted", "Event.adjusted", "Div.and.Event.adjusted", "Sharesissued"
), row.names = c(NA, 20L), class = "data.frame")
If I want to just keep the observations from "data" (14446 observations)
merged with "data3". Is it correct to use:
data4<-merge(data,data3, by=c("name","date1"), all=T)
or should I use:
data4<-merge(data,data3, by=c("name","date1"), all.x=F)?
(Then I get ca 14000 obs, but "NA" in all variables from "data3".
In advance, thank you.
--
View this message in context: http://r.789695.n4.nabble.com/merging-two-dataframes-tp3932869p3940157.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
I pasted wrong function, I have changed from "date1" to "date" (ignore that). I think it have to be something wrong with my data format. I can`t understand why it don't work. I know I can use by.x=.... and by.y=...., but since both datasets have the same variable name it should be unnecessary to do that. -- View this message in context: http://r.789695.n4.nabble.com/merging-two-dataframes-tp3932869p3940396.html Sent from the R help mailing list archive at Nabble.com.
Hello. Now i tried to do what you told me. I used the str(fuction), and data$date1 and data3$date1 where both
listed
You have no date1 only date. Therefore
result<- merge(data, data3, by=c("date", "name"), all=T)
takes all values from both data frames
dim(data)
[1] 20 15
dim(data3)
[1] 20 9 alltogether 24 columns from which 4 are date and name columns therefore 20 columns contain data.
dim(result)
[1] 40 22 So the result has all 20 columns from both data frames plus one name and one date column and all rows from both data frames = 40. Those two sets are disjoint. If you had some common date and name in both data frames these rows would be merged on the same row in result. Let us try this.
data3$name[1:5] <- data$name[1:5] data3$date[3:5] <- data$date[3:5]
result<- merge(data, data3, by=c("date", "name"), all=T)
dim(result)
[1] 37 22
Regards
Petr
"character". I changed "name" to character but it did not work either. I also changed all variables to character, with no positive result. str(data) 'data.frame': 14446 obs. of 15 variables: $ id : chr "1" "1" "1" "1" ... $ compid : chr "2514" "2514" "2514" "2514" ... $ secid : chr "15856" "15856" "15856" "15856" ... $ name : chr "A-pressen" "A-pressen" "A-pressen" "A-pressen" ... $ period : chr "1" "2" "3" "4" ... $ date : chr "17.05.1980" "17.05.1981" "17.05.1982" "17.05.1983" ... $ enddate: chr "17.05.1981" "17.05.1982" "17.05.1983" "17.05.1984" ... $ div : chr NA NA NA NA ... $ ndivs : chr NA NA NA NA ... $ posdiv : chr NA NA NA NA ... $ ddiv2 : chr NA NA NA NA ... $ ddiv3 : chr NA NA NA NA ... $ ddiv4 : chr NA NA NA NA ... $ ddiv5 : chr NA NA NA NA ... $ ddiv6 : chr NA NA NA NA ... str(data3) 'data.frame': 812354 obs. of 9 variables: $ date : chr "02.01.1996" "03.01.1996" "04.01.1996" "05.01.1996" ... $ Securityid : chr "6001" "6001" "6001" "6001" ... $ Symbol : chr "AAV" "AAV" "AAV" "AAV" ... $ name : chr "Adresseavisen" "Adresseavisen" "Adresseavisen" "Adresseavisen" ... $ Securitytype : chr "Ordinary Shares" "Ordinary Shares" "Ordinary Shares" "Ordinary Shares" ... $ Unadjusted : chr "200" "200" "200" "200" ... $ Event.adjusted : chr "200" "200" "200" "200" ... $ Div.and.Event.adjusted: chr "109,7595375" "109,7595375"
"109,7595375"
"109,7595375" ... $ Sharesissued : chr "1901646" "1901646" "1901646" "1901646"
...
Here is some suitable data for "data"
dput(data[1:20,])
structure(list(id = c("1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
compid = c("2514", "2514", "2514", "2514", "2514", "2514",
"2514", "2514", "2514", "2514", "2514", "2514", "2514", "2514",
"2514", "2514", "2514", "2514", "2514", "2514"), secid = c("15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856"), name = c("A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen"), period = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"13", "14", "15", "16", "17", "18", "19", "20"), date =
c("17.05.1980",
"17.05.1981", "17.05.1982", "17.05.1983", "17.05.1984",
"17.05.1985",
"17.05.1986", "17.05.1987", "17.05.1988", "17.05.1989",
"17.05.1990",
"17.05.1991", "17.05.1992", "17.05.1993", "17.05.1994",
"17.05.1995",
"17.05.1996", "17.05.1997", "17.05.1998", "17.05.1999"),
enddate = c("17.05.1981", "17.05.1982", "17.05.1983", "17.05.1984",
"17.05.1985", "17.05.1986", "17.05.1987", "17.05.1988",
"17.05.1989",
"17.05.1990", "17.05.1991", "17.05.1992", "17.05.1993",
"17.05.1994",
"17.05.1995", "17.05.1996", "17.05.1997", "17.05.1998",
"17.05.1999",
"17.05.2000"), div = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
"0", "0", "0", "0", "0", "5", "0", "1.1", "1.2", "1", "0"
), ndivs = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, "0", "0",
"0", "0", "0", "1", "0", "1", "1", "1", "0"), posdiv = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "1",
NA, "1", "1", "1", NA), ddiv2 = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, "0", "0", "0", "0", "0", NA, "0", "1", NA,
NA), ddiv3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
"0", "0", "0", "0", "0", "0", "0", "0", "-1"), ddiv4 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "0", "0", "0",
"0", "0", "0", "0", "0"), ddiv5 = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, "0", "0", "0", "0", "0", "0",
"0"), ddiv6 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, "0", "0", "0", "0", "0", "0")), .Names = c("id",
"compid", "secid", "name", "period", "date", "enddate", "div",
"ndivs", "posdiv", "ddiv2", "ddiv3", "ddiv4", "ddiv5", "ddiv6"
), row.names = c(NA, 20L), class = "data.frame")
Here is some suitable data for "data3":
dput(data3[1:20,])
structure(list(date = c("02.01.1996", "03.01.1996", "04.01.1996",
"05.01.1996", "08.01.1996", "09.01.1996", "10.01.1996", "11.01.1996",
"12.01.1996", "15.01.1996", "16.01.1996", "17.01.1996", "18.01.1996",
"19.01.1996", "22.01.1996", "23.01.1996", "24.01.1996", "25.01.1996",
"26.01.1996", "29.01.1996"), Securityid = c("6001", "6001", "6001",
"6001", "6001", "6001", "6001", "6001", "6001", "6001", "6001",
"6001", "6001", "6001", "6001", "6001", "6001", "6001", "6001",
"6001"), Symbol = c("AAV", "AAV", "AAV", "AAV", "AAV", "AAV",
"AAV", "AAV", "AAV", "AAV", "AAV", "AAV", "AAV", "AAV", "AAV",
"AAV", "AAV", "AAV", "AAV", "AAV"), name = c("Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen", "Adresseavisen",
"Adresseavisen", "Adresseavisen", "Adresseavisen"), Securitytype =
c("Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary
Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary
Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary
Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares", "Ordinary
Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares"), Unadjusted =
c("200",
"200", "200", "200", "200", "200", "200", "200", "200", "200",
"200", "200", "200", "200", "200", "200", "200", "200", "200",
"200"), Event.adjusted = c("200", "200", "200", "200", "200",
"200", "200", "200", "200", "200", "200", "200", "200", "200",
"200", "200", "200", "200", "200", "200"), Div.and.Event.adjusted =
c("109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375",
"109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375",
"109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375",
"109,7595375",
"109,7595375", "109,7595375", "109,7595375", "109,7595375"),
Sharesissued = c("1901646", "1901646", "1901646", "1901646",
"1901646", "1901646", "1901646", "1901646", "1901646", "1901646",
"1901646", "1901646", "1901646", "1901646", "1901646", "1901646",
"1901646", "1901646", "1901646", "1901646")), .Names = c("date",
"Securityid", "Symbol", "name", "Securitytype", "Unadjusted",
"Event.adjusted", "Div.and.Event.adjusted", "Sharesissued"), row.names =
c(NA,
20L), class = "data.frame")
When I run the function:
data4<-merge(data,data3, by=c("name","date1"), all=T)
dput(data4[1:20,])
structure(list(name = c("A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen", "A-pressen", "A-pressen", "A-pressen",
"A-pressen", "A-pressen"), date = c("01.02.1999", "01.02.2000",
"01.02.2001", "01.02.2002", "01.03.1999", "01.03.2000", "01.03.2001",
"01.03.2002", "01.04.2003", "01.06.1999", "01.06.2001", "01.07.1999",
"01.07.2002", "01.07.2003", "01.08.2000", "01.08.2001", "01.08.2002",
"01.08.2003", "01.09.1999", "01.09.2000"), id = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
compid = c(NA_character_, NA_character_, NA_character_,
NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), secid = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), period = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), enddate = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), div = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ndivs =
c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), posdiv =
c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv2 =
c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv3 =
c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv4 =
c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv5 =
c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), ddiv6 =
c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), Securityid =
c("15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856", "15856", "15856",
"15856", "15856", "15856", "15856", "15856"), Symbol = c("APR",
"APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR",
"APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR", "APR",
"APR"), Securitytype = c("Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares",
"Ordinary Shares", "Ordinary Shares", "Ordinary Shares"),
Unadjusted = c("120", "140", "160", "105", "110", "190",
"160", "112", "115", "120", "150", "127", "106,5", "154",
"155", "160", "111", "155", "127", "205"), Event.adjusted = c("120",
"140", "160", "105", "110", "190", "160", "112", "115", "120",
"150", "127", "106,5", "154", "155", "160", "111", "155",
"127", "205"), Div.and.Event.adjusted = c("111,4092308",
"129,9774359", "148,545641", "100,1538462", "102,1251282",
"176,3979487", "148,545641", "106,8307692", "109,6923077",
"111,4092308", "143,0769231", "117,9081026", "101,5846154",
"154", "143,9035897", "152,6153846", "105,8769231", "155",
"117,9081026", "190,3241026"), Sharesissued = c("8839643",
"8839643", "8854307", "8866191", "8839643", "8839643", "8854307",
"8867791", "8885537", "8839643", "8866191", "8839643", "8885537",
"8903842", "8854307", "8866191", "8885537", "8903842", "8839643",
"8854307")), .Names = c("name", "date", "id", "compid", "secid",
"period", "enddate", "div", "ndivs", "posdiv", "ddiv2", "ddiv3",
"ddiv4", "ddiv5", "ddiv6", "Securityid", "Symbol", "Securitytype",
"Unadjusted", "Event.adjusted", "Div.and.Event.adjusted", "Sharesissued"
), row.names = c(NA, 20L), class = "data.frame")
If I want to just keep the observations from "data" (14446 observations)
merged with "data3". Is it correct to use:
data4<-merge(data,data3, by=c("name","date1"), all=T)
or should I use:
data4<-merge(data,data3, by=c("name","date1"), all.x=F)?
(Then I get ca 14000 obs, but "NA" in all variables from "data3".
In advance, thank you.
--
View this message in context: http://r.789695.n4.nabble.com/merging-two-
dataframes-tp3932869p3940157.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] merging two dataframes I pasted wrong function, I have changed from "date1" to "date" (ignore
that).
I think it have to be something wrong with my data format. I can`t understand why it don't work. I know I can use by.x=.... and by.y=....,
but
since both datasets have the same variable name it should be unnecessary
to
do that.
Again check dimensions of your the three data frames. Number of rows in final data frame shall be at least same as the number of rows in bigger data frame and lower than sum of rows of both merged data frames. Regards Petr
-- View this message in context: http://r.789695.n4.nabble.com/merging-two- dataframes-tp3932869p3940396.html Sent from the R help mailing list archive at Nabble.com.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
So when I do the merge on your example frames, I get the expected result. But the example component dataframes you sent are already full of NAs, and there are no rows which are present in both data sets. So I think perhaps, that merge is just highlighting a problem that has its roots in your component data. t
On 26 Oct 2011, at 1:16 PM, dividend wrote:
I pasted wrong function, I have changed from "date1" to "date" (ignore that). I think it have to be something wrong with my data format. I can`t understand why it don't work. I know I can use by.x=.... and by.y=...., but since both datasets have the same variable name it should be unnecessary to do that.