I have a question about stacking datasets.
I have 40 stata datasets that have exactly the same number of variables,
with the same names (~420k rows, 8 columns).
The datasets are relatively large ~ 15 megs.
If they were text files a linux "cat file1 file2 >> combo" sort of
strategy would work.
I've considered using a merge command, but I don't want any records
merged, only appended. Also, I don't want any variables to be renamed.
Given their unique nature, using a simple
merge(read.dta('file1'),read.dta('file2')) would give me (I think) what
I am looking for but seems incredibly inefficient.
There are a number of ways I could approach this all of which involve a
non-R solution.
Does somebody have an R solution in mind?
Thanks in advance,
Debra Taylor
Append/merge
3 messages · Debra Taylor, Peter Dalgaard, Mark Myatt
"Debra Taylor" <debrat at bestweb.net> writes:
Does somebody have an R solution in mind?
rbind?
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
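For readers following the thread, here is a minimal sketch of what the rbind() suggestion does, assuming two data frames with identical column names. The toy data frames a and b below are made up for illustration and stand in for two of the Stata files.

```r
# Two toy data frames with identical column names (stand-ins for two files).
a <- data.frame(id = 1:3, x = c(0.1, 0.2, 0.3))
b <- data.frame(id = 4:5, x = c(0.4, 0.5))

# rbind() stacks the rows: nothing is matched on keys and no column is renamed.
combo <- rbind(a, b)

nrow(combo)   # equals nrow(a) + nrow(b)
names(combo)  # same names as in a and b
```

Unlike merge(), rbind() never joins records on common columns, so every input row appears exactly once in the result.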
Debra Taylor <debrat at bestweb.net> writes:
I have a question about stacking datasets. I have 40 stata datasets that have exactly the same number of variables, with the same names (~420k rows, 8 columns). The datasets are relatively large ~ 15 megs. If they were text files a linux "cat file1 file2 >> combo" sort of strategy would work.
[Snip]
Does somebody have an R solution in mind?
Read each data file in as a data frame then use the rbind() function.
Something like:
a <- read.table("file1", header = TRUE)
b <- read.table("file2", header = TRUE)
my.data <- rbind(a, b)
#
# remove the excess 15MB ...
#
rm(b)
a <- read.table("file3", header = TRUE)
# append the new rows to the accumulated data frame
my.data <- rbind(my.data, a)
#
# and so on ... finally removing the excess 15MB ...
#
rm(a)
Mark
--
Mark Myatt
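With 40 files, the read-then-rbind() steps suggested in this thread could be wrapped in a loop rather than typed out by hand. The following is only a sketch under some assumptions: it uses read.dta() from the foreign package (the reader mentioned in the original question), the helper name stack_dta is hypothetical, and the demo writes three tiny Stata files to a temporary directory so that it runs on its own.

```r
library(foreign)  # provides read.dta() and write.dta()

# Hypothetical helper: read every file in `paths` and stack the rows.
stack_dta <- function(paths) {
  pieces <- lapply(paths, read.dta)  # one data frame per file
  do.call(rbind, pieces)             # a single rbind() call over all pieces
}

# Self-contained demo: write three small Stata files, then stack them.
tmp <- tempfile("dta_demo")
dir.create(tmp)
for (i in 1:3) {
  d <- data.frame(id = (i - 1) * 2 + 1:2, value = c(i, i + 0.5))
  write.dta(d, file.path(tmp, sprintf("file%d.dta", i)))
}

paths <- sort(list.files(tmp, pattern = "\\.dta$", full.names = TRUE))
combo <- stack_dta(paths)
dim(combo)
```

This version holds all pieces in memory at once before combining them. If memory is tight, the incremental approach of reading one file, appending it, and removing it may be preferable, at the cost of copying the growing data frame on each step.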