Trying to merge new data set to bottom of old data set. Both are zoo objects. - R-help

knavero

Tue, Apr 3, 2012 8:57 PM

Here is the data I'm working with:

new.txt: http://r.789695.n4.nabble.com/file/n4530888/new.txt

old.txt: http://r.789695.n4.nabble.com/file/n4530888/old.txt

My code is here:

http://pastebin.com/9jjs6Ahr

I'm looking for a way to simply attach new.txt to the bottom of old.txt
in R; otherwise I'll just throw it in Excel to do some preprocessing. I've
looked into using merge, cbind, concatenate, and rbind. However, I keep
running into problems where the 2012 data ends up on top of the 2010 and
2011 data, or the function just adds extra columns on the right side. Is
there a simple method of doing this? Thanks.
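A minimal sketch of the append step being asked about, assuming both series are univariate zoo objects indexed by POSIXct timestamps that do not overlap. The values and times below are made up for illustration and are not taken from old.txt or new.txt. rbind() on zoo objects combines them by rows and orders the result by its time index, so the argument order should not matter once the timestamps themselves are parsed correctly.

```r
library(zoo)

# Made-up timestamps standing in for the old (2011) and new (2012) data
old <- zoo(c(1, 2), as.POSIXct(c("2011-12-31 23:00", "2011-12-31 23:30"), tz = "UTC"))
new <- zoo(c(3, 4), as.POSIXct(c("2012-01-01 00:00", "2012-01-01 00:30"), tz = "UTC"))

# rbind() for zoo requires non-overlapping indexes and returns rows in time order
combined <- rbind(old, new)

# Reversing the arguments gives the same ordering, because it follows the index
also_combined <- rbind(new, old)

print(combined)
```

If the 2012 rows still come out first with this approach, the timestamps are the thing to check, not the binding function.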

--
View this message in context: http://r.789695.n4.nabble.com/Trying-to-merge-new-data-set-to-bottom-of-old-data-set-Both-are-zoo-objects-tp4530888p4530888.html
Sent from the R help mailing list archive at Nabble.com.

knavero

Tue, Apr 3, 2012 10:47 PM

Here's a case where it doesn't work. Again, the problem is that when I use
the rbind or concatenate functions, the 2012 data ends up ahead of the
2010 and 2011 data. The problem seems to depend on the text files I
read in:

old.txt: http://r.789695.n4.nabble.com/file/n4531011/old.txt

new.txt: http://r.789695.n4.nabble.com/file/n4531011/new.txt

using this code:

http://pastebin.com/8W6KaaPQ

In a case where it does work (the data seemed to come out in the right
order), I read in a different old file, old1.txt, instead of old.txt. Its
contents and format were similar to those of new.txt: 18 columns with the
same headers. Here are the files to use:

old1.txt: http://r.789695.n4.nabble.com/file/n4531011/old1.txt

new.txt: http://r.789695.n4.nabble.com/file/n4531011/new.txt

using this code:

http://pastebin.com/6iNF5bPd

That should clarify the issue I'm having. Let me know if dput output is
needed here. However, all the vectors and vector modes seem to check out okay.




Ashish Agarwal

Wed, Apr 4, 2012 2:04 AM

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20120404/f02e41b8/attachment.pl>

Gabor Grothendieck

Wed, Apr 4, 2012 6:05 AM

On Wed, Apr 4, 2012 at 1:47 AM, knavero <knavero at gmail.com> wrote:

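The problem is that the dates in the new file are of the form 2/23/12
but they are being read in using "%m/%d/%Y %H:%M". The %Y should be
%y: %Y expects a year with century, so "12" is read as the year 12 and
those rows sort ahead of the 2010 and 2011 data, whereas %y reads a
two-digit year (12 becomes 2012). For the old file the format is correct.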
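A small base-R illustration of the point above, using a made-up timestamp in the same m/d/yy layout as new.txt (the actual file contents are not reproduced here). strptime() returns a POSIXlt object whose year field counts from 1900, which makes the misparsed year easy to see.

```r
x <- "2/23/12 14:30"  # made-up example in the m/d/yy H:M layout of new.txt

# %Y expects a year with century, so "12" is taken literally as the year 12
wrong <- strptime(x, format = "%m/%d/%Y %H:%M", tz = "UTC")

# %y is the two-digit year; values 00-68 are mapped to 2000-2068
right <- strptime(x, format = "%m/%d/%y %H:%M", tz = "UTC")

wrong$year + 1900  # 12
right$year + 1900  # 2012

# A timestamp from the old data, which already uses a four-digit year
older <- strptime("12/31/2011 23:30", format = "%m/%d/%Y %H:%M", tz = "UTC")

# The misparsed "2012" row is earlier than 2011, so it sorts first
sorts_first <- as.POSIXct(wrong) < as.POSIXct(older)
sorts_first  # TRUE
```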

A few other points:

- it would be better to use library() than require() here. If the package
can't be loaded, library() fails with an error right at that point, which
is what we want because it shows exactly where the problem is. require()
simply returns FALSE and processing continues, so the error shows up later
in the code, where it is harder to work out what went wrong.
Alternatively, you can use stopifnot(require(...whatever...)).
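A minimal sketch of the difference, assuming a package name that is not installed. "noSuchPackageXYZ" is a hypothetical name chosen only for this illustration, and tryCatch() is used here just to capture each outcome so the script itself keeps running.

```r
# Hypothetical package name, assumed not to be installed
pkg <- "noSuchPackageXYZ"

# require() does not stop: it returns FALSE and the script would carry on
loaded <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
print(loaded)  # FALSE

# library() signals an error at the point of failure
lib_result <- tryCatch({
  library(pkg, character.only = TRUE)
  "loaded"
}, error = function(e) "error")
print(lib_result)  # "error"

# Wrapping require() in stopifnot() turns the FALSE into an error as well
guarded <- tryCatch({
  stopifnot(suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE)))
  "continued"
}, error = function(e) "error")
```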

- please try to cut your data down as far as feasible. If each file
had 3 lines, say, the same error would have been revealed and it would
have been easier to manage. It would also have been possible to
remove all the unused columns and still illustrate this error. The
very process of reducing it to the smallest dataset you can often
reveals the error.
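One way to follow this advice, sketched under assumptions: paste a few rows straight into the script, read them with the text= argument of read.csv() (read.table() accepts it too), and post the dput() output so others can recreate the object exactly. The column names Time and Value and the values below are hypothetical stand-ins, not the actual contents of new.txt.

```r
# Hypothetical three-row excerpt: only a timestamp column and one value column kept
lines <- "Time,Value
2/23/12 14:00,1.5
2/23/12 14:30,2.0
2/23/12 15:00,2.5"

dat <- read.csv(text = lines, stringsAsFactors = FALSE)

# dput() prints R code that rebuilds the object; this is what to paste into a post
dput(dat)

# Round trip: evaluating the printed code gives back an equal data frame
rebuilt <- eval(parse(text = capture.output(dput(dat))))
```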

- if you must post in this fashion, note that read.zoo uses
read.table, which can read directly from a URL:

new.txt <- "http://r.789695.n4.nabble.com/file/n4531011/new.txt"
new <- read.zoo(new.txt, ...whatever...)

- it's better to write out TRUE and FALSE, since T and F are ordinary
variables that a program can create or overwrite, whereas TRUE and FALSE
are reserved words and can't be reassigned.
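A short illustration of why T and F are fragile. local() is used here so the masking does not leak into the global environment; the variable names masked and can_assign are just for this sketch.

```r
# T is an ordinary binding in package base, so any variable named T masks it
masked <- local({
  T <- 0     # a local variable named T now shadows base::T
  isTRUE(T)  # FALSE: T no longer means TRUE in this scope
})

# TRUE is a reserved word, so trying to assign to it is an error
can_assign <- tryCatch({
  eval(parse(text = "TRUE <- 0"))
  TRUE
}, error = function(e) FALSE)
```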

- you may or may not prefer this style, but it would be possible to replace this:

cls <- c("NULL", NA, "numeric",
      "NULL", "NULL",
      "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
      "NULL", "NULL", "NULL", "NULL", "NULL", "NULL")

with this:

cls <- rep(c("NULL", NA, "numeric", "NULL"), c(1, 1, 1, 15))
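A quick check that the two versions of cls are the same vector, plus a tiny demonstration of what such a vector does, assuming (as the "NULL"/NA/"numeric" entries suggest) that it is passed as colClasses to read.table() or read.zoo(). In colClasses, "NULL" skips a column and NA lets R choose the type. The four-column demo data is made up.

```r
# The spelled-out vector: 18 entries, one per column of the file
cls_long <- c("NULL", NA, "numeric",
              "NULL", "NULL",
              "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
              "NULL", "NULL", "NULL", "NULL", "NULL", "NULL")

# rep() with a vector of counts repeats each element that many times
cls_short <- rep(c("NULL", NA, "numeric", "NULL"), c(1, 1, 1, 15))

length(cls_short)               # 18
identical(cls_long, cls_short)  # TRUE

# Made-up 4-column data: only columns 2 and 3 survive, column 3 as numeric
demo <- read.table(text = "a b 1 x\nc d 2 y",
                   colClasses = c("NULL", NA, "numeric", "NULL"))
```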

Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

knavero

Wed, Apr 4, 2012 9:38 AM

Okay, will do. Thanks for all the handy advice, Gabor. Ugh, it's such a stupid
bug once I actually know what is going on. I need to go over my Unix
date/time format specifiers, and I'll probably use the rep function to
simplify and reduce the amount of code. A lot of that is definitely new to
me. As for shortening the data I read in, I do find it tricky sometimes,
since you have to test incrementally to make sure the shortened version
still reproduces the problem. Anyway, I'll try to make the data
significantly shorter in my next post if possible. Thanks again.
