uneven vector length issue with read.zoo? - R-help

knavero

Wed, May 2, 2012 12:55 PM #

I truncated and simplified my code and the read in data that I'm working with
to isolate the issue. Here is the read in data and R script respectively:

http://r.789695.n4.nabble.com/file/n4604287/test.csv test.csv 

http://pastebin.com/rCdaDqPm

Here is the terminal/R shell output that I hope the above replicates on your
screen:

Error in read.zoo("http://dl.dropbox.com/u/41922443/test.csv", skip = 1,  : 
  index has bad entries at data rows: 14 15 16 17 18 19 20 21 22 23 24 25 26
27 28

I was hoping that "NULL" in the colClasses argument would've taken care of
this uneven vector length issue; however, that was not the case. Any ideas?
Thanks in advance. Sorry if my post didn't follow the forum rules exactly. I
tried to make small-scale reproducible code and whatnot. I'm still a bit of
a noob here and there.
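
A minimal sketch of why "NULL" in colClasses may not help here: as far as I can tell from ?read.table, it drops whole columns, but every row is still read, so rows where the leading columns are empty still end up in the index. The inline data below is a made-up stand-in for test.csv, not the real file, and the column names are hypothetical.

```r
# Hypothetical stand-in for test.csv: columns 1-2 stop after two rows,
# while columns 4-5 keep going for two more.
csv <- "time,value,gap,other.time,other.value
2012-05-01,1.5,,2012-05-01,10
2012-05-02,2.5,,2012-05-02,20
,,,2012-05-03,30
,,,2012-05-04,40"

# "NULL" skips columns 3:5, but it does not skip any rows.
df <- read.table(text = csv, header = TRUE, sep = ",",
                 colClasses = c(NA, "numeric", "NULL", "NULL", "NULL"))

dim(df)     # rows and columns that were actually kept
df$time     # the last two index entries are empty
df$value    # the matching values are NA
```

So the empty index entries would presumably still be handed to read.zoo, whichever columns are nulled out.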



--
View this message in context: http://r.789695.n4.nabble.com/uneven-vector-length-issue-with-read-zoo-tp4604287.html
Sent from the R help mailing list archive at Nabble.com.

knavero

Wed, May 2, 2012 1:09 PM #

So far I see two options: (1) use the nrows argument to specify the maximum
number of rows to read in, or (2) go into Excel and put in a bunch of NA's.
Both are inefficient in that they're not "automated". For case (1), I have to
wait until an error pops up each time and deal with each one individually,
taking into account the skip and header args, and for case (2), I'm not even
using R to do the dirty work. Anyway, I'm going to keep going through the R
documentation for ?read.table and ?read.zoo to see if I find anything else.


knavero

Wed, May 2, 2012 1:11 PM #

Make that 3 options actually. In case (3) I would have to take each category
on the spreadsheet and isolate each to its own csv file using excel. Fun
stuff...


knavero

Wed, May 2, 2012 1:13 PM #

Case (4) - use the fill argument in ?read.table....this looks useful...guess
I answered my own question...going to delete this thread now...


knavero

Wed, May 2, 2012 1:18 PM #

Actually case (4) didn't work. The issue is also with the index: "fill" only
seems to work with the columns that contain the data associated with the
index. Dang... yeah, I need help here.
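
A small sketch of what fill = TRUE appears to do, on made-up inline data (the column names are hypothetical): it pads rows that have too few fields, but a row whose index field is present and empty is still kept.

```r
# Made-up data: row 2 is short (only two fields), row 3 has an empty index.
csv <- "time,value,other
2012-05-01,1.5,10
2012-05-02,2.5
,,30"

# Without fill = TRUE the short row would stop read.table with an error.
df <- read.table(text = csv, header = TRUE, sep = ",", fill = TRUE)

df$other   # the short row was padded with NA
df$time    # the empty index entry in row 3 is still there
```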


knavero

Wed, May 2, 2012 1:22 PM #

blank.lines.skip is not working either...
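
A likely reason, sketched on made-up data: blank.lines.skip only seems to drop lines that are completely empty. A line that still holds separators or values in other columns is not blank, so it stays.

```r
# Made-up data: one truly blank line, and one line with an empty index
# but a value in the last column.
csv <- "time,value,other\n2012-05-01,1.5,10\n\n,,30"

df <- read.table(text = csv, header = TRUE, sep = ",",
                 blank.lines.skip = TRUE)

nrow(df)   # the blank line is gone, the ",,30" line is kept
df
```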


knavero

Wed, May 2, 2012 2:54 PM #

case (6) - regress back to read.table apparently....


Rui Barradas

Wed, May 2, 2012 4:19 PM #

Hello,


knavero wrote

Or to readLines.


tmp <- readLines("http://dl.dropbox.com/u/41922443/test.csv")
# Why doesn't it work?
sapply(strsplit(tmp, ","), length)
# Don't argue with computers, they don't listen.
tmp <- tmp[-1]
tmp <- strsplit(tmp, ",")
tmp <- do.call(rbind, tmp)
nms <- tmp[1, ]
tmp <- tmp[-1, ]
tmp <- data.frame(tmp, stringsAsFactors=FALSE)
colnames(tmp) <- nms
# Now see what we've got
str(tmp) # Messy: one col without a name, dates and nums are chars, etc.


Hope this helps,

Rui Barradas



knavero

Wed, May 2, 2012 5:48 PM #

So with case (6) here's the general structure of what I have:

chw = read.table("crac.csv", skip = 1, header = TRUE,
   colClasses = rep(c("NULL", NA, "numeric", "NULL"),
      c(3, 1, 1, 24)),
   sep = ",")
chw$Time.1 = as.POSIXct(chw$Time.1, format = fmt, tz = TZ)
chw = na.omit(chw)
chw = read.zoo(chw, header = TRUE,
   colClasses = rep(c(NA, "numeric"), c(1, 1)),
   FUN = chr, aggregate = tail1)

You don't have to try this, but the main point is that 

read.table -> POSIXct -> na.omit -> read.zoo and chron

I guess this alternative solution is adequate along with using readLines.
Initially I was hoping just a simple read.zoo would do the trick. The catch
is that I need the index/timestamp column to be in chron format for an easy
na.approx function to deal with things. Thank you for the readLines
suggestion Rui. Much appreciated.
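
The read.table -> POSIXct -> na.omit -> read.zoo chain above, as a self-contained sketch. Assumptions: the inline data, the column names, and the timestamp format are made up, and the index stays POSIXct instead of being converted to chron, so only zoo is needed for the last step.

```r
# Made-up stand-in for crac.csv: the first two columns end one row early.
csv <- "time,value,other.time,other.value
05/01/2012 10:00,1.5,05/01/2012 10:00,10
05/01/2012 11:00,2.5,05/01/2012 11:00,20
,,05/01/2012 12:00,30"

fmt <- "%m/%d/%Y %H:%M"   # assumed timestamp format

# Read everything as-is, then keep only the index and value columns.
df <- read.table(text = csv, header = TRUE, sep = ",", as.is = TRUE)[1:2]

# Empty timestamps cannot be parsed and become NA ...
df$time <- as.POSIXct(df$time, format = fmt, tz = "UTC")

# ... so na.omit drops those rows before zoo ever sees them.
df <- na.omit(df)

# The final step needs the zoo package; skipped if it is not installed.
if (requireNamespace("zoo", quietly = TRUE)) {
  z <- zoo::read.zoo(df)   # the first column is used as the index
  print(z)
}
```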


Gabor Grothendieck

Thu, May 3, 2012 3:36 AM #

On Wed, May 2, 2012 at 3:55 PM, knavero <knavero at gmail.com> wrote:

Try this using the same library statements, fmt and chr from your post:

URL <- "http://dl.dropbox.com/u/41922443/test.csv"
DF1 <- read.table(URL, skip = 1, header = TRUE, sep = ",", fill = TRUE,
  as.is = TRUE)
DF2 <- na.omit(DF1[1:2])
z <- read.zoo(DF2, FUN = chr)

Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

knavero

Fri, May 4, 2012 11:51 AM #

Hey Gabor, just trying to understand this here..sorry for the noob question:

DF1 <- read.table(URL, skip = 1, header = TRUE, sep = ",", fill = TRUE,
  as.is = TRUE) 

I'm not too familiar with as.is; however, I quickly read the R documentation
on it. From my understanding, it converts character to factor in terms of
atomic vector class/mode, sort of like what colClasses would do. Why is it
needed in this specific case?
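
Going by ?read.table, as.is = TRUE does the opposite: it leaves character columns as character instead of converting them to factors. A quick check on made-up data (the column names are hypothetical):

```r
csv <- "time,value\n05/01/2012 10:00,1.5\n05/01/2012 11:00,2.5"

kept      <- read.table(text = csv, header = TRUE, sep = ",", as.is = TRUE)
converted <- read.table(text = csv, header = TRUE, sep = ",", as.is = FALSE)

class(kept$time)        # character: the strings are left alone
class(converted$time)   # factor: the strings were converted
```

Keeping the timestamps as plain character strings is presumably what lets a conversion function passed as FUN parse them directly.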


knavero

Fri, May 4, 2012 12:24 PM #

Thank you for the suggestion Gabor. It's definitely more elegant than what I
had above. Instead of going from character representation to POSIXct to
chron, it looks at the character representation and goes straight to chron.
It's good. However, I do wonder why it still complains of the vector length
even though I nulled out the other columns. It's an interesting error to run
into. Probably looks at FUN before nulling out the other columns was my
theory.


knavero

Fri, May 4, 2012 12:37 PM #

"However, I do wonder why it still complains of the vector length even though
I nulled out the other columns. It's an interesting error to run into.
Probably looks at FUN before nulling out the other columns was my theory. "

Referring to just a straight up read.zoo in this case ^


Gabor Grothendieck

Fri, May 4, 2012 5:29 PM #

On Fri, May 4, 2012 at 3:37 PM, knavero <knavero at gmail.com> wrote:

If you are referring to your pastebin code then the actual error that
code similar to it gives is:

> read.zoo(URL, skip = 1, header = TRUE,
+    colClasses = rep(c(NA, "numeric", "NULL"), c(1, 1, 3)),
+    FUN = chr, sep = ",")
Error in read.zoo(URL, skip = 1, header = TRUE, colClasses = rep(c(NA,  :
  index has bad entries at data rows: 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28

and that is because there are empty values in the index field.


knavero

Sun, May 6, 2012 12:42 PM #

Right, but it seems to me that the NA's in the index field that trigger the
error are caused by the longer vector lengths of columns 4 and 5. I would
think that the EOF in the scanf() (assuming C is used for the source code)
would be hit where the NA's begin in columns 1 and 2, since columns 3:5 are
nulled out. Does this sound like a possible case?

So, if the read in data only contained columns 1 and 2, it wouldn't even
look at columns 3:5 and thus, rows 14 and so on wouldn't even be looked at
and that would be EOF already - resulting in no error. 


Gabor Grothendieck

Sun, May 6, 2012 2:44 PM #

On Sun, May 6, 2012 at 3:42 PM, knavero <knavero at gmail.com> wrote:

Don't know what "longer vector lengths" refers to but every line in
your pastebin data has 5 fields -- they don't vary.

[1] 5 5
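
One way to check the number of fields per line is count.fields from base R. This is only a sketch on made-up lines, not necessarily the call that produced the output above:

```r
# Made-up stand-in for the data: the last line has empty leading fields.
lines <- c("time,value,gap,other.time,other.value",
           "2012-05-01,1.5,,2012-05-01,10",
           ",,,2012-05-02,20")

tf <- tempfile(fileext = ".csv")
writeLines(lines, tf)

# Empty fields still count as fields when sep is given explicitly.
n <- count.fields(tf, sep = ",")
range(n)   # smallest and largest field count over all lines

unlink(tf)
```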

Furthermore, the error message seems pretty clear.  It's saying that
the index has a bad entry and is even telling which row or rows it
occurs at.

Here is another smaller example where the missing entry in row 3
triggers the same sort of message:

Error in read.zoo(text = "1,2\n2,3\n,4\n6,7", sep = ",") :
  index has bad entry at data row 3
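
A sketch of how the bad row in that small example could be located and dropped before calling read.zoo, using the same four lines of text. The final step assumes the zoo package is installed and is skipped otherwise.

```r
df <- read.table(text = "1,2\n2,3\n,4\n6,7", sep = ",")

# The empty index field in the third line is read as NA.
bad <- which(is.na(df[[1]]))
bad

# Drop the incomplete rows, then build the zoo object from what is left.
clean <- na.omit(df)

if (requireNamespace("zoo", quietly = TRUE)) {
  z <- zoo::read.zoo(clean)   # the first column becomes the index
  print(z)
}
```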


knavero

Sun, May 6, 2012 8:29 PM #

Yeah, I was unclear about what I meant by "uneven vector lengths". I should
say "uneven valid vectors" instead, where "valid" refers to (1) a field
containing a value that is not NA, for this specific case, and (2) a value
that is compatible with the vector class assigned through colClasses etc.,
and therefore avoids the read.zoo error. I understand and agree that the
error is clear. I have no issue with that. My issue is with the need to use
read.table and then read.zoo shortly after (this seems inefficient).

I was simply pushing toward the idea that this type of situation could be
avoided for future users: if there are uneven valid vectors, there could be
a logical argument saying that it's okay to truncate to the shortest valid
vector (in this case columns 1 and 2). My raw data consisted of a lot of
uneven valid vectors. My expectation was that, with columns 3:5 nulled out,
read.zoo would have no need to read in the bad rows in columns 1:2, whose
NA's are already outside of the valid vector length.

Anyway, this is probably trivial now considering that this problem is
already solved haha, and also I don't mean to offend and criticize. I simply
see an efficiency opportunity and an opportunity to create more robust
source code. Why use read.table with read.zoo if you can just do it all with
read.zoo? Do you not agree?


knavero

Sun, May 6, 2012 8:34 PM #

For simplicity's sake though, yes, I understand the issue and solution, and
the solution using read.table, na.omit, and read.zoo is sound. Thanks Gabor! :)
