Reading word by word in a dataset

Thu, Nov 4, 2004 5:49 AM

On Thu, 4 Nov 2004, John wrote:


That is your problem. It works in the current version of R, 2.0.0. Using
colClasses="NULL" was not documented in 1.9.0, and was not intended to work.

What does the posting guide say about this?

system("more mtx.ex.1")

i1-apple 10$ New_York
i2-banana 5$ London
i3-strawberry 7$ Japan

read.table("mtx.ex.1", colClasses=c("character","NULL","NULL"), fill=T)
Error in data[[i]] : subscript out of bounds

read.table("mtx.ex.1", colClasses=c("character", NULL, NULL), fill=T)
             V1  V2       V3
1      i1-apple 10$ New_York
2     i2-banana  5$   London
3 i3-strawberry  7$    Japan

read.table("mtx.ex.1", colClasses=c("character", NULL, NULL), fill=T)[,1]
[1] "i1-apple"      "i2-banana"     "i3-strawberry"

Cheers,

John
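A short sketch of why the unquoted form above returns all three columns: c() drops NULL elements, so c("character", NULL, NULL) is just "character", which read.table() then recycles across every column. The example assumes a current R in which read.table() accepts the text= argument and the string "NULL" in colClasses; the variable names are made up for illustration.

```r
# c() silently drops NULL, so the unquoted version has length 1
cc_unquoted <- c("character", NULL, NULL)
cc_quoted   <- c("character", "NULL", "NULL")

# Same three lines as in mtx.ex.1, held in memory instead of a file
lines <- c("i1-apple 10$ New_York",
           "i2-banana 5$ London",
           "i3-strawberry 7$ Japan")

# A single "character" is recycled: all three columns are kept
kept_all <- read.table(text = lines, colClasses = cc_unquoted)

# The string "NULL" asks read.table() to skip that column
kept_first <- read.table(text = lines, colClasses = cc_quoted)

ncol(kept_all)    # 3
ncol(kept_first)  # 1
```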


 --- "Liaw, Andy" <andy_liaw at merck.com> wrote:

Don't give up on read.table() just yet:

read.table("clipboard", colClasses=c("character", "NULL", "NULL"),
           fill=TRUE)
             V1
1      i1-apple
2     i2-banana
3 i3-strawberry

Andy

From: Spencer Graves

      Uwe and Andy's solutions are great for many applications but won't
work if not all rows have the same number of fields.  Consider for
example the following modification of Lee's example:

i1-apple        10$   New_York
i2-banana
i3-strawberry   7$    Japan

      If I copy this to "clipboard" and run Andy's code, I get the
following:

 > read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
    line 2 did not have 3 elements

      We can get around this using "scan", then splitting things apart
similar to the way Uwe described:

 > dat <- scan("clipboard", character(0), sep="\n")
Read 3 items
 > dash <- regexpr("-", dat)
 > dat2 <- substring(dat, pmax(0, dash)+1)
 > blank <- regexpr(" ", dat2)
 > if(any(blank<0))
 +   blank[blank<0] <- nchar(dat2[blank<0])
 > substring(dat2, 1, blank)
[1] "apple "      "banana"      "strawberry "

      hope this helps.  spencer graves
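The output above still carries the trailing blank on "apple " and "strawberry ", because substring() runs up to and including the first space. A possible variant, sketched here with sub() instead of regexpr()/substring(), drops everything from the first whitespace onward and so avoids that. The data are typed in as a character vector rather than read from the clipboard, and the names first_field and words are only illustrative.

```r
# The ragged example: line 2 has a single field
dat <- c("i1-apple        10$   New_York",
         "i2-banana",
         "i3-strawberry   7$    Japan")

# Keep only the first field: delete from the first whitespace to the end.
# Lines without whitespace are left unchanged.
first_field <- sub("[[:space:]].*$", "", dat)

# Delete everything up to and including the first hyphen.
# Lines without a hyphen are left unchanged.
words <- sub("^[^-]*-", "", first_field)

words
```

With a real file, dat could instead come from readLines() or from scan(..., sep="\n") as in the transcript above.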
    
Uwe Ligges wrote:

Liaw, Andy wrote:

Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:

read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
             V1
1      i1-apple
2     i2-banana
3 i3-strawberry

... and if only the words after "-" are of interest, the statement can
be followed by

 sapply(strsplit(...., "-"), "[", 2)


Uwe Ligges
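To make the strsplit() step concrete, here is a minimal sketch that starts from the first column as a plain character vector (standing in for the V1 column returned by read.table() above) and then writes the words to a new file, as the original question asked. The temporary file is only a placeholder for whatever output file is wanted.

```r
# Stand-in for the V1 column produced by read.table()
first_col <- c("i1-apple", "i2-banana", "i3-strawberry")

# Split each entry at "-" and take the second piece of every split
parts <- strsplit(first_col, "-")
words <- sapply(parts, "[", 2)

# Write one word per line to a new file (a temporary file here)
out_file <- tempfile(fileext = ".txt")
writeLines(words, out_file)

readLines(out_file)
```

Note that taking piece 2 assumes each entry contains exactly one "-" before the word; an entry without a hyphen would give NA.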

HTH,
Andy

From: j lee

Hello All,

I'd like to read the first words in lines into a new file.
If I have a data file like the following, how can I get the
first words: apple, banana, strawberry?

i1-apple        10$   New_York
i2-banana       5$    London
i3-strawberry   7$    Japan

Is there any similar question already posted to the
list? I am a bit new to R, having a few months of
experience now.

Cheers,

John

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
