Reading word by word in a dataset

6 messages · Liaw, Andy, Uwe Ligges, Tony Plate +2 more

#
Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:
V1
1      i1-apple
2     i2-banana
3 i3-strawberry

HTH,
Andy
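
The command that produced the output above is not shown in this message; a later reply in the thread quotes it as read.table("clipboard", colClasses=c("character", "NULL", "NULL")). A minimal sketch of that idea follows. It assumes the three-column sample data posted later in the thread and passes it through the text= argument instead of the clipboard, so it runs on any platform. The names txt and first_col are illustrative only.

```r
# Sample rows standing in for the clipboard contents (an assumption:
# three whitespace-separated fields per line).
txt <- "i1-apple 10$ New_York
i2-banana 5$ London
i3-strawberry 7$ Japan"

# A colClasses entry of "NULL" tells read.table to skip that column,
# so only the first field of each row is kept.
first_col <- read.table(text = txt,
                        colClasses = c("character", "NULL", "NULL"))
print(first_col)
```

This only works when every row has exactly three fields, which is the limitation discussed further down the thread.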
#
Liaw, Andy wrote:

            
... and if only the words after "-" are of interest, the statement can 
be followed by

  sapply(strsplit(...., "-"), "[", 2)


Uwe Ligges
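
As a small worked example of the strsplit() step, here is a sketch applied to a hypothetical character vector named items that holds the first-column values. The vector is an assumption for illustration and does not fill in the elided "...." above.

```r
# Hypothetical input: the first-column values as a character vector.
items <- c("i1-apple", "i2-banana", "i3-strawberry")

# Split each string at the dash; fixed = TRUE treats "-" literally.
parts <- strsplit(items, "-", fixed = TRUE)

# "[" with argument 2 picks the second piece of every split result.
words <- sapply(parts, "[", 2)
print(words)
```

If an item contained more than one dash, this would keep only the piece between the first and second dash.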
#
Uwe and Andy's solutions are great for many applications but won't 
work if not all rows have the same number of fields.  Consider, for 
example, the following modification of Lee's example: 

i1-apple        10$   New_York
i2-banana
i3-strawberry   7$    Japan

      If I copy this to "clipboard" and run Andy's code, I get the 
following: 

 > read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
dec,  :
    line 2 did not have 3 elements

      We can get around this using "scan", then splitting things apart 
similar to the way Uwe described: 

 > dat <-
+ scan("clipboard", character(0), sep="\n")
Read 3 items
 > dash <- regexpr("-", dat)
 > dat2 <- substring(dat, pmax(0, dash)+1)
 >
 > blank <- regexpr(" ", dat2)
 > if(any(blank<0))
+   blank[blank<0] <- nchar(dat2[blank<0])
 > substring(dat2, 1, blank)
[1] "apple "      "banana"      "strawberry "

      hope this helps.  spencer graves
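
The result above keeps the blank that follows "apple" and "strawberry", because substring() stops at the position of the blank itself. A possible variant that stops one character earlier is sketched below. It assumes the same three lines, typed in as a vector instead of read from the clipboard, and the names dat, dat2 and words are illustrative.

```r
# The three ragged lines, entered directly (assumption: space-separated).
dat <- c("i1-apple        10$   New_York",
         "i2-banana",
         "i3-strawberry   7$    Japan")

# Drop everything up to and including the first dash, if there is one.
dash <- regexpr("-", dat, fixed = TRUE)
dat2 <- substring(dat, pmax(0, dash) + 1)

# Position of the first blank; when a line has none, pretend the blank
# sits just past the end of the string.
blank <- regexpr(" ", dat2, fixed = TRUE)
blank[blank < 0] <- nchar(dat2[blank < 0]) + 1

# Keep the characters before the blank.
words <- substring(dat2, 1, blank - 1)
print(words)
```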
#
Making it work when not all rows have the same number of fields 
seems like a good place to use the "flush" argument to scan(), which 
skips everything after the first field on each line:

With the following copied to the clipboard:

i1-apple        10$   New_York
i2-banana
i3-strawberry   7$    Japan

do:

 > scan("clipboard", "", flush=TRUE)
Read 3 items
[1] "i1-apple"      "i2-banana"     "i3-strawberry"
 > sub("^[A-Za-z0-9]*-", "", scan("clipboard", "", flush=TRUE))
Read 3 items
[1] "apple"      "banana"     "strawberry"

-- Tony Plate
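
For readers without a clipboard connection (it is platform dependent), the same approach can be sketched with the text= argument of scan(), which is available in current versions of R. The variable names are illustrative, and quiet=TRUE only suppresses the "Read 3 items" message.

```r
# The ragged sample data as a single string, one record per line.
txt <- "i1-apple        10$   New_York
i2-banana
i3-strawberry   7$    Japan"

# what = "" reads character fields; flush = TRUE discards the rest of
# each line once the first field has been read.
first_field <- scan(text = txt, what = "", flush = TRUE, quiet = TRUE)

# Strip the leading alphanumeric tag and the dash.
words <- sub("^[A-Za-z0-9]*-", "", first_field)
print(words)
```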
At Monday 01:59 PM 11/1/2004, Spencer Graves wrote:
#
Dear Andy & Tony: 

      That's great.  Unfortunately, I still spend most of my life in the 
S-Plus world, and read.table in S-Plus 6.2 does not have the "fill" 
argument.  However, Tony's solution (and my ugly hack) work in both 
S-Plus 6.2 and R 2.0.0. 

      Thanks again. 
      Spencer Graves
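
For R users, the "fill" argument mentioned above can be sketched as follows. This is an illustration under assumptions: it uses the ragged sample from earlier in the thread, supplies it through text= instead of the clipboard, and reads every column as character. The names txt and df are hypothetical.

```r
# Ragged sample: the second line has only one field.
txt <- "i1-apple        10$   New_York
i2-banana
i3-strawberry   7$    Japan"

# fill = TRUE pads short rows with blank fields, so the read no longer
# stops with "line 2 did not have 3 elements".
df <- read.table(text = txt, fill = TRUE, colClasses = "character")

# The first column holds the item names.
print(df$V1)
```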
2 days later
#
Thanks, Tony.
Your reply gave me the idea of using "flush" in scan(), and with it
I got my little job done.
My next question is how to extract only the prices in the 2nd column
of my example.
I did it the following way. Is this the right way to do it,
or is there a smarter or more efficient way?
i1-apple 10$ New_York
i2-banana 5$ London
i3-strawberry 7$ Japan
flush=T)[[2]]
Read 3 records
[1] "10$" "5$"  "7$"

Cheers,

John
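
The start of the command above is not preserved in the archive, so the following is only a sketch of one way to get the second column with scan(), not necessarily what was typed. It assumes a two-element "what" list, so that two fields are read per line and flush=TRUE discards the third. The names txt, fields, prices and amounts are illustrative.

```r
# The three-column sample from the message above.
txt <- "i1-apple 10$ New_York
i2-banana 5$ London
i3-strawberry 7$ Japan"

# A list for `what` makes scan() read records; with two character
# elements it reads two fields per line and flush = TRUE skips the rest.
fields <- scan(text = txt, what = list("", ""), flush = TRUE, quiet = TRUE)

# The second list element holds the price strings.
prices <- fields[[2]]
print(prices)

# Optional: drop the literal "$" (fixed = TRUE, since "$" is a regex
# anchor) and convert to numbers.
amounts <- as.numeric(sub("$", "", prices, fixed = TRUE))
print(amounts)
```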