Reading word by word in a dataset

Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:
read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
             V1
1      i1-apple
2     i2-banana
3 i3-strawberry
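
For readers without a Windows clipboard, the same call can be tried on in-line data. This is only a sketch: the `rows` vector and the `text` argument (which assumes a more recent R than the 2.0.0 used above) are additions for illustration and are not part of the reply.

```r
# Sample rows from the question, one string per line of the data file.
rows <- c("i1-apple        10$   New_York",
          "i2-banana       5$    London",
          "i3-strawberry   7$    Japan")

# "NULL" in colClasses tells read.table() to skip that column entirely,
# so only the first whitespace-separated field of each row is kept.
first <- read.table(text = rows,
                    colClasses = c("character", "NULL", "NULL"))
print(first$V1)
```

The result is a one-column data frame, so `first$V1` is the character vector of first fields.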

HTH,
Andy

From: j lee

Hello All,

I'd like to read first words in lines into a new file.
If I have a data file the following, how can I get the
first words: apple, banana, strawberry?

i1-apple        10$   New_York
i2-banana       5$    London
i3-strawberry   7$    Japan

Is there any similar question already posted to the
list? I am a bit new to R, having a few months of
experience now.

Cheers,

John

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

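... and if only the words after the "-" are of interest, the read.table()
statement can be followed by the call below, where "...." stands for the
character column that read.table() returned: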

  sapply(strsplit(...., "-"), "[", 2)
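
As a concrete illustration of that call, here is a small sketch in which the hypothetical vector `ids` stands in for the "...." placeholder (that is, for the V1 column read earlier). The variable names are assumptions made for the example.

```r
# Stand-in for the character column returned by read.table().
ids <- c("i1-apple", "i2-banana", "i3-strawberry")

# strsplit() returns a list with one character vector per input string,
# e.g. c("i1", "apple") for the first element.
parts <- strsplit(ids, "-")

# "[" is the subsetting function; sapply() applies it with index 2
# to pick the piece after the dash from each element.
words <- sapply(parts, "[", 2)
print(words)
```

Note that strsplit() needs a character vector, so a factor column would have to go through as.character() first.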

Uwe Ligges
At Monday 01:59 PM 11/1/2004, Spencer Graves wrote:

Uwe and Andy's solutions are great for many applications but won't
work if the rows do not all have the same number of fields.  Consider,
for example, the following modification of Lee's example:

i1-apple        10$   New_York
i2-banana
i3-strawberry   7$    Japan

      If I copy this to "clipboard" and run Andy's code, I get the 
following: 

 > read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
dec,  :
    line 2 did not have 3 elements

      We can get around this using "scan", then splitting things apart 
similar to the way Uwe described: 

 > dat <-
+ scan("clipboard", character(0), sep="\n")
Read 3 items
 > dash <- regexpr("-", dat)
 > dat2 <- substring(dat, pmax(0, dash)+1)
 >
 > blank <- regexpr(" ", dat2)
 > if(any(blank<0))
+   blank[blank<0] <- nchar(dat2[blank<0])
 > substring(dat2, 1, blank)
[1] "apple "      "banana"      "strawberry "
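
The output above keeps the blank that follows each word ("apple " rather than "apple"), because the substring ends at the position of the blank itself. A possible variant that stops one character earlier is sketched below. The in-line `dat` vector replaces the clipboard read and, like the `last` variable, is an assumption made for the example.

```r
# Lines as scan(..., sep = "\n") would return them, one string per row.
dat <- c("i1-apple        10$   New_York",
         "i2-banana",
         "i3-strawberry   7$    Japan")

# Position of the first dash; regexpr() gives -1 when there is none,
# and pmax(0, dash) + 1 then keeps the whole string.
dash <- as.integer(regexpr("-", dat, fixed = TRUE))
dat2 <- substring(dat, pmax(0, dash) + 1)

# Position of the first blank after the dash (-1 when the line has no blank).
blank <- as.integer(regexpr(" ", dat2, fixed = TRUE))

# Last character to keep: the one before the blank, or the end of the string.
last <- ifelse(blank < 0, nchar(dat2), blank - 1)
words <- substring(dat2, 1, last)
print(words)
```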

      hope this helps.  spencer graves

Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567
Making it work when not all rows have the same number of fields seems
like a good place to use the "flush" argument to scan() (to skip
everything after the first field on the line):

With the following copied to the clipboard:

i1-apple        10$   New_York
i2-banana
i3-strawberry   7$    Japan

do:

 > scan("clipboard", "", flush=T)
Read 3 items
[1] "i1-apple"      "i2-banana"     "i3-strawberry"
 > sub("^[A-Za-z0-9]*-", "", scan("clipboard", "", flush=T))
Read 3 items
[1] "apple"      "banana"     "strawberry"
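
The same two steps can be reproduced without a clipboard. The sketch below assumes a more recent R than 2.0.0 for the `text` argument of scan(). The `rows` vector and the `quiet = TRUE` setting are additions for illustration.

```r
# Ragged input: the second row has only one field.
rows <- c("i1-apple        10$   New_York",
          "i2-banana",
          "i3-strawberry   7$    Japan")

# what = "" asks for one character field per line; flush = TRUE discards
# whatever remains on the line after that field has been read.
first <- scan(text = rows, what = "", flush = TRUE, quiet = TRUE)

# Drop the leading run of letters and digits up to and including the dash.
words <- sub("^[A-Za-z0-9]*-", "", first)
print(words)
```

Because only one field is requested per line, a short row does not cause the "did not have 3 elements" error seen with read.table().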
 >

-- Tony Plate
Dear Andy & Tony: 

      That's great.  Unfortunately, I still spend most of my life in the 
S-Plus world, and read.table in S-Plus 6.2 does not have the "fill" 
argument.  However, Tony's solution (and my ugly hack) work in both 
S-Plus 6.2 and R 2.0.0. 

      Thanks again. 
      Spencer Graves
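
For R users, the "fill" argument mentioned here can be sketched as follows. This is an illustration under assumptions rather than a quote from the thread: the `rows` vector, the `text` argument (which assumes a more recent R than 2.0.0), and reading every column as character are all choices made for the example.

```r
# Ragged input: the second row has only one field.
rows <- c("i1-apple        10$   New_York",
          "i2-banana",
          "i3-strawberry   7$    Japan")

# fill = TRUE pads short rows with empty fields instead of raising an error.
dat <- read.table(text = rows, fill = TRUE, colClasses = "character")
print(dat$V1)
```

As the message above notes, this route is not available where read.table lacks "fill", which is where the scan() approaches remain useful.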

Thanks, Tony.

Your reply gave me the very good idea of using "flush" in scan(), and
with it I got my little job done. My next question is how to extract
only the prices in the 2nd column of my example. I did it the following
way. Is this the right way to do it, or is there a smarter or more
efficient way?

 > system("more mtx.ex.1")
i1-apple 10$ New_York
i2-banana 5$ London
i3-strawberry 7$ Japan
 > scan(file="mtx.ex.1", what=list(NULL,""), flush=T)[[2]]
Read 3 records
[1] "10$" "5$"  "7$"
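
A self-contained version of this call, with one possible follow-up step that turns the prices into numbers, might look like the sketch below. The `rows` vector replaces the file mtx.ex.1, the `text` argument assumes a more recent R than 2.0.0, and the conversion to numeric is an addition that was not asked about in the thread.

```r
# Contents of the example file, one string per line.
rows <- c("i1-apple 10$ New_York",
          "i2-banana 5$ London",
          "i3-strawberry 7$ Japan")

# A NULL entry in the 'what' list skips that field; "" reads the next one
# as character. flush = TRUE then ignores the rest of each line.
prices <- scan(text = rows, what = list(NULL, ""),
               flush = TRUE, quiet = TRUE)[[2]]

# Remove the literal "$" and convert to numeric.
amounts <- as.numeric(sub("$", "", prices, fixed = TRUE))
print(amounts)
```

Rows with a missing second field would need extra care with this approach; scan() has a "fill" argument that may help in that case.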

Cheers,

John