Reading word by word in a dataset
Thanks, Tony. Your reply gave me the idea of using "flush" in scan(), and with it I got my little job done. My next question is how to extract only the price items in the 2nd column of my example. I did it the following way. Is this the right way to do it, or is there a smarter or more efficient way?
> system("more mtx.ex.1")
i1-apple 10$ New_York
i2-banana 5$ London
i3-strawberry 7$ Japan
> scan(file="mtx.ex.1", what=list(NULL,""), flush=T)[[2]]
Read 3 records
[1] "10$" "5$" "7$"

Cheers, John
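For readers who want to try this without the file, here is a minimal sketch of the same second-column extraction. It assumes the three lines are passed through the text argument of scan() instead of being read from mtx.ex.1, and the variable names (lines, prices, prices_rt, amounts) are illustrative, not from the thread. It also shows read.table() with colClasses as one possible alternative, and a way to strip the trailing "$" if numeric values are wanted.

```r
# Sample data from the post, supplied inline instead of via mtx.ex.1 (assumption).
lines <- c("i1-apple 10$ New_York",
           "i2-banana 5$ London",
           "i3-strawberry 7$ Japan")

# Same idea as in the post: a NULL entry in 'what' skips the first field,
# "" reads the second as character, and flush=TRUE drops the rest of the line.
prices <- scan(text = lines, what = list(NULL, ""), flush = TRUE, quiet = TRUE)[[2]]

# One alternative: let read.table() drop the unwanted columns via colClasses.
prices_rt <- read.table(text = lines,
                        colClasses = c("NULL", "character", "NULL"))[[1]]

# Optional: remove the literal "$" and convert to numbers.
amounts <- as.numeric(sub("$", "", prices, fixed = TRUE))

print(prices)
print(amounts)
```

Both approaches assume every line has at least two whitespace-separated fields; the read.table() variant additionally assumes exactly three.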
--- Tony Plate <tplate at acm.org> wrote:
Handling rows that do not all have the same number of fields seems like a good place to use the "flush" argument to scan() (to skip everything after the first field on the line). With the following copied to the clipboard:

i1-apple 10$ New_York
i2-banana
i3-strawberry 7$ Japan

do:
> scan("clipboard", "", flush=T)
Read 3 items
[1] "i1-apple"      "i2-banana"     "i3-strawberry"
> sub("^[A-Za-z0-9]*-", "", scan("clipboard", "", flush=T))
Read 3 items
[1] "apple"      "banana"     "strawberry"
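The clipboard connection used above is platform-specific, so the following is a small sketch of the same flush technique that should run anywhere. It assumes the ragged example is supplied through the text argument of scan(); the names ragged, first_fields and words are made up for illustration.

```r
# Ragged input: the second line has only one field.
ragged <- c("i1-apple 10$ New_York",
            "i2-banana",
            "i3-strawberry 7$ Japan")

# what = "" reads one character field per line; flush = TRUE discards
# whatever follows it, so short lines do not cause an error.
first_fields <- scan(text = ragged, what = "", flush = TRUE, quiet = TRUE)

# Strip the leading "i1-", "i2-", ... prefix with the regex from the reply.
words <- sub("^[A-Za-z0-9]*-", "", first_fields)

print(first_fields)
print(words)
```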
-- Tony Plate

At Monday 01:59 PM 11/1/2004, Spencer Graves wrote:
Uwe and Andy's solutions are great for many applications but won't work if not all rows have the same number of fields. Consider, for example, the following modification of Lee's example:
i1-apple 10$ New_York
i2-banana
i3-strawberry 7$ Japan
If I copy this to "clipboard" and run Andy's code, I get the following:
> read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
        line 2 did not have 3 elements
We can get around this using "scan", then splitting things apart similar to the way Uwe described:
> dat <- scan("clipboard", character(0), sep="\n")
Read 3 items
> dash <- regexpr("-", dat)
> dat2 <- substring(dat, pmax(0, dash)+1)
> blank <- regexpr(" ", dat2)
> if(any(blank<0))
+ blank[blank<0] <- nchar(dat2[blank<0])
> substring(dat2, 1, blank)
[1] "apple "      "banana"      "strawberry "
hope this helps. spencer graves
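Note that the output above keeps the trailing blank on "apple " and "strawberry ", because the substring runs up to and including the first space. The following is a sketch of one possible variant that stops one character earlier. It is not part of the original message; it assumes the lines are given inline rather than read from the clipboard, and the name end is introduced here for illustration.

```r
# Same ragged example, supplied inline instead of via the clipboard (assumption).
dat <- c("i1-apple 10$ New_York",
         "i2-banana",
         "i3-strawberry 7$ Japan")

# Position of the first "-" in each line (-1 if there is none).
dash <- as.integer(regexpr("-", dat))
# Everything after the dash; pmax() keeps the whole line when no dash is found.
dat2 <- substring(dat, pmax(0L, dash) + 1L)

# Position of the first blank in the remainder (-1 if there is none).
blank <- as.integer(regexpr(" ", dat2))
# End just before the blank, or at the end of the string when no blank exists.
end <- ifelse(blank < 0L, nchar(dat2), blank - 1L)

words <- substring(dat2, 1L, end)
print(words)
```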
Uwe Ligges wrote:
Liaw, Andy wrote:
Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:

> read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
             V1
1      i1-apple
2     i2-banana
3 i3-strawberry
... and if only the words after "-" are of interest, the statement can be followed by sapply(strsplit(...., "-"), "[", 2)

Uwe Ligges

HTH, Andy
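As a rough illustration of the two steps combined, the sketch below reads the first column with read.table() and then applies the strsplit() idea. It assumes the data are supplied through the text argument rather than the clipboard, and the names txt, first_col and words are hypothetical.

```r
# The original three-field example, supplied inline (assumption).
txt <- c("i1-apple 10$ New_York",
         "i2-banana 5$ London",
         "i3-strawberry 7$ Japan")

# Keep only the first column; "NULL" in colClasses drops the others.
first_col <- read.table(text = txt,
                        colClasses = c("character", "NULL", "NULL"))[[1]]

# Split each entry at "-" and take the second piece of every split.
words <- sapply(strsplit(first_col, "-"), "[", 2)

print(words)
```

This relies on every row having exactly three fields and every first field containing a single "-".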
From: j lee

Hello All,

I'd like to read the first words in lines into a new file. If I have a data file like the following, how can I get the first words: apple, banana, strawberry?

i1-apple 10$ New_York
i2-banana 5$ London
i3-strawberry 7$ Japan

Is there any similar question already posted to the list? I am a bit new to R, having a few months of experience now.

Cheers, John
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html