Reading a file w/ two delimiters
The thing to watch out for is that if your file is large, 'textConnection' is very slow at providing the data stream for something like 'read.table'. It is usually much faster to read in the file with 'readLines', preprocess the data, write it out to a temporary file, and then read it back in with 'read.table'.
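The readLines / tempfile route described above might look roughly like the following. This is only a sketch: the sample file contents and the variable names (infile, records, tmp) are made up for illustration, and it assumes the whole file fits in memory.

```r
# Hypothetical input file: records separated by tabs, fields by "|".
infile <- tempfile(fileext = ".txt")
writeLines("aa|bb|cc\tdd|ee|ff\t", infile)

# 1. Read the raw lines of the file.
raw <- readLines(infile)

# 2. Preprocess: split on tabs so each record becomes its own element.
records <- unlist(strsplit(raw, "\t", fixed = TRUE))
records <- records[nzchar(records)]   # drop any empty pieces

# 3. Write the cleaned records to a temporary file, one per line.
tmp <- tempfile(fileext = ".txt")
writeLines(records, tmp)

# 4. Read the temporary file back in with read.table.
dat <- read.table(tmp, sep = "|", stringsAsFactors = FALSE)

# Clean up the temporary files.
unlink(c(infile, tmp))
print(dat)
```

For many files, steps 1 to 4 could be wrapped in a function and applied with lapply over a vector of file names.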
On Fri, Nov 18, 2011 at 9:52 AM, David Winsemius <dwinsemius at comcast.net> wrote:
On Nov 18, 2011, at 9:13 AM, Langston, Jim wrote:
Thanks Paul. That's the path I was marching down, but I was hoping for something a little cleaner; I do the same with Perl or Java.
tesfil <- "aa|bb|cc\tdd|ee|ff\t"
read.table(textConnection(gsub("\\\t", "\n", scan(
              textConnection(tesfil),   # substitute your file here
              what="character")) ), sep="|")
Read 2 items
  V1 V2 V3
1 aa bb cc
2 dd ee ff
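A shorter variant of the same idea, for versions of R whose read.table accepts a 'text' argument, is sketched below. It is not the code posted above; it converts the tabs to newlines explicitly and hands the resulting string straight to read.table. Like the textConnection approach, it is probably best kept for small inputs.

```r
tesfil <- "aa|bb|cc\tdd|ee|ff\t"

# Turn each tab into a newline so that every record sits on its own line.
txt <- gsub("\t", "\n", tesfil, fixed = TRUE)

# Parse the in-memory text, using "|" as the field separator.
dat <- read.table(text = txt, sep = "|", stringsAsFactors = FALSE)
print(dat)
```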
Jim
On 11/18/11 8:35 AM, "Paul Hiemstra" <paul.hiemstra at knmi.nl> wrote:
Hi Jim,
You can read the text file using readLines. This puts each line in the file into an element of a character vector. Then you can go through the lines manually (e.g. using grep, sub, strsplit) and create your data.frame.
cheers,
Paul
On 11/18/2011 12:37 PM, Langston, Jim wrote:
Hi all,
I've been scratching and poking, but basically, the file I need to read has two delimiters that I need to contend with. The first is that the file contains tabs (\t) instead of newlines (\n), and the second is that the fields have | for the separators. I can easily do a read if I first convert the \t to \n and then use read.table to get the file read with the | separator. But what I would really like to do is do this all within R. I have a lot of files to read and do analysis on.
I can read the data into a table using \t as the delimiter, but can't figure out how to take that table data and use the | for separation. I've looked at string splits, etc., but haven't figured out how to split the whole table.
Any thoughts? Hints?
Thanks,
Jim
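One way the string-split idea from the question could be made to work on the whole input is sketched here. The sample string and the names (line, records, fields) are hypothetical, and the sketch assumes every record has the same number of fields.

```r
# Hypothetical sample: tab between records, "|" between fields.
line <- "aa|bb|cc\tdd|ee|ff\t"

# Split into records on the tab, then split each record on the literal "|".
# fixed = TRUE matters here because "|" is a regex metacharacter.
records <- strsplit(line, "\t", fixed = TRUE)[[1]]
fields  <- strsplit(records, "|", fixed = TRUE)

# Stack the per-record vectors row-wise and convert to a data frame.
dat <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
print(dat)
```

For a real file, 'line' would come from readLines on that file; all columns come back as character and would need converting if some are numeric.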
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
-- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
David Winsemius, MD West Hartford, CT
Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.