More on scan: extra field at end of line
On Tue, 26 Dec 2000, Yves Gauvreau wrote:
Hi, I see that Prof Ripley proposes to pre-process the file using sed, and that to do so he used "pipe". I looked for it on my system (see below) and the function doesn't seem to be available. Since I have sed from cygwin32, I wonder if there would be a way to use it in a similar fashion to what is proposed here?
1) Yes, in 1.2.0. I would encourage people to at least try 1.2.0, not least as 1.2.1 is due out pretty soon and we would like to get the maximal number of bugs zapped. (The PATH problem in rwinst.exe has been solved in the version now up on CRAN.)

2) On Windows, you will need to do it in rterm: pipe does not work in Rgui. That's an OS deficiency that I hope to be able to work around in time for 1.2.1, but I knew Peter Kleiweg was on HP-UX/Linux. I suppose in part I was pointing out how neatly some of the pieces we now have fit together.
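Where pipe() is not an option (for example in Rgui, as described above), one workaround is to let the external sed write to a temporary file and then scan() that file. The sketch below assumes current R (it uses system2(), which the R versions discussed in this thread did not have) and a sed executable on the PATH, such as the Cygwin one; the helper name scan_via_sed is hypothetical.

```r
# Hypothetical helper: run an external sed to drop one trailing comma per
# line, write the result to a temporary file, then scan() that file.
# Assumes a `sed` executable (e.g. from Cygwin) is on the PATH.
scan_via_sed <- function(file) {
  cleaned <- tempfile(fileext = ".txt")
  on.exit(unlink(cleaned))
  # system2() redirects sed's standard output to `cleaned` and returns
  # sed's exit status
  status <- system2("sed", c("-e", shQuote("s/,$//"), shQuote(file)),
                    stdout = cleaned)
  if (status != 0) stop("sed exited with status ", status)
  scan(cleaned, sep = ",", quiet = TRUE)
}

# Usage, with the comma-terminated file from Peter Kleiweg's example:
# x <- scan_via_sed("data2")
```

No connection to a child process is kept open here; R only waits for sed to finish and then reads an ordinary file.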
Thanks
YG

platform Windows
arch     x86
os       Win32
system   x86, Win32
status
major    1
minor    1.1
year     2000
month    August
day      15
language R
-----Original Message-----
From: owner-r-devel@stat.math.ethz.ch [mailto:owner-r-devel@stat.math.ethz.ch] On Behalf Of Prof Brian Ripley
Sent: Tuesday, December 26, 2000 9:54 AM
To: Peter Kleiweg
Cc: r-devel@stat.math.ethz.ch
Subject: Re: [Rd] More on scan: extra field at end of line

On Tue, 26 Dec 2000, Peter Kleiweg wrote:
Suppose, I have a file "data1" containing:
450 390 467 654 30 542 334 432 421
357 497 493 550 549 467 575 578 342
446 547 534 495 979 479
I can read this file with:
scan("data1")
Read 24 items
[1] 450 390 467 654 30 542 334 432 421 357 497 493 550 549 467 575 578 342 446
[20] 547 534 495 979 479
But now, suppose I have a file "data2" containing:
450, 390, 467, 654, 30, 542, 334, 432, 421,
357, 497, 493, 550, 549, 467, 575, 578, 342,
446, 547, 534, 495, 979, 479
When I try to read this with sep="," I get:
scan("data2", sep=",")
Read 26 items
[1] 450 390 467 654 30 542 334 432 421 NA 357 497 493 550 549 467 575 578 342
[20] NA 446 547 534 495 979 479

I get two extra fields, both NA. Not what I'd want. And I can't drop the NA's, because there could be other NA's, not resulting from this comma-EOL combination.
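The ambiguity described here is easy to reproduce. In the sketch below (current R, where scan() accepts a text argument; the two data lines are made up for illustration), the NA produced by a trailing comma cannot be told apart from the NA produced by a genuinely empty field, so na.omit() would discard both.

```r
# Line 1 ends in the separator; line 2 has a genuinely empty middle field.
txt <- c("1,2,3,",
         "4,,6")
x <- scan(text = txt, sep = ",", quiet = TRUE)
print(x)
# Both missing values look identical, so dropping NAs also loses real data.
print(which(is.na(x)))
print(length(na.omit(x)))
```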
You can easily remove the trailing commas, though, as in
scan(pipe("sed -e s/,$// data2"), sep=",")
Read 24 items
[1] 450 390 467 654 30 542 334 432 421 357 497 493 550 549 467 575 578 342 446
[20] 547 534 495 979 479
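For readers without sed, the same clean-up can be done in R itself: read the lines, drop a single trailing separator from each, and hand the result to scan() through a text connection. This is a minimal sketch for current R (endsWith() is a much later addition to R), and scan_strip_trailing_sep is a hypothetical name. Because only the line-final separator is removed, genuinely empty fields inside a line still come back as NA.

```r
# Hypothetical helper: an in-R equivalent of `sed -e 's/,$//'` followed by scan().
scan_strip_trailing_sep <- function(file, sep = ",") {
  lines <- readLines(file)
  # Remove exactly one separator at the very end of each line, if present.
  # Like sed's s/,$//, blanks after the final separator are not handled.
  ends <- endsWith(lines, sep)
  lines[ends] <- substr(lines[ends], 1, nchar(lines[ends]) - nchar(sep))
  con <- textConnection(lines)
  on.exit(close(con))
  scan(con, sep = sep, quiet = TRUE)
}

# Usage: x <- scan_strip_trailing_sep("data2")
```

Since no external process is involved, this behaves the same in Rgui and rterm.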
I suggest that the proper action for scan would be to treat the combination of sep plus newline as a single separator.
However, that's not compatible with S or earlier versions of R or the documentation:

  sep: by default, scan expects to read white-space delimited input
       fields. Alternatively, `sep' can be used to specify a
       character which delimits fields. A field is always delimited
       by a newline unless it is quoted.
I suggest the proper action is to act as documented!
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
-.-.-.-.-.-.-
r-devel mailing list -- Read
http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._