More on scan: extra field at end of line

Tue, Dec 26, 2000 9:34 AM

On Tue, 26 Dec 2000, Yves Gauvreau wrote:

1) Yes, in 1.2.0.  I would encourage people to at least try 1.2.0,
not least as 1.2.1 is due out pretty soon and we would like to get the 
maximal number of bugs zapped.  (The PATH problem in rwinst.exe has been
solved in the version now up on CRAN.)

2) On Windows, you will need to do it in rterm: pipe does not work in
Rgui.  That's an OS deficiency that I hope to be able to work around in
time for 1.2.1, but I knew Peter Kleiweg was on HP-UX/Linux.

I suppose in part I was pointing out how neatly some of the pieces we now
have fit together.

Thanks

YG

platform Windows
arch     x86
os       Win32
system   x86, Win32
status
major    1
minor    1.1
year     2000
month    August
day      15
language R

-----Original Message-----
From: owner-r-devel@stat.math.ethz.ch
[mailto:owner-r-devel@stat.math.ethz.ch] On Behalf Of Prof Brian Ripley
Sent: Tuesday, December 26, 2000 9:54 AM
To: Peter Kleiweg
Cc: r-devel@stat.math.ethz.ch
Subject: Re: [Rd] More on scan: extra field at end of line


On Tue, 26 Dec 2000, Peter Kleiweg wrote:

Suppose I have a file "data1" containing:

    450   390   467   654    30   542   334   432   421
    357   497   493   550   549   467   575   578   342
    446   547   534   495   979   479

I can read this file with:

    scan("data1")
    Read 24 items
     [1] 450 390 467 654  30 542 334 432 421 357 497 493 550 549 467 575 578 342 446
    [20] 547 534 495 979 479

But now, suppose I have a file "data2" containing:

    450, 390, 467, 654,  30, 542, 334, 432, 421,
    357, 497, 493, 550, 549, 467, 575, 578, 342,
    446, 547, 534, 495, 979, 479

When I try to read this with sep="," I get:

    scan("data2", sep=",")
    Read 26 items
     [1] 450 390 467 654  30 542 334 432 421  NA 357 497 493 550 549 467 575 578 342
    [20]  NA 446 547 534 495 979 479

I get two extra fields, both NA. Not what I'd want. And I can't
drop the NA's, because there could be other NA's, not resulting
from this comma-EOL combination.

You can easily remove the trailing commas, though, as in

scan(pipe("sed -e s/,$// data2"), sep=",")
Read 24 items
 [1] 450 390 467 654  30 542 334 432 421 357 497 493 550 549 467 575 578 342 446
[20] 547 534 495 979 479
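
Where pipe() or sed is not to hand (Rgui on Windows, say), much the same effect can probably be had inside R itself. What follows is only a sketch and was not posted in this thread: it assumes a version of R that provides readLines() and textConnection(), and it recreates "data2" in a temporary file (a made-up path) so that it runs on its own.

```r
# Recreate the "data2" example in a temporary file (hypothetical path).
path <- tempfile("data2")
writeLines(c("450, 390, 467, 654,  30, 542, 334, 432, 421,",
             "357, 497, 493, 550, 549, 467, 575, 578, 342,",
             "446, 547, 534, 495, 979, 479"), path)

# Read the raw lines and drop one trailing comma per line,
# together with any blanks that follow it.
lines <- readLines(path)
cleaned <- sub(",[[:space:]]*$", "", lines)

# Hand the cleaned lines to scan() through a text connection.
con <- textConnection(cleaned)
x <- scan(con, sep = ",", quiet = TRUE)
close(con)

print(length(x))      # 24 values
print(any(is.na(x)))  # FALSE: no spurious NA fields
unlink(path)
```

Only the comma-EOL combination is touched: an empty field in the middle of a line, as in 1,,3, is left as it is and still comes out as NA.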

I suggest the proper action for scan would be to treat the
combination of sep plus newline as a single separator.

However, that's not compatible with S or earlier versions of R or
the documentation:

     sep: by default, scan expects to read white-space delimited input
          fields.  Alternatively, `sep' can be used to specify a
          character which delimits fields.  A field is always delimited
          by a newline unless it is quoted.

I suggest the proper action is to act as documented!

--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

