scan() Bug?
5 messages · Greg Riddick, Brian Ripley, Peter Dalgaard
On Thu, 22 Jan 2004, Greg Riddick wrote:
I'm reading a file into a list by:
PDF = scan("file",what="character",sep="\10")
"\10" is the newline character in this file, also tried "\n" originally
On lines that are ended by "\13\10", both are dropped from the list entry
I want scan to keep the "\13" in the list entry.
Is this a bug or just a strange feature?
Not a strange feature, but the documented behaviour (and useful, too). You have opened the file in text mode. If you want to keep CRs, open and read in binary mode.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
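A minimal sketch of the "read in binary mode" advice, assuming the goal is to split on LF only and keep any CR that precedes it. It reads the raw bytes with readBin() and splits them by hand; the sample file contents and the variable names are made up for illustration and are not from the original post. readLines() is documented to accept LF, CRLF or CR as the end-of-line marker whatever the connection mode, so it is not used here.

```r
# Create a small sample file with mixed CRLF and LF endings.
# writeBin() on raw bytes guarantees the bytes land on disk unchanged.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("first\r\nsecond\nthird\r\n"), path)

# Read every byte as-is, with no end-of-line translation.
bytes <- readBin(path, what = "raw", n = file.size(path))

# Convert to a single string and split on LF only, so CRs stay in the entries.
text  <- rawToChar(bytes)
lines <- strsplit(text, "\n", fixed = TRUE)[[1]]

print(lines)
```

The entries that came from CRLF-terminated lines still end in "\r", which can be checked with endsWith(lines, "\r").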
On Mon, 26 Jan 2004, Greg Riddick wrote:
Thanks for your suggestions on dealing with binary files, Prof Ripley
I ended up using this method:
PDF = file("file.pdf","a+b")
PDFlines = readLines(PDF)
.
.
.
(Extract some information from PDFlines and create some objects to add back to the PDF file)
.
.
.
writeLines(newobjects, PDF, sep = "\12")
close(PDF)
So I opened the file as binary in read/append mode.
Works fine now, though I have noticed that the sep character that actually gets written to the file is 2 less than the value specified. So I wanted \10 and needed to specify \12 to get it. Am I doing something wrong here?
I'm working on an R package to add annotations (hyperlinks, popups, etc.) to PDF files, which I should release in about 2 weeks. It should be especially useful to the bioinformatics people who use R. Incidentally, the uncompressed PDF files that I have seen R produce are actually just plain text files: human-readable ASCII characters delimited by CR or CR/LF. They are binary only in the sense that a cross-reference table at the end of the file records byte offsets of individual objects in the file. So insertions and deletions cannot be made without updating the cross-reference table.
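To illustrate why the byte offsets matter, here is a rough sketch that computes the offset at which each object starts in a toy, PDF-like set of lines. The lines below are a made-up stand-in (not a valid PDF), and the assumption that every line ends in a single LF is mine; a real file may use CR or CR/LF, which changes every offset.

```r
# Toy stand-in for the body of an uncompressed PDF (hypothetical content).
body <- c("%PDF-1.4",
          "1 0 obj", "<< /Type /Catalog >>", "endobj",
          "2 0 obj", "<< /Type /Pages >>", "endobj")
eol <- "\n"  # assumption: each line is terminated by a single LF byte

# Bytes occupied by each line, including its terminator.
line_bytes <- nchar(body, type = "bytes") + nchar(eol, type = "bytes")

# A line's offset is the total size of all lines before it (0-based).
offsets <- cumsum(c(0, head(line_bytes, -1)))

# Keep only the lines that open an object ("N G obj").
is_obj <- grepl("^[0-9]+ [0-9]+ obj$", body)
obj_offsets <- setNames(offsets[is_obj], body[is_obj])

print(obj_offsets)
```

Adding or removing even one byte before an object shifts its offset, which is why the table has to be rewritten after any insertion or deletion.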
______________________________________________ R-help at stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
The \zzz notation is octal (just like C)! I presume you want ASCII character 10, that is LF, not 8 (BS), although using \n would be much easier to remember.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
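The octal point can be checked directly in R. The short sketch below is only an illustration (the temporary file and variable names are mine): it prints the character codes behind the escapes used in this thread and confirms which byte writeLines() emits for sep = "\12" on a binary connection.

```r
# Escapes of the form "\nnn" are octal, so "\12" is decimal 10 (LF)
# and "\10" is decimal 8 (backspace).
codes <- c(lf_octal = utf8ToInt("\12"),   # 10
           bs_octal = utf8ToInt("\10"),   # 8
           cr_octal = utf8ToInt("\15"),   # 13
           vt_octal = utf8ToInt("\13"))   # 11
print(codes)

# "\12" and "\n" are the same one-character string.
same_as_newline <- identical("\12", "\n")

# strtoi() converts the octal digits explicitly.
from_octal <- strtoi("12", base = 8L)

# Write one line with sep = "\12" to a binary connection and inspect the bytes.
path <- tempfile()
con <- file(path, "wb")
writeLines("x", con, sep = "\12")
close(con)
written <- readBin(path, what = "raw", n = 10)
print(written)  # the letter x followed by a single LF byte
```

So the "\13\10" in the original question denotes VT followed by BS rather than CR LF; the CR LF pair would be "\15\12", or more readably "\r\n".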
"Greg Riddick" <gr3k at virginia.edu> writes:
Works fine now...though I have noticed that the sep character that actually gets written to the file is -2 the value specified. So I wanted \10 and needed to specify \12 to get it. Am I doing something wrong here?
Just overlooking that such codes are specified in octal notation, I think.
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Blegdamsvej 3, 2200 Cph. N, Denmark. Ph: (+45) 35327918, FAX: (+45) 35327907 (p.dalgaard at biostat.ku.dk)