read.fwf doesn't work with header = TRUE (PR#8226)
Prof Brian Ripley wrote:
On Fri, 21 Oct 2005, Emmanuel Paradis wrote:
Prof Brian Ripley wrote:
On Thu, 20 Oct 2005 Emmanuel.Paradis at mpl.ird.fr wrote:
Full_Name: Emmanuel Paradis
Version: 2.1.1
OS: Linux
Submission from: (NULL) (193.49.41.105)

read.fwf(..., header = TRUE) does not work properly since:
1/ the original header is printed on the console and not in FILE;
2/ the different 'parts' of the header should be separated with tabs to work with the call to read.table.

Here is a suggested fix for src/library/utils/R/read.fwf.R:

38c38,40
< cat(FILE, headerline, "\n")
---
> headerline <- unlist(strsplit(headerline, " {1,}"))
> headerline <- paste(headerline, collapse = "\t")
> cat(file = FILE, headerline, "\n")
Thanks, but I don't think that is right. It assumes the header line is space-delimited (or at least that spaces get converted to tabs). We have not specified the format of the header line, and it cannot usefully be fixed format. So I think we need to specify it is delimited by 'sep' (not tab).
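To illustrate the convention proposed here, the following is a minimal sketch of a fixed-width file whose header line is delimited by 'sep' while the data lines are fixed format. It assumes a current R release, where the help for 'header' says the names must be delimited by 'sep'; the file contents and the names 'a' and 'b' are made up for the example.

```r
# Hypothetical two-record file: the header is tab-delimited (the default
# 'sep' of read.fwf), the data lines are fixed width.
ff <- tempfile()
cat(file = ff, "a\tb", "123456", "987654", sep = "\n")

# Keep column 1, skip columns 2-3, keep columns 4-6.
# The header names only the two fields that are kept.
x <- read.fwf(ff, widths = c(1, -2, 3), header = TRUE)
print(x)

unlink(ff)
```

Note that the header carries one name per retained field, not one per field in the file.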
I see, but suppose we read only some of the columns in a file, e.g. with widths = c(1, -4, 2): how can we know how many variables have been skipped, and then select the appropriate names in the header line?
You do not: as the help file says
Negative-width fields are used to indicate columns to be skipped,
eg '-5' to skip 5 columns. These fields are not seen by
'read.table' and so should not be included in a 'col.names' or
'colClasses' argument.
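The help text quoted above can be sketched as follows. The file contents and the column names 'first' and 'last' are invented for the example; the point is only that 'col.names' has one entry per positive width.

```r
# Hypothetical file with two six-character records and no header.
ff <- tempfile()
cat(file = ff, "123456", "987654", sep = "\n")

# Three widths, but only two fields reach read.table, because the
# negative width marks two columns to be skipped.
y <- read.fwf(ff, widths = c(1, -2, 3), col.names = c("first", "last"))
print(y)

unlink(ff)
```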
OK, but it is strange to me to not have all variables named in a header line.
Here is another proposed fix, but this assumes the header line is in fixed-width format (as specified by 'widths'):
What happens if there are multi-line records? Your `fix' crashes.
It crashes anyway because it should be [!drop] and not [drop] ;)
38c38,41
< cat(FILE, headerline, "\n")
---
> head.last <- cumsum(widths)
> head.first <- head.last - widths + 1
> headerline <- substring(headerline, head.first, head.last)[drop]
> cat(file = FILE, headerline, "\n", sep = sep)
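For reference, the idea behind this second proposal (cutting a fixed-width header with the same 'widths') can be sketched on its own, outside read.fwf. This is not the code that went into R. It assumes single-line records, uses abs(widths) so that skipped fields still advance the position, and the variable 'keep' and the sample header are hypothetical.

```r
# Widths from the earlier message: keep 1, skip 4, keep 2.
widths <- c(1, -4, 2)
headerline <- "aXXXXbb"   # made-up fixed-width header

# End and start position of every field, skipped ones included.
head.last  <- cumsum(abs(widths))
head.first <- head.last - abs(widths) + 1

# Retain only the fields with a positive width.
keep <- widths > 0
fields <- substring(headerline, head.first[keep], head.last[keep])

# Join with the internal separator so read.table can split it again.
joined <- paste(fields, collapse = "\t")
print(fields)
```

The sketch does not address the multi-line record case raised in the thread.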
?read.fwf says clearly that sep is used internally.
Not so: please check the current version.
Here is what I have in R 2.2.0:
sep: character; the separator used internally; should be a
character that does not occur in the file.
So, should the fix be simply:
38c38
< cat(FILE, headerline, "\n")
---
> cat(file = FILE, headerline, "\n")
?
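The difference between the two calls in this last diff can be checked directly. In cat(), the objects to print come first through '...', so an unnamed FILE is treated as text to print rather than as the destination. A small sketch, with a made-up header line:

```r
FILE <- tempfile("Rfwf.")
headerline <- "a\tb"      # hypothetical header

# Unnamed first argument: the file name and the header go to the console
# (captured here), and nothing is written to FILE.
printed <- capture.output(cat(FILE, headerline, "\n"))
created_before <- file.exists(FILE)

# Named 'file' argument: the header is written to FILE.
cat(file = FILE, headerline, "\n")
written <- readLines(FILE)

unlink(FILE)
```

Here 'printed' begins with the temporary file name, 'created_before' is FALSE, and 'written' holds the header followed by the space that cat() puts before the newline.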