read.fwf and header - R-devel

Mon, Oct 30, 2006 12:47 PM

Marc Schwartz wrote:

On Mon, 2006-10-30 at 19:51 +0100, Gregor Gorjanc wrote:

Hi!

I have data (also in attached file) in the following form:

num1 num2 num3 int1 fac1 fac2 cha1 cha2 Date POSIXt
 1                1   f q   1900-01-01 1900-01-01 01:01:01
 2 1.0 1316666.5  2 a g r z            1900-01-01 01:01:01
 3 1.5 1188830.5  3 b h s y 1900-01-01 1900-01-01 01:01:01
 4 2.0 1271846.3  4 c i t x 1900-01-01 1900-01-01 01:01:01
 5 2.5  829737.4    d j u w 1900-01-01
 6 3.0 1240967.3  5 e k v v 1900-01-01 1900-01-01 01:01:01
 7 3.5  919684.4  6 f l w u 1900-01-01 1900-01-01 01:01:01
 8 4.0  968214.6  7 g m x t 1900-01-01 1900-01-01 01:01:01
 9 4.5 1232076.4  8 h n y s 1900-01-01 1900-01-01 01:01:01
10 5.0 1141273.4  9 i o z r 1900-01-01 1900-01-01 01:01:01
   5.5  988481.4 10 j     q 1900-01-01 1900-01-01 01:01:01

This is a FWF (fixed width format) file. I cannot use read.table here
because of the missing values. I tried the following:

read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
         header=TRUE)

Error in read.table(file = FILE, header = header, sep = sep, as.is =
as.is,  :
	more columns than column names

I could use:

read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
         header=FALSE, skip=1)
   V1  V2        V3 V4 V5 V6 V7 V8          V9                 V10
1   1  NA        NA  1    f  q     1900-01-01  1900-01-01 01:01:01
2   2 1.0 1316666.5  2 a  g  r  z              1900-01-01 01:01:01
3   3 1.5 1188830.5  3 b  h  s  y  1900-01-01  1900-01-01 01:01:01
4   4 2.0 1271846.3  4 c  i  t  x  1900-01-01  1900-01-01 01:01:01
5   5 2.5  829737.4 NA d  j  u  w  1900-01-01
6   6 3.0 1240967.3  5 e  k  v  v  1900-01-01  1900-01-01 01:01:01
7   7 3.5  919684.4  6 f  l  w  u  1900-01-01  1900-01-01 01:01:01
8   8 4.0  968214.6  7 g  m  x  t  1900-01-01  1900-01-01 01:01:01
9   9 4.5 1232076.4  8 h  n  y  s  1900-01-01  1900-01-01 01:01:01
10 10 5.0 1141273.4  9 i  o  z  r  1900-01-01  1900-01-01 01:01:01
11 NA 5.5  988481.4 10 j        q  1900-01-01  1900-01-01 01:01:01

Does anyone have a clue how to get the above result with the header?

Thanks!

The attachment did not come through. Perhaps it was too large?

Not sure if this is the most efficient way, but how about this:

DF <- read.fwf("test.txt", 
                widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
                skip = 1, strip.white = TRUE,
                col.names = read.table("test.txt", 
                                       nrow = 1, as.is = TRUE)[1, ])

Argh, my fault as I forgot to attach it :(

That is a very nice compromise! No need for [1, ], due to nrow=1.
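
For anyone reading this in the archive, here is a minimal, self-contained sketch of the same two-pass idea: read the header line on its own, then read the body by position with skip = 1. The toy file, its column names (id, val, fac) and its widths are made up for illustration and are not the data from this thread. The sketch uses scan() instead of read.table() to fetch the names, which is just one possible choice. As far as I can tell from the current ?read.fwf, header=TRUE expects the names on the first line to be delimited by sep (a tab by default), which would explain the "more columns than column names" error with a space-separated header.

```r
# Hypothetical toy file: whitespace-separated header, fixed-width body.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id val fac",
             " 1 1.5 a",
             " 2     b",
             "   3.5 c"), tmp)

# Assumed field widths for the toy file: id = 3, val = 4, fac = 1.
widths <- c(3, 4, 1)

# Pass 1: read only the first line and split it on whitespace.
hdr <- scan(tmp, what = "", nlines = 1, quiet = TRUE)

# Pass 2: read the body by position, skipping the header line.
DF <- read.fwf(tmp, widths = widths, skip = 1,
               col.names = hdr, strip.white = TRUE)

unlink(tmp)
print(DF)
```

Blank numeric fields come back as NA, and the names are taken from the header line rather than the default V1, V2, ... names.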

I fully agree here, but I miss having this directly in read.fwf. I hope
that someone from R-core is also listening to this ;)

Thank you!

Gregor
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.txt
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20061030/88560f7c/attachment-0004.txt