
Strange problem with reading a pipe delimited file

5 messages · Brian Feeny, Duncan Murdoch

#
I am trying to read in a pipe-delimited file whose rows have varying numbers of columns. Here is my sample data:

A|B|C|D
A|B|C|D|E|F
A|B|C|D|E
A|B|C|D|E|F|G|H|I
A|B|C|D
A|B|C|D|E|F|G|H|I|J

Line 6 has 10 columns, yet I can't explain why R reads it like this:
V1 V2 V3 V4 V5 V6 V7 V8 V9
1  A  B  C  D               
2  A  B  C  D  E  F         
3  A  B  C  D  E            
4  A  B  C  D  E  F  G  H  I
5  A  B  C  D               
6  A  B  C  D  E  F  G  H  I
7  J                        

It moved "J" to row 7, and I don't understand why it isn't left at position [6, 10].

Stranger still, if I remove line 1 so that my data file contains:

A|B|C|D|E|F
A|B|C|D|E
A|B|C|D|E|F|G|H|I
A|B|C|D
A|B|C|D|E|F|G|H|I|J

and I get a totally different result:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  A  B  C  D  E  F             
2  A  B  C  D  E                
3  A  B  C  D  E  F  G  H  I    
4  A  B  C  D                   
5  A  B  C  D  E  F  G  H  I   J

What am I doing that changes the fate of that final "J"? This is just a basic ASCII text file, pipe-delimited as shown.

I have been racking my brain on this for a day!

Brian
#
On 12-11-17 4:18 PM, Brian Feeny wrote:
I would suggest reading the help file: read.delim only looks at the 
first 5 lines to determine the number of columns, unless you supply 
col.names with more entries than that.
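To see the first-five-lines rule in action, the sketch below counts the fields on each line with count.fields() and compares the widest of the first five lines with the widest line overall. It assumes the sample lines from the question are held in memory rather than in paths.txt, and fields_per_line() is a hypothetical helper introduced only for this illustration. In the first file the 10-field line is line 6, outside the window, so only 9 columns are allocated and "J" spills onto a new row; in the second file it is line 5, inside the window.

```r
# Sample data from the question, one element per line
lines1 <- c("A|B|C|D", "A|B|C|D|E|F", "A|B|C|D|E",
            "A|B|C|D|E|F|G|H|I", "A|B|C|D", "A|B|C|D|E|F|G|H|I|J")
lines2 <- lines1[-1]  # the same data with line 1 removed

# Hypothetical helper: number of "|"-separated fields on each line.
# quote = "" so that no character is treated as a quote mark.
fields_per_line <- function(x) {
  con <- textConnection(x)
  on.exit(close(con))
  count.fields(con, sep = "|", quote = "")
}

n1 <- fields_per_line(lines1)
n2 <- fields_per_line(lines2)

# Widest of the first five lines (what read.delim sizes by) vs. widest overall
print(c(first5 = max(head(n1, 5)), all = max(n1)))  # 9 vs 10: "J" wraps
print(c(first5 = max(head(n2, 5)), all = max(n2)))  # 10 vs 10: no wrap

# Reproduce the wrap: 9 columns, with "J" alone on an extra 7th row
d1 <- read.delim(text = lines1, sep = "|", header = FALSE, quote = "")
print(dim(d1))
```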

Duncan Murdoch
#
Duncan,

I believe I follow you now; this gives the expected results:

ncol <- max(count.fields("paths.txt", sep = "|", quote = "", comment.char = ""))
test <- read.delim("paths.txt", sep = "|", quote = "", header = FALSE, colClasses = "character",
                   fill = TRUE, col.names = paste("V", seq_len(ncol), sep = ""))
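The same two-step approach (count fields first, then name that many columns) can be packaged as a small function. read_ragged() below is a hypothetical helper, not part of base R. It assumes `file` is a path, since the count.fields() pass would otherwise consume a connection, and it gives both calls the same quote and comment.char settings so the field count and the parse agree.

```r
# Hypothetical helper (not in base R): read a delimited file whose rows have
# different numbers of fields, sizing the columns from the widest row.
# `file` must be a path: count.fields() would otherwise consume a connection.
read_ragged <- function(file, sep = "|") {
  # Count fields on every line, using the same quote/comment rules as below
  n <- max(count.fields(file, sep = sep, quote = "", comment.char = ""))
  read.delim(file, sep = sep, quote = "", comment.char = "", header = FALSE,
             colClasses = "character", fill = TRUE,
             col.names = paste0("V", seq_len(n)))
}

# Usage with the sample data from the question, written to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("A|B|C|D", "A|B|C|D|E|F", "A|B|C|D|E",
             "A|B|C|D|E|F|G|H|I", "A|B|C|D", "A|B|C|D|E|F|G|H|I|J"), path)
test <- read_ragged(path)
print(dim(test))  # 6 rows, 10 columns
print(test[6, 10])
```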


Thank you for your help

Brian
On Nov 17, 2012, at 4:34 PM, Brian Feeny wrote:

            
#
On 12-11-17 4:34 PM, Brian Feeny wrote:
Sure, it's warning you that your file looks like it has 9 columns, but 
you said it has 10, and it will not have generated good column names 
(but it will have read the file properly).

If you give it 10 names (e.g. using col.names=LETTERS[1:10]) it will be 
happy.
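Duncan's suggestion can be checked directly: read.table() (and hence read.delim()) uses the length of col.names when it is longer than the column count found in the first five lines, so ten names are enough to keep "J" on row 6. The sketch reads the sample lines from memory via the `text` argument instead of paths.txt, an assumption made only so the example is self-contained.

```r
# Sample data from the question (10 fields on the last line only)
lines1 <- c("A|B|C|D", "A|B|C|D|E|F", "A|B|C|D|E",
            "A|B|C|D|E|F|G|H|I", "A|B|C|D", "A|B|C|D|E|F|G|H|I|J")

# Ten column names: more than the 9 fields found in the first five lines,
# so read.delim allocates 10 columns and nothing wraps onto a new row
d <- read.delim(text = lines1, sep = "|", header = FALSE, quote = "",
                col.names = LETTERS[1:10])
print(dim(d))     # 6 rows, 10 columns
print(d[6, "J"])  # the final "J" stays on row 6
```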

Duncan Murdoch