-----Original Message-----
From: dwinsemius at comcast.net
Sent: Fri, 25 Jan 2013 13:42:25 -0800
To: tghoward at gw.dec.state.ny.us
Subject: Re: [R] read.csv quotes within fields
On Jan 25, 2013, at 1:37 PM, Tim Howard wrote:
David,
Thank you again for the reply. I'll try to make readLines() and
strsplit() work. What bugs me is that I think it would import fine if
the folks who created the csv had used doubled quotes "" rather than an
escaped quote \" for those pesky internal quotes. Since that's the only
difference, I'd think there would be a solution within read.csv() (or
perhaps scan()), but I just can't figure it out.
Can you pre-process with an editor? Replace all the ", " hits with
something like '|'.
--
David.
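Since Tim notes the file would import cleanly if the internal quotes were doubled, the pre-processing can also be done inside R instead of in an editor. The sketch below is one possible approach, not something proposed in the thread: it converts every backslash-escaped quote (\") into a doubled quote ("") and hands the result to read.csv(). The hard-coded lines stand in for readLines("test.csv"); the file name and the assumption that no field legitimately ends in a literal backslash are mine.

```r
# Sketch: rewrite backslash-escaped quotes (\") as the doubled quotes ("")
# that read.csv() understands, then parse the cleaned lines.
# ASSUMPTION: in real use these lines would come from readLines("test.csv").
raw <- c('"","X1","X2","X3"',
         '"1","1","string one","another string"',
         '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final string"',
         '"3","3","third row","last \\" col"')

# fixed = TRUE matches the two-character sequence backslash + quote literally
cleaned <- gsub('\\"', '""', raw, fixed = TRUE)

# read.csv() treats "" inside a quoted field as one literal quote,
# so embedded commas and quotes survive intact
df <- read.csv(text = cleaned, stringsAsFactors = FALSE)
print(df)
```

Because the fields stay quoted, commas inside strings are handled as well. A field whose content really ends in a backslash (e.g. "C:\") would be corrupted by this substitution, so check the data for that case first.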
David Winsemius <dwinsemius at comcast.net> 1/25/2013 4:16 PM >>>
On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:
Great point, your fix (quote="") works for the example I gave.
Unfortunately, these text strings have commas in them as well(!).
Throw a few commas in any of the text strings and it breaks again.
Sorry about not including those in the example.
So I need to handle commas *and* backslash-escaped quotes within a
single string.
Well, you need to have _some_ delimiter. At the moment it sounds as
though you might end up using readLines() and strsplit( . ,
split="\\'\\,\\s\\").
--
david.
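A readLines()/strsplit() route could look roughly like the sketch below. It is a variant of the idea above, not David's exact regex: it splits each line on the three-character sequence "," that separates fully quoted fields, so commas inside a field are left alone. It assumes every field is quoted and that no field contains "," itself; the sample lines (with commas added, per Tim's description) replace a hypothetical readLines("test.csv").

```r
# Sketch: split on the "," sequence between quoted fields.
# ASSUMPTIONS: every field is quoted, no field contains the sequence ",",
# and in real use 'raw' would come from readLines("test.csv").
raw <- c('"","X1","X2","X3"',
         '"1","1","string, one","another string"',
         '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final, string"',
         '"3","3","third row","last \\" col"')

# Drop the opening quote of the first field and the closing quote of the last
inner <- sub('^"', '', sub('"$', '', raw))
fields <- strsplit(inner, '","', fixed = TRUE)

# Turn the remaining backslash-escaped quotes into plain quotes
fields <- lapply(fields, function(f) gsub('\\"', '"', f, fixed = TRUE))

# First line is the header; the rest become an all-character data frame
m <- do.call(rbind, fields)
df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- make.names(m[1, ], unique = TRUE)
print(df)
```

All columns come back as character; type.convert() can be applied afterwards if numeric columns are needed.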
David Winsemius <dwinsemius at comcast.net> 1/25/2013 2:27 PM >>>
On Jan 25, 2013, at 10:42 AM, Tim Howard wrote:
All,
I have some csv files I am trying to import. I am finding that quotes
inside strings are escaped in a way R doesn't expect for csv files.
The problem only seems to rear its ugly head when there is an odd
number of internal quotes. I'll try to recreate the problem:
# set up a matrix, using escape-quote as the internal double quote mark.
x <- data.frame(matrix(data=c("1", "string one", "another string",
"2", "quotes escaped 10' 20\" 5' 30\" \"test string", "final string",
"3","third row","last \" col"),ncol = 3, byrow=TRUE))
# NOTE that write.csv correctly created the three internal quotes ' "
' by using double quotes ' "" '.
# here's what got written
"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20"" 5' 30"" ""test string","final string"
"3","3","third row","last "" col"
# Importing test.csv works fine.
X X1 X2 X3
1 1 1 string one another string
2 2 2 quotes escaped 10' 20" 5' 30" "test string final string
3 3 3 third row last " col
# this looks good.
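The write-and-reimport round trip described above can be reproduced in a self-contained way. In this sketch, tempfile() stands in for "test.csv" (an assumption, since the original write.csv() and read.csv() calls are not shown). It shows write.csv() writing each embedded quote as "" and read.csv() restoring the original strings.

```r
# Rebuild the example data frame from the post
x <- data.frame(matrix(data = c("1", "string one", "another string",
  "2", "quotes escaped 10' 20\" 5' 30\" \"test string", "final string",
  "3", "third row", "last \" col"), ncol = 3, byrow = TRUE),
  stringsAsFactors = FALSE)

# ASSUMPTION: a temporary file replaces "test.csv"
f <- tempfile(fileext = ".csv")
write.csv(x, f)

# write.csv() escapes embedded quotes by doubling them ("")
lines <- readLines(f)
cat(lines, sep = "\n")

# Reading the doubled-quote file back recovers the original strings
y <- read.csv(f, stringsAsFactors = FALSE)
print(y)
```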
# now, please open "test.csv" in a text editor and replace all the double quotes '""'
# with the escaped quote ' \" ' as is found in my data set. Like this:
"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
"3","3","third row","last \" col"
read.csv(text='"","X1","X2","X3"
+ "1","1","string one","another string"
+ "2","2","quotes escaped 10\' 20"" 5\' 30"" ""test string","final string"
+ "3","3","third row","last "" col"', sep=",", quote="")
Not ...., quote="\""
X.. X.X1. X.X2. X.X3.
1 "1" "1" "string one" "another string"
2 "2" "2" "quotes escaped 10' 20"" 5' 30"" ""test string" "final string"
3 "3" "3" "third row" "last "" col"
You will then be depending entirely on commas to separate.
(Needed to use escaped single quotes to illustrate from a command
line.)
X X1 X2 X3
1 1 1 string one another string
2 2 2 quotes escaped 10' 20\\ 5' 30\\ \\test string,final string\n3,3,third row,last \\ col
# we now have only two rows, with all the data captured in col2 row2
Any suggestions on how to fix this behavior? I've tried fiddling with
quote="\"" to no avail, obviously. Interestingly, an even number of
escaped quotes within a field is loaded correctly, which certainly
threw me for a while!
Thank you in advance,
Tim
David Winsemius
Alameda, CA, USA