Skip to content

read.csv quotes within fields

11 messages · Tim Howard, David Winsemius, Duncan Murdoch +2 more

#
All,
 
I have some csv files I am trying to import. I am finding that quotes inside strings are escaped in a way R doesn't expect for csv files. The problem only seems to rear its ugly head when there are an uneven number of internal quotes. I'll try to recreate the problem:
 
# set up a matrix, using escape-quote as the internal double quote mark.
 
x <- data.frame(matrix(data=c("1", "string one", "another string", "2", "quotes escaped 10' 20\" 5' 30\" \"test string", "final string", "3","third row","last \" col"),ncol = 3, byrow=TRUE))
# NOTE that write.csv correctly created the three internal quotes ' " ' by using double quotes ' "" '. 
# here's what got written
 
"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20"" 5' 30"" ""test string","final string"
"3","3","third row","last "" col"
 
# Importing test.csv works fine.
X X1                                         X2             X3
1 1  1                                 string one another string
2 2  2 quotes escaped 10' 20" 5' 30" "test string   final string
3 3  3                                  third row     last " col
# this looks good. 
# now, please go and open "test.csv" with a text editor and replace all the double quotes '""' with the 
# quote escaped ' \" ' as is found in my data set. Like this:

"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
"3","3","third row","last \" col"
 
# this breaks read.csv:
X X1                                                                                    X2             X3
1 1  1                                                                            string one another string
2 2  2 quotes escaped 10' 20\\ 5' 30\\ \\test ( file://\test ) string,final string\n3,3,third row,last \\ col      
 
# we now have only two rows, with all the data captured in col2 row2
 
Any suggestions on how to fix this behavior? I've tried fiddling with quote="\"" to no avail, obviously. Interestingly, an even number of escaped quotes within a field is loaded correctly, which certainly threw me for a while!
 
Thank you in advance, 
Tim
#
On Jan 25, 2013, at 10:42 AM, Tim Howard wrote:

            
Use quote="":
+ "1","1","string one","another string"
+ "2","2","quotes escaped 10\' 20"" 5\' 30"" ""test string","final string"
+ "3","3","third row","last "" col"', sep=",", quote="")

Not ...., quote="\""


  X.. X.X1.                                           X.X2.            X.X3.
1 "1"   "1"                                    "string one" "another string"
2 "2"   "2" "quotes escaped 10' 20"" 5' 30"" ""test string"   "final string"
3 "3"   "3"                                     "third row"    "last "" col"

You will then be depending entirely on commas to separate. 

(Needed to use escaped single quotes to illustrate from a command line.)
David Winsemius
Alameda, CA, USA
#
Great point, your fix (quote="") works for the example I gave. Unfortunately, these text strings have commas in them as well(!).  Throw a few commas in any of the text strings and it breaks again.  Sorry about not including those in the example.
 
So, I need to incorporate commas *and* quotes with the escape character within a single string.
 
Tim

        
On Jan 25, 2013, at 10:42 AM, Tim Howard wrote:

            
Use quote="":
+ "1","1","string one","another string"
+ "2","2","quotes escaped 10\' 20"" 5\' 30"" ""test string","final string"
+ "3","3","third row","last "" col"', sep=",", quote="")

Not ...., quote="\""


  X.. X.X1.                                           X.X2.            X.X3.
1 "1"   "1"                                    "string one" "another string"
2 "2"   "2" "quotes escaped 10' 20"" 5' 30"" ""test string"   "final string"
3 "3"   "3"                                     "third row"    "last "" col"

You will then be depending entirely on commas to separate. 

(Needed to use escaped single quotes to illustrate from a command line.)
David Winsemius
Alameda, CA, USA
#
On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:

            
Well you need to have _some_ delimiter. At the moment it sounds as though you might end upusing readLines() and strsplit( . , split="\\'\\,\\s\\").
#
David, 
Thank you again for the reply. I'll try to make readLines() and strplit() work.  What bugs me is that I think it would import fine if the folks who created the csv had used double quotes "" rather than an escaped quote \" for those pesky internal quotes. Since that's the case, I'd think there would be a solution within read.csv() ... or perhaps scan()?, I just can't figure it out. 
best, 
Tim

        
On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:

            
Well you need to have _some_ delimiter. At the moment it sounds as though you might end upusing readLines() and strsplit( . , split="\\'\\,\\s\\").
#
On 13-01-25 4:37 PM, Tim Howard wrote:
What you say doesn't make sense.  Let me paraphrase:

"The folks who sent me the data created a weird, non-standard .csv file. 
  You'd think read.csv() could handle it."

The standard way to handle quotes in strings is to double them.  They 
didn't, so you've got a weird file on your hands.

You can probably fix it by doing a global substitution of all "backslash 
doublequote" pairs with "doublequote doublequote".  Unless they have 
some "backslash backslash doublequote" triples that they want you to 
interpret as "backslash doublequote".  Or some other weirdness.

I'd suggest you try the global subst mentioned above. or ask them for 
data in a reasonably standardized format.

Duncan Murdoch
#
Following David's suggestion you might want to have a look at https://confluence.clazzes.org/display/CSVEDIT/CSVEdit+Home .

 I have not used it but it seems to get good reviews from people I know.

John Kane
Kingston ON Canada
____________________________________________________________
FREE 3D MARINE AQUARIUM SCREENSAVER - Watch dolphins, sharks & orcas on your desktop!
#
Duncan, 
Good point - I guess I am expecting too much. I'll work on a global replace before import or chopping with strsplit while importing. 

FYI everyone - these folks with the non-standard csv are the US Feds:
https://oeaaa.faa.gov/oeaaa/external/public/publicAction.jsp?action=showCaseDownloadForm

(see off-airport AEA 2005 for a csv with issues.)

Thank you for the help. I really do appreciate the suggested solutions.
Also, thanks to John Kane for the link to csvEdit. 

Tim
On 13-01-25 4:37 PM, Tim Howard wrote:
What you say doesn't make sense.  Let me paraphrase:

"The folks who sent me the data created a weird, non-standard .csv file. 
  You'd think read.csv() could handle it."

The standard way to handle quotes in strings is to double them.  They 
didn't, so you've got a weird file on your hands.

You can probably fix it by doing a global substitution of all "backslash 
doublequote" pairs with "doublequote doublequote".  Unless they have 
some "backslash backslash doublequote" triples that they want you to 
interpret as "backslash doublequote".  Or some other weirdness.

I'd suggest you try the global subst mentioned above. or ask them for 
data in a reasonably standardized format.

Duncan Murdoch
#
On Jan 26, 2013, at 16:32 , Tim Howard wrote:

            
Does this do the trick?

dd <- read.csv(text=gsub("\\\"", "\"\"", fixed=TRUE,
   readLines("~/Downloads/OffAirportAEA2005List.csv")))

  
    
1 day later
#
Yes!  This does this trick. Thank You!
Tim

        
On Jan 26, 2013, at 16:32 , Tim Howard wrote:

            
Does this do the trick?

dd <- read.csv(text=gsub("\\\"", "\"\"", fixed=TRUE,
   readLines("~/Downloads/OffAirportAEA2005List.csv")))