Skip to content

what do you think about write.table(... qmethod = "excel")?

6 messages · Paul Johnson, Dirk Eddelbuettel, Ista Zahn +2 more

#
Last week one of our clients reported trouble with a csv file I
generated with write.table.  He said that columns with quotes for
character variables were rejected by their data importer, which was
revised to match the way Microsoft Excel uses quotation marks in
character variables.  I explained to them that quoted character
variables are virtuous and wise, of course, but they say Microsoft
Excel CSV export no longer quotes characters unless they include
commas in the values.

They showed me a CSV file from Excel that looked like this

x1,x2,x3,x4 5 6
fred,barney,betty,x
bambam,"fred,wilma",pebbles,y

Note how the quotes only happen on row 2 column 2. I was surprised it
did that, but now I have some pressure to write a csv maker that has
that structure.  Its weird, even when there are spaces in values there
are no quotation marks.

Has anybody done this and verified that it matches CSV from MS Excel?
If I succeed will you consider a patch?

pj
#
On 19 September 2017 at 12:04, Paul Johnson wrote:
| They showed me a CSV file from Excel that looked like this
| 
| x1,x2,x3,x4 5 6
| fred,barney,betty,x
| bambam,"fred,wilma",pebbles,y
| 
| Note how the quotes only happen on row 2 column 2. I was surprised it
| did that, but now I have some pressure to write a csv maker that has
| that structure.  Its weird, even when there are spaces in values there
| are no quotation marks.
| 
| Has anybody done this and verified that it matches CSV from MS Excel?
| If I succeed will you consider a patch?

R> data.table::fread("/tmp/paul.csv")
       x1         x2      x3 x4 5 6
1:   fred     barney   betty      x
2: bambam fred,wilma pebbles      y
R> data.table::fread("/tmp/paul.csv")
       x1         x2      x3 x4
1:   fred     barney   betty  x
2: bambam fred,wilma pebbles  y
R> 

The only difference is that between calls one and two, I removed the stray
    "5 6"
from the first line.

Dirk
#
On Tue, Sep 19, 2017 at 1:04 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
I think you should resist that pressure. It really makes no sense to
write a .csv parser that _only_ supports .csv files created by Excel.
If you're going to use Excel as a model, a more sensible approach
would be to write a csv parser that supports all the formats that
Excel itself supports; Excel of course has no problem importing

"x1","x2","x3","x4"
"fred","barney","betty","x"
"bambam","fred,wilma","pebbles","y"

So, seriously, tell them to just fix their csv parser. Since they seem
hung up on Excel, it may help to point out that it does in fact import
csv produced by write.csv without complaint.

Best,
Ista

 Its weird, even when there are spaces in values there
#
On 19/09/2017 4:10 PM, Ista Zahn wrote:
That depends on whether this is a paying client or not.

 > It really makes no sense to
That's true, but if that's what they want to do, and they're willing to 
pay to be able to write files that imitate Excel, then why not do what 
they ask?

On the other hand, if they aren't willing to pay for the work, then you 
should lecture them on how silly their request is.

In any case, base R functions should not include nonsense, so this is 
not something that should go into R.

Duncan Murdoch
#
On Tue, Sep 19, 2017 at 4:45 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
I understand.  This is a paying client, I'm going where the money goes.

Here's my current working example of a function that writes a CSV
exactly as Excel does. I've posted this into StackOverflow
(https://stackoverflow.com/questions/25743018/how-to-conditionally-remove-quotes-in-write-csv).
It is buried under a thread, maybe will not get much attention and
bug-checking. Oh, well, I can hope. Processes variables
column-by-column, I don't know how to do it differently.



##' Write CSV files with quotes same as MS Excel 2013 or newer
##'
##' R's write.csv inserts quotes around all elements in a character
##' vector (if quote = TRUE).  In contrast, MS Excel CSV export no
##' longer inserts quotation marks on all elements in character
##' variables, except when the cells include commas or quotation
##' marks.  This function generates CSV files that are, so far as we
##' know, exactly the same "quoted style" as MS Excel CSV export
##' files.
##'
##' This works by manually inserting quotation marks where necessary and
##' turning FALSE R's own method to insert quotation marks.
##' @param x a data frame
##' @param file character string for file name
##' @param row.names Default FALSE for row.names
##' @importFrom utils write.table
##' @return the return from write.table, using revised quotes
##' @export
##' @author Paul Johnson
##' @examples
##' set.seed(234)
##' x1 <- data.frame(x1 = c("a", "b,c", "b", "The \"Washington, DC\""),
##'       x2 = rnorm(4), stringsAsFactors = FALSE)
##' x1
##' fn <- tempfile(pattern = "testcsv", fileext = ".csv")
##' writeCSV(x1, file = fn)
##' readLines(fn)
##' x2 <- read.table(fn, sep = ",", header = TRUE, stringsAsFactors = FALSE)
##' all.equal(x1,x2)
writeCSV <- function(x, file, row.names = FALSE){
    xischar <- colnames(x)[sapply(x, is.character)]
    for(jj in xischar){
        x[ , jj] <- gsub('"', '""', x[ , jj], fixed = TRUE)
        needsquotes <- grep('[\",]', x[ ,jj])
        x[needsquotes, jj] <- paste0("\"", x[needsquotes, jj], "\"")
    }
    write.table(x, file = file, sep = ",", quote = FALSE,
                row.names = row.names)
}

Output:
+        x2 = rnorm(4), stringsAsFactors = FALSE)
x1         x2
1                    a  0.6607697
2                  b,c -2.0529830
3                    b -1.4992061
4 The "Washington, DC"  1.4712331
[1] "x1,x2"
[2] "a,0.660769736644892"
[3] "\"b,c\",-2.052983003941"
[4] "b,-1.49920605110092"
[5] "\"The \"\"Washington, DC\"\"\",1.4712331168047"
[1] TRUE

I already see one problem, that I've got no special arrangement for
column names with commas or quotes. People who want column names with
those things are even more wrong than the people write a parser that
can't understand quotes on character variables.
#
? Should that be:


    xIsCharOrFactor <- colnames(x)[sapply(x, function(z) is.character(z) | is.factor(z) )]

Just asking. No testing done.
__ 
David.
David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law