Message-ID: <C85533C3-506D-4052-8A64-63167211DD74@gmail.com>
Date: 2012-09-08T13:09:49Z
From: Peter Dalgaard
Subject: Can I make spss.get reencode from Windows-1252?
In-Reply-To: <93C38C88-F3C1-489D-9DC8-E75680B52681@fluidmind.org>
On Sep 8, 2012, at 05:17 , Dan Delaney wrote:
> Hi all. I have an SPSS file that I'm loading into R with the Hmisc spss.get function. The trouble is that the SPSS file uses the Windows-1252 character set (which I think is the default for SPSS on Windows) instead of plain-ol' Latin-1, and since spss.get doesn't allow me to pass the "reencode" option to read.spss, any characters in Windows-1252 that are not a part of Latin-1 (such as curly quotes, en-dashes, and a handful of others) come into R looking like this: "Don\x92t know". Now if I read that SPSS file in with read.spss and include "reencode='Windows-1252'", those characters convert to UTF-8 just fine, yielding "Don’t know". But then, of course, I don't get the niceties of spss.get, such as the "labels" attributes on the columns.
>
> So my question is, how can I either pass the "reencode='Windows-1252'" option through to read.spss, or how can I make spss.get default to reencoding from Windows-1252 instead of Latin-1?
Would it work to do the conversion afterwards?
> iconv("\x92", from="CP1252")
[1] "’"
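
Applied to a whole data frame, that after-the-fact conversion might look like the sketch below. It assumes the strings still hold raw single-byte CP1252 (as in "Don\x92t know") and that the variable labels live in a "label" attribute on each column. The helper names cp1252_to_utf8 and fix_encoding are made up for illustration and are not part of Hmisc or foreign, and the small data frame only stands in for real spss.get output.

```r
# Hypothetical helpers for illustration; neither is part of Hmisc or foreign.
cp1252_to_utf8 <- function(x) iconv(x, from = "CP1252", to = "UTF-8")

fix_encoding <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col)) {
      # Value labels end up as factor levels, so convert those
      levels(col) <- cp1252_to_utf8(levels(col))
    } else if (is.character(col)) {
      keep <- attributes(col)      # save "label", class, etc.
      col <- cp1252_to_utf8(col)
      attributes(col) <- keep      # restore them after conversion
    }
    # Convert the variable label too, if the column carries one
    lab <- attr(col, "label", exact = TRUE)
    if (is.character(lab)) attr(col, "label") <- cp1252_to_utf8(lab)
    df[[nm]] <- col
  }
  df
}

# Small synthetic data frame standing in for spss.get() output
x <- c("Don\x92t know", "Yes")
demo <- data.frame(q1 = factor(x, levels = x), note = x,
                   stringsAsFactors = FALSE)
attr(demo$q1, "label") <- "Respondent\x92s answer"

fixed <- fix_encoding(demo)
levels(fixed$q1)
attr(fixed$q1, "label")
```

With real data the call would be something like fix_encoding(spss.get("survey.sav")), where the file name is a placeholder. Strings that are already UTF-8 should not be run through this a second time, since they would be converted twice.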
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com