R-devel community:

I have encountered some unexpected behavior using iconv, which may be the source of errors I am getting when connecting to a UTF-16-encoded SQL Server database. A simple example is below.

When researching this problem, I found r-devel reports of the same problem in threads from June 2010 and February 2016, and that bug #16738 was posted to Bugzilla as a result. However, I have not been able to determine whether the error is mine, whether there is a known workaround, or whether it truly is a bug in R's iconv implementation.

Any additional help is appreciated.

Thanks,

Michael

sessionInfo()
#> R version 3.6.1 (2019-07-05). ## and replicated on R 3.4.1 on a cluster running CentOS Linux 7.
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
# <snip>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#> loaded via a namespace (and not attached):
#> [1] compiler_3.6.1

s <- "test"
iconv(s, to="UTF-8")
#> [1] "test"
iconv(s, to="UTF-16")
#> Error in iconv(s, to = "UTF-16"): embedded nul in string: '\xfe\xff\0t\0e\0s\0t'
iconv(s, to="UTF-16BE")
#> Error in iconv(s, to = "UTF-16BE"): embedded nul in string: '\0t\0e\0s\0t'
iconv(s, to="UTF-16LE")
#> Error in iconv(s, to = "UTF-16LE"): embedded nul in string: 't\0e\0s\0t\0'

--------------------------
Michael Braun, Ph.D.
Associate Professor of Marketing, and
  Corrigan Research Professor
Cox School of Business
Southern Methodist University
Dallas, TX 75275
iconv: embedded nulls when converting to UTF-16
2 messages · Braun, Michael, Duncan Murdoch
1 day later
On 03/08/2019 11:59 p.m., Braun, Michael wrote:
R-devel community: I have encountered some unexpected behavior using iconv, which may be the source of errors I am getting when connecting to a UTF-16-encoded SQL Server database. A simple example is below. When researching this problem, I found r-devel reports of the same problem in threads from June 2010 and February 2016, and that bug #16738 was posted to Bugzilla as a result. However, I have not been able to determine whether the error is mine, whether there is a known workaround, or whether it truly is a bug in R's iconv implementation. Any additional help is appreciated.
R does not support embedded nulls in character strings, so it can't handle UTF-16 strings as character vectors. If you are using iconv(), you can set toRaw = TRUE, and you'll get a result containing the correct bytes. For example,

> s <- "test"
> iconv(s, to = "UTF-16", toRaw = TRUE)
[[1]]
 [1] fe ff 00 74 00 65 00 73 00 74

I don't know if SQL Server can handle raw vectors; I'd try to get it to accept UTF-8 input instead.

Duncan Murdoch
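As a minimal sketch of the toRaw = TRUE approach described above (the variable names utf16le and back are illustrative only, and UTF-16LE is chosen here simply as one of the encodings from the original example), the bytes can be kept in a raw vector and converted back to an ordinary string when needed. This relies on iconv() accepting a list of raw vectors as input, as documented in ?iconv. Whether a given database driver accepts a raw vector is a separate question that this sketch does not address.

```r
s <- "test"

# toRaw = TRUE returns a list with one raw vector per input element,
# so the NUL bytes never have to be stored in an R character string.
utf16le <- iconv(s, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
print(utf16le)
#> [1] 74 00 65 00 73 00 74 00

# iconv() also accepts a list of raw vectors as input, so the UTF-16
# bytes can be converted back to a regular UTF-8 character string.
back <- iconv(list(utf16le), from = "UTF-16LE", to = "UTF-8")
print(back)
#> [1] "test"
```

The raw vector is the form to pass around whenever UTF-16 bytes are really required; calling rawToChar() on it would fail with the same embedded nul error shown in the original report.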
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel