Message-ID: <31C6DB49-0604-41CD-8D53-5EACF7A46CF5@R-project.org>
Date: 2026-02-19T20:56:11Z
From: Simon Urbanek
Subject: [R-pkg-devel] Using the connections interface to decode text
In-Reply-To: <20260219113519.43ce91ec@Tarkus>
As the author of the custom connection API, I can say the answer is no: that was not the intention. The structure has to be exposed in order to implement the connection API for new connection types; there is no way around it, since the implementing code needs access to the internals. But it should not be used outside of that context, since it should be opaque to the *users* of connections. So packages that do not implement new connections should not use the internal structures to access the internals of connections: they are not intended to be part of the public API and may change. That's why this is strictly experimental: if we need to change it, only packages implementing new connections would have to adapt, but no one else should. Does that clarify?
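
On the alternative raised in the question (read the data as binary and decode it separately): that route does not need to touch struct Rconn at all. A rough, untested-in-R sketch of the decoding half follows. It uses POSIX iconv() as a stand-in so that it compiles without R; inside a package, Riconv_open(), Riconv() and Riconv_close() from R_ext/Riconv.h take analogous arguments. The names decoder, decoder_open(), decoder_feed(), decoder_close(), CHUNK_MAX and CARRY_MAX are made up for illustration, and the buffer sizes are arbitrary assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK_MAX = 4096, CARRY_MAX = 16 }; /* arbitrary sizes for the sketch */

/* Hypothetical streaming decoder: converts chunks of bytes to UTF-8 and
 * carries an incomplete multibyte sequence over to the next chunk. */
typedef struct {
    iconv_t cd;
    char buf[CARRY_MAX + CHUNK_MAX];
    size_t carried; /* bytes at the start of buf left from the previous chunk */
} decoder;

/* Returns 0 on success, -1 if iconv does not know the source encoding. */
int decoder_open(decoder *d, const char *from)
{
    d->carried = 0;
    d->cd = iconv_open("UTF-8", from);
    return d->cd == (iconv_t)-1 ? -1 : 0;
}

/* Decodes one chunk (n <= CHUNK_MAX) into out.
 * Returns the number of UTF-8 bytes written, or -1 on invalid input,
 * an oversized chunk, or an output buffer that is too small. */
long decoder_feed(decoder *d, const char *chunk, size_t n,
                  char *out, size_t outsize)
{
    if (n > CHUNK_MAX) return -1;
    assert(d->carried <= CARRY_MAX);

    /* Append the new bytes after whatever was left over last time. */
    memcpy(d->buf + d->carried, chunk, n);
    char *in = d->buf;
    size_t inleft = d->carried + n;
    char *outp = out;
    size_t outleft = outsize;

    /* EINVAL means the input ends in the middle of a character: that is
     * expected at chunk boundaries. Anything else (EILSEQ, E2BIG) is fatal. */
    if (iconv(d->cd, &in, &inleft, &outp, &outleft) == (size_t)-1 &&
        errno != EINVAL)
        return -1;
    if (inleft > CARRY_MAX) return -1;

    /* Keep the unconsumed tail for the next call. */
    memmove(d->buf, in, inleft);
    d->carried = inleft;
    return (long)(outsize - outleft);
}

void decoder_close(decoder *d)
{
    iconv_close(d->cd);
}
```

In a package, the chunks would come from repeated R_ReadConnection() calls on a connection opened in binary mode, with the source encoding (for example "GBK", if the iconv in use supports it) supplied by the caller rather than taken from the connection. A non-zero carried count after the last chunk would indicate truncated input.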
Cheers,
Simon
> On 19 Feb 2026, at 21:35, Ivan Krylov via R-package-devel <r-package-devel at r-project.org> wrote:
>
> Hello R package developers,
>
> Now that R_GetConnection(), R_new_custom_connection(),
> R_ReadConnection(), R_WriteConnection() are marked as experimental, I'm
> curious: is it a good idea to use the interface to decode text from a
> user-provided connection? For example, this could be useful to stream
> the data into a parser without loading it all into memory first.
>
> R_ReadConnection() is like readBin(), it won't decode any text. On the
> other hand, since R_new_custom_connection() is also part of the
> interface, this implies that the user must know about struct Rconn and
> what its functions do, including the UTF8out flag and how readLines()
> uses it. (Without the readLines() trick, R will only attempt to decode
> the data into the native encoding. With the readLines() trick, R will
> only accept unopened connections and close them afterwards.)
>
> The following example seems to work:
>
> // R_ExecWithCleanup(), R_CONNECTIONS_VERSION check omitted
> SEXP readFromConn(SEXP sconn) {
>     Rconnection conn = R_GetConnection(sconn);
>
>     if (!conn->isopen) {
>         conn->UTF8out = TRUE;
>         strcpy(conn->mode, "rt");
>         conn->open(conn);
>     }
>
>     for (;;) {
>         int c = conn->fgetc(conn);
>         if (c < 0 || c > 255) break; // R_EOF not declared
>         Rprintf("%02x ", c);
>     }
>
>     Rprintf("\n");
>     conn->close(conn);
>
>     return R_NilValue;
> }
>
> LC_ALL=en_GB.iso885915 luit R # non-UTF-8 locale
>
> '\u5b98\u8a71' |> iconv('UTF-8', 'GBK') |> writeLines('gbk.txt')
> .Call('readFromConn', file('gbk.txt', encoding = 'GBK'))
> # e5 ae 98 e8 a9 b1 0a
> '\u5b98\u8a71' |> charToRaw() # same UTF-8 as above
> # [1] e5 ae 98 e8 a9 b1
>
> Is it a good idea to adopt such an approach in a package? Would it be
> better to read data as binary and decode it using Riconv?
>
> --
> Best regards,
> Ivan