
Bug (?): reading binary files in Windows 10

6 messages · Kate Stone, Albrecht Kauffmann, Omar André Gonzáles Díaz +2 more

#
Hello r-help,

Could you help me determine whether this is an R bug or not?

I've been trying to read this binary file in R:

download.file("ftp://ftp.fieldtriptoolbox.org/pub/fieldtrip/tutorial/preprocessing_erp/s04.eeg","s04.eeg")

and on Windows >= 8 x64 (build 9200) I get a file of a different
length (much longer) than on Ubuntu. I've tested it with different R
versions on Windows and different package versions, with the same
incorrect result. Colleagues have tested it on the same Windows/Ubuntu
builds and got the correct length.

I'm not sure whether this is an R problem, something specific to my
OS, or even something to do with the file itself. Any ideas? I've
attached a small script demonstrating the issue.

Many thanks,
Kate
#
Dear Kate,

I cannot find your small script, but I downloaded the file using your command. It is 142773760 bytes (136.2 MB).

Hth,
Albrecht
#
Hi,

This is what I got, just with base R:
download.file("ftp://ftp.fieldtriptoolbox.org/pub/fieldtrip/tutorial/preprocessing_erp/s04.eeg","s04.eeg")
trying URL 'ftp://ftp.fieldtriptoolbox.org/pub/fieldtrip/tutorial/preprocessing_erp/s04.eeg'
Content type 'unknown' length 142773760 bytes (136.2 MB)
==================================================
[1] 0
[1] 1
[1] 1

Information about the session:
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4    yaml_2.2.0


On Thu, 6 Dec 2018 at 8:51, Albrecht Kauffmann (<alkauffm at fastmail.fm>)
wrote:

#
On 06/12/2018 7:45 AM, Kate Stone wrote:
On Windows, the `mode = "wb"` argument to download.file() is important;
otherwise the file is assumed to be text, and every LF is changed to CR LF.
There may also be special handling of EOF marks; I forget.

Duncan Murdoch
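A minimal sketch (not from the thread) of why the Windows copy grows: it emulates the LF to CR LF translation Duncan describes on a small made-up byte vector, then wraps the fix in a helper. `to_crlf()` and `download_binary()` are hypothetical names introduced here for illustration. EOF-mark handling is not modelled, and real text-mode translation is done by the C runtime, not by R code like this.

```r
# Hypothetical helper: emulate text-mode writing on Windows by inserting
# a CR (0x0D) before every LF (0x0A). EOF-mark handling is NOT modelled.
to_crlf <- function(bytes) {
  stopifnot(is.raw(bytes))
  is_lf <- bytes == as.raw(0x0A)
  reps <- ifelse(is_lf, 2L, 1L)          # each LF becomes two bytes
  out <- rep(bytes, times = reps)
  ends <- cumsum(reps)                   # last output position of each input byte
  out[ends[is_lf] - 1L] <- as.raw(0x0D)  # first copy of each LF becomes CR
  out
}

# Made-up binary payload containing three LF bytes
payload <- as.raw(c(0x00, 0x0A, 0xFF, 0x0A, 0x0A, 0x42))
mangled <- to_crlf(payload)
length(payload)  # 6
length(mangled)  # 9: one extra byte per LF

# Hypothetical wrapper around the fix from the thread: always request
# binary mode, and optionally compare the result with a known size.
download_binary <- function(url, destfile, expected_size = NA) {
  status <- download.file(url, destfile, mode = "wb")
  actual <- file.size(destfile)
  if (!is.na(expected_size) && actual != expected_size) {
    warning("got ", actual, " bytes, expected ", expected_size)
  }
  invisible(status)
}
```

Usage, with the size reported earlier in the thread: `download_binary("ftp://ftp.fieldtriptoolbox.org/pub/fieldtrip/tutorial/preprocessing_erp/s04.eeg", "s04.eeg", expected_size = 142773760)`. Since every 0x0A byte in a binary file gains an extra byte under text-mode translation, a much longer file on Windows is consistent with the missing `mode = "wb"`.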
#
AFAIK, this receiver-side responsibility for specifying whether a file is text or binary is a particular problem with the "ftp://" protocol, because it does not use MIME encoding (which "http://" does). MIME lets the sending end of the connection tell the receiver whether the file is text or binary, though it uses more bandwidth for the transfer. If the server offers a choice, in these days of high-bandwidth connections you may be better off sticking with http/https.

Note that MIME is not magic: if the sender is misconfigured, the client can receive corrupt data. Fortunately, the most common MIME misconfigurations leave the file unchanged, so it is up to the receiver to handle any text-file newline conversion after the transfer has completed.
On December 6, 2018 7:03:48 AM PST, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

  
    
#
Ah wow, that answers many questions, thanks!

On Thu, Dec 6, 2018 at 4:41 PM Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
wrote: