tools::md5sum(directory) behavior different on Windows vs. Unix

2 messages · Scott Kostyshak

tools::md5sum gives a warning if it receives a directory as an
argument on Unix but not on Windows. On Windows, a directory is
not treated as a file, so fopen returns NULL. Then, NA is returned
without a warning. On Unix, a directory is treated as a file, so fopen
does not return NULL; md5 is then run and fails, leading to a warning.
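
One way to observe the difference described above is to collect the warnings raised by tools::md5sum() instead of letting them print. This is only a sketch: the helper name md5_with_warnings is hypothetical, and what it reports depends on the platform and R version, so no output is shown.

```r
# Hypothetical helper (not part of R): run tools::md5sum() and collect
# any warnings so the result can be compared across platforms.
md5_with_warnings <- function(paths) {
    msgs <- character()
    value <- withCallingHandlers(
        tools::md5sum(paths),
        warning = function(w) {
            # Record the message, then continue without printing it.
            msgs <<- c(msgs, conditionMessage(w))
            invokeRestart("muffleWarning")
        }
    )
    list(value = value, warnings = msgs)
}

res <- md5_with_warnings(tempdir())  # tempdir() is a directory, not a file
print(res$value)     # value returned for the directory
print(res$warnings)  # warnings raised, if any (platform dependent)
```

Running the same snippet on Windows and on Unix should make any difference in the warnings visible side by side.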

This is a good opportunity for me to understand further (in addition
to [1] and the many places where OS special cases are mentioned) in
which cases R tries to behave the same on Windows as on Unix and in
which cases it allows for differences (in this case, a warning vs. no
warning). For example, it would be straightforward to create a patch
that would lead to the same behavior in this case. tools::md5sum could
either issue a warning for each argument that is a directory or it
could issue no warning (consistent with file.info). Would either patch
be considered?

Or is this difference encouraged because the concept of a file is
different on Unix than on Windows?

Scott

[1] http://cran.r-project.org/bin/windows/base/rw-FAQ.html#What-should-I-expect-to-behave-differently-from-the-Unix-version


--
Scott Kostyshak
Economics PhD Candidate
Princeton University
20 days later
On Mon, Sep 9, 2013 at 3:00 AM, Scott Kostyshak <skostysh at princeton.edu> wrote:
Attached is a patch that gives a warning if an element in the file
argument is not a regular file (e.g. is a directory or does not
exist). In my opinion the advantages of this patch are:

(1) the same warnings are generated on all platforms in the case where
one of the elements is a folder.
(2) a warning is also given if a file does not exist.
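
The same idea can be tried without rebuilding R by wrapping the existing function. The sketch below is an assumption-laden stand-in for the patch, not the patch itself: md5sum_checked is a hypothetical name, and it calls the installed tools::md5sum() only on the elements that are regular files.

```r
# Hypothetical wrapper mirroring the patched behavior: warn once about
# every element that is not a regular file and return NA for it.
md5sum_checked <- function(files) {
    is_regular <- utils::file_test("-f", files)
    if (!all(is_regular))
        warning("The following are not regular files: ",
                paste(shQuote(files[!is_regular]), collapse = " "))
    out <- rep(NA_character_, length(files))
    names(out) <- files
    # Only regular files are passed on to the real md5sum.
    out[is_regular] <- tools::md5sum(files[is_regular])
    out
}

# Usage: one regular file and one directory.
f <- tempfile()
writeLines("hello", f)
res <- suppressWarnings(md5sum_checked(c(f, tempdir())))
print(res)
```

Because the check happens in R code before the C routine is called, the warning does not depend on how fopen treats directories on a given platform.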

Comments?

Scott
-------------- next part --------------
Index: trunk/src/library/tools/R/md5.R
===================================================================
--- trunk/src/library/tools/R/md5.R	(revision 64011)
+++ trunk/src/library/tools/R/md5.R	(working copy)
@@ -17,7 +17,18 @@
 #  http://www.r-project.org/Licenses/
 
 md5sum <- function(files)
-    structure(.Call(Rmd5, files), names=files)
+{
+    reg_ <- file_test("-f", files)
+    regFiles <- files[reg_]
+    notReg <- files[!reg_]
+    if(!all(reg_))
+        warning("The following are not regular files: ",
+                paste(shQuote(notReg), collapse = " "))
+    names(files) <- files
+    files[!reg_] <- NA
+    files[reg_] <- .Call(Rmd5, regFiles)
+    files
+}
 
 .installMD5sums <- function(pkgDir, outDir = pkgDir)
 {
Index: trunk/src/library/tools/man/md5sum.Rd
===================================================================
--- trunk/src/library/tools/man/md5sum.Rd	(revision 64011)
+++ trunk/src/library/tools/man/md5sum.Rd	(working copy)
@@ -18,7 +18,8 @@
 \value{
   A character vector of the same length as \code{files}, with names
   equal to \code{files}. The elements
-  will be \code{NA} for non-existent or unreadable files, otherwise
+  will be \code{NA} for non-existent or unreadable files (in which case
+  a warning will be generated), otherwise
   a 32-character string of hexadecimal digits.
 
   On Windows all files are read in binary mode (as the \code{md5sum}