misfeature: forced file.copy() of a file over itself truncates the file ...

3 messages · William Dunlap, Ben Bolker

#
Try this:

  fn <- "tmp.dat"
  x <- 1:3
  dump("x",file=fn)
  file.info(fn)  ## 9 bytes
  file.copy(paste("./",fn,sep=""),fn,overwrite=TRUE)
  file.info(fn)  ## 0 bytes (!!)

  Normally file.copy() checks and disallows overwriting a file with
itself, but it only checks whether character string 'from' is the same
as character string 'to' and not whether the copy refers to the same
file by different names, so it lets this go ahead.  It then creates a
new file with the name of 'to' using file.create():

     'file.create' creates files with the given names if they do not
     already exist and truncates them if they do.

This trashes the existing 'from' file (the aliasing was not detected).
file.copy() then happily appends the contents of 'from' (which is now
empty) to 'to' ...
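
The sequence described above can be reproduced in isolation. This is a minimal sketch, not R's actual source: naive_copy() is a hypothetical stand-in that only mirrors the three steps named in the text (string comparison, file.create(), file.append()), and it works in a temporary directory.

```r
# Hypothetical stand-in for the steps described above; not R's actual source.
naive_copy <- function(from, to) {
  if (from == to) stop("'from' and 'to' are the same string")
  file.create(to)        # truncates 'to' if it already exists
  file.append(to, from)  # appends whatever 'from' contains *now*
}

d <- tempfile("copydemo")
dir.create(d)
fn <- file.path(d, "tmp.dat")
writeLines("x <- 1:3", fn)
alias <- file.path(d, ".", "tmp.dat")   # same file, different string

size_before <- file.info(fn)$size
naive_copy(alias, fn)
size_after <- file.info(fn)$size        # 0 if the alias slipped past the check
```

Because the two strings differ, the check passes, file.create() empties the file, and there is nothing left for file.append() to copy.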

  I don't know whether there's any simple way to fix this, or whether
it's just a case of "don't do that".  It might be worth mentioning in
the documentation:

 `file.copy' will normally refuse to copy a file to itself, but in
cases where the same file is referred to by different names (as in
copying "/full/path/to/filename" to "filename" in the current working
directory), it will truncate the file to zero.

  Now that I write that it really seems like a 'mis-feature'.
  On a Unix system I would probably compare inodes, but I don't know if
there's a good system-independent way to test file identity ...

$ ls -i tmp.dat
114080 tmp.dat
$ ls -i /home/bolker/R/pkgs/r2jags/pkg/tests/tmp.dat
114080 /home/bolker/R/pkgs/r2jags/pkg/tests/tmp.dat

  Would normalizePath() work for this ... ?
[1] "/mnt/hgfs/bolker/Documents/R/pkgs/r2jags/pkg/tests/tmp.dat"

   sincerely
    Ben Bolker
#
Since the problem can only occur if the 'to' file
exists, a check like
   if (normalizePath(from) == normalizePath(to)) {
      stop("'from' and 'to' files are the same")
   }
(after verifying that 'to', and 'from', exist)
would avoid the problem.
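
Wrapped up as a guard around file.copy(), the check might look like the sketch below. same_path() and guarded_copy() are hypothetical names used only for illustration, and the comparison is still purely path-based.

```r
# Hypothetical helper: TRUE when both files exist and normalize to one path.
same_path <- function(from, to) {
  # Only existing files can clash, so skip normalizePath() on missing paths.
  if (!file.exists(from) || !file.exists(to)) return(FALSE)
  normalizePath(from) == normalizePath(to)
}

# Hypothetical wrapper: refuse the copy before file.copy() can truncate 'to'.
guarded_copy <- function(from, to, overwrite = FALSE) {
  if (same_path(from, to)) stop("'from' and 'to' files are the same")
  file.copy(from, to, overwrite = overwrite)
}
```

With this guard, copying "./tmp.dat" onto "tmp.dat" stops with an error and leaves the file intact, while a copy to a new name goes through as usual.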

S+ has a function, match.path, that can say if two paths refer to
the same file (on Unixen compare inode and device
numbers, on Windows compare the output of normalizePath).
That avoids automounter/NFS problems like the following.

We have a Unix machine that has two names, "sea-union" and "seabldlnx3201",
and the /nfs directory contains both names.  At the shell (on a
second Linux machine) we can see they refer to the same place:
   % pwd
   /nfs/sea-union
   % ls -id usr /nfs/seabldlnx3201/usr /nfs/sea-union/usr
   358337 /nfs/seabldlnx3201/usr/  358337 /nfs/sea-union/usr/  358337 usr/
   % df usr /nfs/seabldlnx3201/usr /nfs/sea-union/usr
   Filesystem           1K-blocks      Used Available Use% Mounted on
   sea-union:/usr        15385888   3526656  11077664  25% /nfs/sea-union/usr
   seabldlnx3201:/usr    15385888   3526656  11077664  25% /nfs/seabldlnx3201/usr
   sea-union:/usr        15385888   3526656  11077664  25% /nfs/sea-union/usr

S+'s match.path also indicates that they are the same:
   S+> getwd()
   [1] "/nfs/sea-union"
   S+> match.path( c("usr", "/nfs/seabldlnx3201/usr"), "/nfs/sea-union/usr")
   [1] 1 1
   (The last indicates that both paths in the first argument match the
   path in the second, as match() does for strings.)
But R's normalizePath() would lead you to think that they are different
directories
   > getwd()
   [1] "/nfs/sea-union"
   > normalizePath(c("usr", "/nfs/seabldlnx3201/usr", "/nfs/sea-union/usr"))
   [1] "/nfs/sea-union/usr"     "/nfs/seabldlnx3201/usr" "/nfs/sea-union/usr"

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
4 days later
#
Ben Bolker <bbolker <at> gmail.com> writes:
Bump.  Will I be scolded if I submit this as a bug report/wishlist
item?

  Test case:
[snip]

  My proposed fix (thanks to W. Dunlap) is to use normalizePath();
as he points out, this won't catch situations where the same file
can be referred to via an NFS mount, but it should help at least.
Writing a platform-independent version a la S-PLUS's match.path()
seemed too much work at the moment.

===================================================================
--- files.R	(revision 58240)
+++ files.R	(working copy)
@@ -116,7 +116,7 @@
     if(nt > nf) from <- rep(from, length.out = nt)
     okay <- file.exists(from)
     if (!overwrite) okay[file.exists(to)] <- FALSE
-    if (any(from[okay] %in% to[okay]))
+    if (any(normalizePath(from[okay]) %in% normalizePath(to[okay])))
         stop("file can not be copied both 'from' and 'to'")
     if (any(okay)) { # care: file.create could fail but file.append work.
     	okay[okay] <- file.create(to[okay])
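
One thing to watch with the patched condition: 'okay' only guarantees that 'from' exists, so with overwrite = TRUE some 'to' paths may not exist yet, and normalizePath() may warn about those. The sketch below assumes the mustWork argument of normalizePath() is available; same_target() is a hypothetical name, not part of the patch.

```r
# Hypothetical variant of the patched test that stays quiet when a
# 'to' path does not exist yet (assumes normalizePath() has 'mustWork').
same_target <- function(from, to) {
  nf <- normalizePath(from, mustWork = FALSE)
  nt <- normalizePath(to, mustWork = FALSE)
  nf %in% nt   # same vectorised comparison as in the patch
}
```

Like the patch itself, this only compares path strings, so it does not catch the NFS case described earlier in the thread.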


  thanks
    Ben Bolker