Quickest way to make a large "empty" file on disk?

Hello,

I've written a function to try to answer your original request, but
I've run into a problem; see the end of this message.
In the meantime, my comments are inline.
On 28-09-2012 17:44, Jonathan Greenberg wrote:
Nothing special, just that sometimes there are good ways of doing so. 
mmap seems to be safe.
I'm not a great systems programmer, but 20+ years of using seek on
Windows have shown nothing of the sort. In fact, I've just found a
problem on Ubuntu 12.04 instead: where seek gives the expected result on
Windows, on Ubuntu it works up to a certain point and then "stops
seeking", or whatever is happening. I installed Ubuntu very recently, so
I really don't know the reason for the behavior that you can see in the
example run below. But I do know that Windows 7 is causing no problem,
as expected.
#
# Function: creates a file of ASCII nulls using seek/writeBin. File size
# can be big.
#
createBig <- function(filename, size){
     if(size == 0) return(0)
     chunk <- .Machine$integer.max
     nchunks <- as.integer(size / chunk)
     rest <- size - as.double(nchunks)*as.double(chunk)
     fl <- file(filename, open = "wb")
     for(i in seq_len(nchunks)){
         seek(fl, where = chunk - 1, origin = "current", rw = "write")
         writeBin(raw(1), fl)
         # ---------- debug ----------
         print(seek(fl, where = NA))
     }
     if(rest > 0){
         seek(fl, where = rest - 1, origin = "current", rw = "write")
         writeBin(raw(1), fl)
     }
     close(fl)
}
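
Before trying this at multi-gigabyte sizes, the same seek/writeBin idea can be checked on a tiny file. The following is only a sketch: createSmall and its chunk argument are hypothetical names that are not part of the function above, and it assumes seek behaves as documented on a binary file connection.

```r
# Hypothetical small-scale variant of createBig (not the original function):
# the chunk size is a parameter, so the loop can be exercised on a tiny file.
createSmall <- function(filename, size, chunk = 1000) {
    nchunks <- size %/% chunk          # number of full chunks
    rest <- size %% chunk              # bytes left after the full chunks
    fl <- file(filename, open = "wb")
    on.exit(close(fl))                 # close the connection on exit
    for (i in seq_len(nchunks)) {
        # skip chunk - 1 bytes forward, then write one null byte
        seek(fl, where = chunk - 1, origin = "current", rw = "write")
        writeBin(raw(1), fl)
    }
    if (rest > 0) {
        seek(fl, where = rest - 1, origin = "current", rw = "write")
        writeBin(raw(1), fl)
    }
    invisible(size)
}

tmp <- tempfile()
createSmall(tmp, 3500)                 # 3 chunks of 1000 plus 500 bytes
sizeOnDisk <- file.info(tmp)$size
bytes <- readBin(tmp, what = "raw", n = sizeOnDisk)
allNull <- all(bytes == as.raw(0))     # TRUE if every byte is an ASCII null
unlink(tmp)
```

If sizeOnDisk equals the requested size and allNull is TRUE, the technique itself is working and any failure at large sizes is more likely related to the offsets involved.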

As you can see from the debug prints, on Windows 7 everything works as
planned, while on Ubuntu 12.04 seek stops seeking when it reaches about
17 GB. From then on the file size grows 1 byte at a time, which is
explained by the writeBin instruction: only the single byte it writes is
added. (The requested size in the Ubuntu run is different, slightly
larger, but that is irrelevant. The code was run several times, all with
the same result: at 17179869176 bytes it no longer works.)
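
The point where seek stops advancing can also be found programmatically from the debug output. This is a minimal sketch: firstStall is a hypothetical helper, and the positions are copied from the Ubuntu run quoted in this message.

```r
# Positions printed by the debug line in the Ubuntu 12.04 run.
pos <- c(2147483647, 4294967294, 6442450941, 8589934588, 10737418235,
         12884901882, 15032385529, 17179869176, 17179869177, 17179869178)
chunk <- .Machine$integer.max

# Hypothetical helper: index of the first chunk whose reported position
# differs from the expected i * chunk, or NA if every position matches.
firstStall <- function(pos, chunk) {
    expected <- seq_along(pos) * as.double(chunk)
    bad <- which(pos != expected)
    if (length(bad) == 0) NA_integer_ else bad[1]
}

stallAt <- firstStall(pos, chunk)   # 9: the ninth seek did not advance
increments <- diff(pos)             # full chunks first, then 1-byte steps
```

Here the first eight positions match i * chunk exactly and the ninth does not, which agrees with the last good position of 17179869176 bytes.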

#----------------------------------------------------------------------------
#
# System: Windows 7 / R 2.15.1

size <- 10*.Machine$integer.max + sample(.Machine$integer.max, 1)
size
[1] 22195364413

createBig("Test.txt", size)
[1] 2147483647
[1] 4294967294
[1] 6442450941
[1] 8589934588
[1] 10737418235
[1] 12884901882
[1] 15032385529
[1] 17179869176
[1] 19327352823
[1] 21474836470

file.info("Test.txt")$size
[1] 22195364413
file.info("Test.txt")$size %/% .Machine$integer.max
[1] 10
file.info("Test.txt")$size %% .Machine$integer.max
[1] 720527943

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] fortunes_1.5-0

#----------------------------------------------------------------------------
#
# System: Ubuntu 12.04 Precise Pangolin / R 2.15.1
size <- 10*.Machine$integer.max + sample(.Machine$integer.max, 1)
size
[1] 23091487381

createBig("Test.txt", size)
[1] 2147483647
[1] 4294967294
[1] 6442450941
[1] 8589934588
[1] 10737418235
[1] 12884901882
[1] 15032385529
[1] 17179869176
[1] 17179869177
[1] 17179869178

file.info("Test.txt")$size
[1] 17179869179
file.info("Test.txt")$size %/% .Machine$integer.max
[1] 8
file.info("Test.txt")$size %% .Machine$integer.max
[1] 3


sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=pt_PT.UTF-8        LC_COLLATE=pt_PT.UTF-8
  [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=pt_PT.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

loaded via a namespace (and not attached):
[1] tools_2.15.1