
Quickest way to make a large "empty" file on disk?

10 messages · Jonathan Greenberg, Jeff Ryan, Denham Robert +5 more

#
Look at the man page for dd (assuming you are on *nix).

A quick Google search will turn up a command to try; I'm not at my desk or I would suggest one myself.
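For example, with GNU dd a zero-filled file can be created without writing its contents by seeking past the end (the file name and size below are only illustrative):

```shell
# Extend blank.bin to 80,000 bytes (room for 10,000 doubles) without
# writing any data: read zero blocks, then seek. GNU dd truncates the
# output file at the seek offset, yielding a (typically sparse) file
# that reads back as all zeros.
dd if=/dev/zero of=blank.bin bs=1 count=0 seek=80000
```

On file systems without sparse-file support the bytes may be physically allocated instead, but the file still reads as all zeros.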

Jeff

Jeffrey Ryan    |    Founder    |    jeffrey.ryan at lemnica.com

www.lemnica.com
On May 2, 2012, at 5:23 PM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
#
Jonathan,
        10,000 numbers is pretty small, so I don't think time will be a
big problem. You could write this using writeBin with no trouble. For
larger files, why not just use a loop? The writing is pretty fast, so I
don't think you'll run into many problems.

On my machine:
user  system elapsed 
  2.416   1.728  16.705 
 
Otherwise I would suggest writing a little piece of C code to do what
you want.

Robert
  

-----Original Message-----
From: r-sig-hpc-bounces at r-project.org
[mailto:r-sig-hpc-bounces at r-project.org] On Behalf Of Jonathan Greenberg
Sent: Thursday, 3 May 2012 8:24 AM
To: r-help; r-sig-hpc at r-project.org
Subject: [R-sig-hpc] Quickest way to make a large "empty" file on disk?

R-helpers:

What would be the absolute fastest way to make a large "empty" file
(e.g.
filled with all zeroes) on disk, given a byte size and a given number
of empty values.  I know I can use writeBin, but the "object" in
this case may be far too large to store in main memory.  I'm asking
because I'm going to use this file in conjunction with mmap to do
parallel writes to this file.  Say, I want to create a blank file of
10,000 floating point numbers.

Thanks!

--j

--
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science University of
Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html


_______________________________________________
R-sig-hpc mailing list
R-sig-hpc at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-hpc


------------------------------
#
An R solution is:

allocateFile <- function(pathname, nbrOfBytes) {
  con <- file(pathname, open="wb")
  on.exit(close(con))
  # Seek to the last byte and write a single zero; on most file systems
  # this extends the file to 'nbrOfBytes' bytes without writing the rest.
  seek(con, where=nbrOfBytes-1, origin="start", rw="write")
  writeBin(as.raw(0), con=con)
  invisible(pathname)
} # allocateFile()

Not sure if it works on all OSes/file systems.

/Henrik
On Wed, May 2, 2012 at 3:23 PM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
#
Something like the approach described here:

http://markus.revti.com/2007/06/creating-empty-file-with-specified-size/

is one way I know of.

Jeff

Jeffrey Ryan    |    Founder    |    jeffrey.ryan at lemnica.com

www.lemnica.com
On May 2, 2012, at 5:23 PM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
#
Hello,

Far from the "absolute fastest" but apparently portable,

big <- function(con, n, pass=5000){
	if(file.exists(con)) unlink(con)
	fc <- file(con, "wb")
	on.exit(close(fc))

	# write n doubles in chunks of 'pass' values, plus a final remainder chunk
	m <- n %/% pass
	r <- n %% pass

	replicate(m, writeBin(double(pass), fc))
	if(r) writeBin(double(r), fc)

	invisible(n)
}


system.time(big("zeros", n=1e7 + 1L))
   user  system elapsed 
   0.07    0.06    0.14

Changing the default 'pass' doesn't make it faster. (On my system, Win7, R
2.14.2.)

Rui Barradas


--
View this message in context: http://r.789695.n4.nabble.com/Quickest-way-to-make-a-large-empty-file-on-disk-tp4604598p4604690.html
Sent from the R help mailing list archive at Nabble.com.
#
On May 2, 2012, at 6:23 PM, Jonathan Greenberg wrote:
The most trivial way is to simply seek to the end and write a byte:
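The R code did not survive the archive, but the same seek-to-end trick can be sketched in shell with dd (sizes illustrative):

```shell
# Write a single zero byte at offset 79,999; the 79,999 bytes before it
# become a hole (or are zero-filled, depending on the file system),
# yielding an 80,000-byte file that reads as all zeros.
dd if=/dev/zero of=seeked.bin bs=1 count=1 seek=79999
```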

Cheers,
Simon
#
On most UNIX systems this will leave a large unallocated virtual "hole" in the file. If you are not bothered by spreading the allocation task out over the program execution interval, this won't matter and will probably give the best performance.  However, if you wanted to benchmark your algorithms without the erratic filesystem updates mixed in, then you need to write all of those zeroes. For that to work most efficiently, write data in large blocks, and if possible bypass the C standard library.
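To actually commit the zeros up front, one sketch is dd with a large block size (block size and count below are only illustrative):

```shell
# Physically write 80,000 zero bytes in ten 8,000-byte blocks; unlike
# the seek trick, every byte is written, so no allocation work is
# deferred to later use of the file.
dd if=/dev/zero of=full.bin bs=8000 count=10
```

Larger block sizes (e.g. bs=1M) amortize per-call overhead further.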
---------------------------------------------------------------------------
Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Simon Urbanek <simon.urbanek at r-project.org> wrote:
#
Jonathan,
   On some filesystems (e.g. NTFS, see below) it is possible to create 'sparse'
   memory-mapped files, i.e. reserving the space without the cost of actually
   writing initial values.
   Package 'ff' does this automatically and also allows accessing the file in
   parallel. Check the example below and see how creation of the big file is
   immediate.
   Jens Oehlschlägel
   > library(ff)
   > library(snowfall)
   > ncpus <- 2
   > n <- 1e8
   > system.time(
   + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
   + )
          user      system     elapsed
          0.01        0.00        0.02
   > # check finalizer; with an explicit filename we should have a 'close'
   > # finalizer
   > finalizer(x)
   [1] "close"
   > # if not, set it to 'close' in order to not let slaves delete x on slave
   > # shutdown
   > finalizer(x) <- "close"
   > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
   R Version:  R version 2.15.0 (2012-03-30)
   snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2 CPUs.
   > sfLibrary(ff)
   Library ff loaded.
   Library ff loaded in cluster.
   Warning message:
   In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts =
   TRUE,  :
     'keep.source' is deprecated and will be ignored
   > sfExport("x") # note: do not export the same ff multiple times
   > # explicitly opening avoids a gc problem
   > # opening with 'mmeachflush' instead of 'mmnoflush' is a bit slower but
   > # prevents OS write storms when the file is larger than RAM
   > sfClusterEval(open(x, caching="mmeachflush"))
   [[1]]
   [1] TRUE
   [[2]]
   [1] TRUE
   > system.time(
   + sfLapply( chunk(x, length=ncpus), function(i){
   +   x[i] <- runif(sum(i))
   +   invisible()
   + })
   + )
          user      system     elapsed
          0.00        0.00       30.78
   > system.time(
   + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i],
   + c(0.05, 0.95)) )
   + )
          user      system     elapsed
          0.00        0.00        4.38
   > # for completeness
   > sfClusterEval(close(x))
   [[1]]
   [1] TRUE
   [[2]]
   [1] TRUE
   > csummary(s)
                5%  95%
   Min.    0.04998 0.95
   1st Qu. 0.04999 0.95
   Median  0.05001 0.95
   Mean    0.05001 0.95
   3rd Qu. 0.05002 0.95
   Max.    0.05003 0.95
   > # stop slaves
   > sfStop()
   Stopping cluster
   > # with the close finalizer we are responsible for deleting the file
   > # explicitly (unless we want to keep it)
   > delete(x)
   [1] TRUE
   > # remove r-side metadata
   > rm(x)
   > # truly free memory
   > gc()