GDAL.close
On Mon, 4 Nov 2013, Roger Bivand wrote:
Which Windows? XP, Vista, 7, 8, 8.1? 32 and 64 bit? If Vista/7/8, run as
administrator or not? I agree that the code in those parts of rgdal is not
well-designed - it was well-designed, but has been modified so that it works
for most people cross-platform, and has had to accommodate changes that have
taken place in GDAL over more than 10 years, not least the error-handler.
The simple solution to your practical problem is for you to use a larger
temporary drive under Windows, or to change to an operating system that does not
have these side-effects.
Assisting you is not just a matter of doing what you think works for you, but
making sure it doesn't break anything else for anybody else cross-platform.
Your script does not check for other files in tempdir, so I prepended a
listing of prior content:
pc <- dir(tempdir())
and dropped them from the list for unlinking:
now <- dir(tempdir())
unlink(paste(tempdir(), now[!(now %in% pc)], sep=.Platform$file.sep))
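The snapshot-and-diff approach above can be wrapped as a small self-contained helper. This is a sketch, not rgdal code: `snapshot_tempdir()` and `unlink_new_tempfiles()` are hypothetical names. The helper also returns any new files it could not remove, which on the affected Windows setups would presumably be the temporary files still treated as open.

```r
# Hypothetical helpers (not part of rgdal): snapshot tempdir() first,
# then remove only the files that appeared after the snapshot.
snapshot_tempdir <- function() dir(tempdir())

unlink_new_tempfiles <- function(before) {
  new_files <- setdiff(dir(tempdir()), before)
  unlink(file.path(tempdir(), new_files))
  # unlink() does not say which files it failed to remove, so report
  # whichever of the new files are still present afterwards.
  invisible(intersect(new_files, dir(tempdir())))
}

pc <- snapshot_tempdir()
writeLines("demo", tempfile("demo_", fileext = ".txt"))
leftover <- unlink_new_tempfiles(pc)
print(leftover)
```

Files that were already in `tempdir()` before the snapshot are never touched, which is the point of taking the listing first.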
I do not see how your script exercises the problem. It creates a new
transient file, but does not close it, which was the behaviour you are
unhappy with. If I add
GDAL.close(r3)
on Linux, the transient dataset is removed. On Windows 7 64-bit with the CRAN
rgdal binary run as user, temporary files are left in tempdir for r1, r2, and
r3. The same three temporary files are left when run as administrator.
The earliest version of GDAL.close was:
GDAL.close <- function(dataset) {
.setCollectorFun(slot(dataset, 'handle'), NULL)
.Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
invisible()
}
with a version in 2007 in the THK branch calling a closeDataset method,
containing:
handle <- slot(dataset, "handle")
unreg.finalizer(handle)
.Call("RGDAL_DeleteHandle", handle, PACKAGE="rgdal")
with:
unreg.finalizer <- function(obj) reg.finalizer(obj, function(x) x)
and by 2010 was:
GDAL.close <- function(dataset) {
.setCollectorFun(slot(dataset, 'handle'), NULL)
.Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
invisible(gc())
}
Special handling of GDALTransientDataset was added in revision 433 in January
2013, and modified in revision 462 in April 2013.
It has seemed IIRC that Windows can treat arbitrary files as open. It is also
possible that there is an interaction between Windows and
rgdal:::.setCollectorFun(), which does what it should, when given the NULL
argument, setting:
.setCollectorFun <- function(object, fun) {
if (is.null(fun)) fun <- function(obj) obj
reg.finalizer(object, fun, onexit=TRUE)
}
so incorporating the THK branch logic. It could possibly also vary across
drivers, so finding a robust fix means setting up a test rig with multiple
Windows machines and testing multiple drivers to see why some temporary
files are treated as open on Windows when other operating systems have no
problems removing them. The problem bites Windows users whose temporary
directories are too small. I welcome contributions from people who understand
Windows and can actually explain why we see the consequences we see.
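The collector mechanism that `.setCollectorFun()` builds on is base R's `reg.finalizer()`. A minimal sketch, assuming an environment as a stand-in for rgdal's dataset handle (the real handle is an external pointer, which `reg.finalizer()` also accepts): the finalizer runs once the last reference is dropped and a full garbage collection happens, the same `rm(...)` then `gc()` sequence used in the script quoted below. `make_handle()`, `finalize_handle()` and `closed_log` are hypothetical names.

```r
# Stand-in for a dataset handle; all names here are hypothetical.
closed_log <- new.env()
closed_log$names <- character(0)

# Defined at top level so the finalizer's environment does not hold the handle.
finalize_handle <- function(obj) {
  # In rgdal, this is where the GDAL dataset would be released.
  closed_log$names <- c(closed_log$names, obj$name)
}

make_handle <- function(name) {
  h <- new.env()
  h$name <- name
  reg.finalizer(h, finalize_handle, onexit = TRUE)
  h
}

r3_handle <- make_handle("r3")
rm(r3_handle)    # drop the last reference to the handle
invisible(gc())  # a full collection, after which pending finalizers run
print(closed_log$names)
```

With `onexit = TRUE`, a finalizer that has not run by the end of the R session is run at exit.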
One candidate may be to branch to .Call("RGDAL_DeleteHandle", handle,
PACKAGE="rgdal") for the GDALTransientDataset case; I'll report back once the
package has gone through win-builder.
The Windows binary with this modification is at http://win-builder.r-project.org/TO95cIM24UVL, but I do not see that it has altered behaviour under Windows 7 running as user. For some reason Windows sees the transient files as open. Please try other drivers to ensure that this isn't driver-specific.
Roger
Hope this doesn't muddle things too much; "clarification" doesn't seem like the right word.
Roger
On Sun, 3 Nov 2013, Oliver Soong wrote:
I've been using the CRAN rgdal and raster. I apologize in advance for all
the linebreaks that will be broken. This code should highlight the problem
and the fix:
require(rgdal)
require(raster)
r1 <- raster(system.file("external/test.grd", package="raster"))
r2 <- as(r1, "SpatialGridDataFrame")
r2.dims <- gridparameters(r2)$cells.dim
r3 <- new("GDALTransientDataset", driver = new("GDALDriver", "GTiff"),
          rows = r2.dims[2], cols = r2.dims[1], bands = 1, type = "Float32",
          options = NULL, fname = file.path(tempdir(), "r3.tif"), handle = NULL)
print(dir(tempdir()))
writeRaster(r1, file.path(tempdir(), "r1.tif"))
writeGDAL(r2, file.path(tempdir(), "r2.tif"))
print(dir(tempdir()))
unlink(dir(tempdir(), full.names = TRUE))
print(dir(tempdir()))
leftover <- gsub("/", "\\\\", dir(tempdir(), full.names = TRUE))
invisible(lapply(paste("cmd /c del", leftover), system))
rm(r1, r2, r3)
gc()
unlink(dir(tempdir(), full.names = TRUE))
print(dir(tempdir()))
invisible(lapply(paste("cmd /c del", leftover), system))
Basically, I'm trying to write a standard raster package raster (r1) and an
sp package SpatialGridDataFrame (r2). Both of those end up calling
new("GDALTransientDataset"), hence r3. At the first print(dir(tempdir())),
only r3 has an open temporary file, which is expected. At the second, all
three have open temporary files, and r1 and r2 have their written final
outputs, which are closed. The temporary files for r1 and r2 should have
been closed at this point. None of the temporary files can be removed by
unlink, although the final outputs can, as shown at the third
print(dir(tempdir())). Windows can't remove them, either. However, if I
remove the GDALTransientDataset r3 and initiate gc(), R can remove that
temporary file, but this does not work for r1 and r2. After q(), the
tempdir() will not be removed by R, but it and the temporary files for r1
and r2 can now be removed.
It looks like GDAL.close is broken (again/as always), but the collector
function for GDALTransientDataset seems to at least close the handle.
GDAL.close relies on RGDAL_CloseDataset, whereas the GDALTransientDataset
collector just uses RGDAL_CloseHandle. With the handle closed, I think the
unlink code in GDAL.close will work (as an aside, I'd use the pattern
paste0("^[a-z]{3}", basen, "$") to be safer and the argument full.names
might be simpler than constructing flf separately). I believe
RGDAL_CloseDataset checks for NULL handles but just returns early, so it
should be the same to replace the .Call("RGDAL_CloseDataset", ...) with
.Call("RGDAL_CloseHandle", ...).
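The suggestion of matching the temporary copies with `paste0("^[a-z]{3}", basen, "$")` and `full.names = TRUE` can be sketched as follows. One caveat with putting `basen` directly into the regular expression is that the `.` in a name such as `r1.tif` matches any character, so this hypothetical `find_temp_copies()` uses the regular expression only for the three-letter prefix and compares the remainder of the name literally.

```r
# Hypothetical helper: find temporary copies named <3 lowercase letters><basen>.
find_temp_copies <- function(basen, dir = tempdir()) {
  # Regex only for the prefix: exactly three lowercase letters at the start.
  candidates <- list.files(dir, pattern = "^[a-z]{3}", full.names = TRUE)
  # Literal comparison of the rest, so "." in basen is not a wildcard.
  candidates[substring(basename(candidates), 4) == basen]
}
```

For example, `find_temp_copies("r1.tif")` returns the full paths of files such as `abcr1.tif` in `tempdir()`, but not `r1.tif` itself.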
Really, I think RGDAL_DeleteHandle needs to be fixed, but I don't know
enough about GDALDeleteDataset or the #ifndef OSGEO4W deleteFile business
or why RGDAL_CloseHandle is commented out to make any useful suggestions
there.
Cheers,
Oliver
On Fri, Nov 1, 2013 at 1:41 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Mon, 28 Oct 2013, Oliver Soong wrote:
I've had a long standing struggle with GDAL.close on Windows, and I
think I might finally have found a fix. I'm currently running rgdal
0.8.11, R 3.0.2, and 32-bit Windows 7.
Currently, writeRaster and writeGDAL create temporary files in the
tempdir() folder (the final filename prefixed with 3 random [a-z]
letters). On my system, these files get left open and orphaned. When
doing heavy processing, this can cause the drive hosting the
tempdir() folder to become full, even if the data are ultimately being
written to a much larger drive. This also means that R cannot clean
up these files or the tempdir() folder when it closes, causing similar
bloat in my %TEMP%.
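To make the naming scheme concrete, here is a hypothetical reconstruction (not rgdal's own code) of how such a temporary name could be built: three random lowercase letters prepended to the final file's basename, placed in `tempdir()`. It illustrates why the temporary copy lands on the drive hosting `tempdir()` wherever the final output goes. `temp_name_for()` and the example output path are made up for illustration.

```r
# Hypothetical reconstruction of the temporary-name scheme described above.
temp_name_for <- function(final_path, dir = tempdir()) {
  prefix <- paste(sample(letters, 3, replace = TRUE), collapse = "")
  # Only the basename of the final path is kept; the directory is tempdir().
  file.path(dir, paste0(prefix, basename(final_path)))
}

tmp <- temp_name_for("D:/big_drive/output/r1.tif")
print(tmp)
```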
I haven't tested this on other platforms, but I think it might help to
insert an extra line into GDAL.close:
.setCollectorFun(slot(dataset, "handle"), NULL)
.Call("RGDAL_CloseHandle", dataset@handle, PACKAGE = "rgdal")
.Call("RGDAL_CloseDataset", dataset, PACKAGE = "rgdal")
For whatever reason, RGDAL_CloseDataset doesn't seem to actually close
the C file handle, but it doesn't seem to mind if the file handle was
closed beforehand.
Could you please provide a working example? I have looked at this, but need a baseline to know whether I'm looking at the same thing. I'm far from sure that this is a robust solution, and need an instrumented example, including listings of the temporary directory during the process, to see the consequences. Thanks for looking into this, but I'd prefer to be sure that a Windows-specific fix doesn't make things worse for others too. Please also report on the source of your Windows rgdal binary: is it from CRAN, or locally built and dynamically linked against your own GDAL?
Best wishes,
Roger
Cheers,
Oliver
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
-- Roger Bivand Department of Economics, NHH Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no