GDAL.close
On Tue, 5 Nov 2013, Oliver Soong wrote:
Hi Roger, Toggling on GDALTransientDataset in GDAL.close doesn't change anything because RGDAL_CloseDataset already toggles on that. In either case, RGDAL_DeleteHandle still doesn't work. Also, the gc at the end doesn't do anything useful because .setCollectorFun has already trivialized the finalizer and because dataset still exists at that point, as does the source dataset given to GDAL.close in the first place. The HFA driver has the exact same issue. From what I see in the code, I don't think it's related to the specific driver.
Here is the next version:
http://win-builder.r-project.org/0X0318s8iW0C
with:
.setCollectorFun(slot(dataset, 'handle'), NULL)
.Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
.Call("RGDAL_CloseHandle", slot(dataset, 'handle'), PACKAGE="rgdal")
Running under Windows, and using:
http://download.sysinternals.com/files/Handle.zip
to check, the three transient datasets are open:
3B8: File (RW-) C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\awlr2.tif
3C0: File (RW-) C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\tfur1.tif
498: File (RW-) C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\mxbr3.tif
I'm assuming that RW- means read and write open, the third character is D, which is probably directory.
Roger
Cheers, Oliver On Mon, Nov 4, 2013 at 3:05 PM, Oliver Soong <osoong+r at gmail.com> wrote:
Er, just saw your recent e-mail. I'll take a look. Oliver On Mon, Nov 4, 2013 at 3:05 PM, Oliver Soong <osoong+r at gmail.com> wrote:
I do think we've gotten a bit muddled. I'm probably not helping, but I'll do my best.
Windows XP (32-bit), 7 (32-bit), and 2008 R2 (32-bit and 64-bit), R-3.0.2, sp 1.0.13, rgdal 0.8.11, raster 2.1.49.
The main problem as seen by end-users is the orphaned temporary files that you observed were left over for r1 and r2. They cannot be removed while R is running. I seem not to have explained r3 very well, but suffice it to say that GDAL.close(r3) creates similar orphaned temporary files, indicating GDAL.close is not functioning properly on GDALTransientDataset objects under Windows.
I think RGDAL_DeleteHandle (and hence RGDAL_CloseDataset) is the root of the problem, and I think it's not properly closing the file handle before trying and failing to delete the associated files. Windows automatically locks open file handles, but Linux requires extra steps that are not always done and are not always respected, which is probably why this isn't apparent under Linux.
The finalizer code and resetting seems appropriate. I still hesitate to say much about RGDAL_DeleteHandle, but I will point out that one is normally supposed to close the file handle before deleting the file, and it seems backwards in RGDAL_DeleteHandle. I don't know if that is intentional.
After thinking a little more, I think it's better to switch the calls to RGDAL_CloseHandle and RGDAL_CloseDataset that I suggested originally for GDAL.close. That means simply adding the call to RGDAL_CloseHandle after the call to RGDAL_CloseDataset, rather than before. With this code, if RGDAL_CloseDataset behaves as intended, RGDAL_CloseHandle will get a nil pointer and won't do anything. However, if RGDAL_CloseDataset fails to function properly, RGDAL_CloseHandle will close the open handle and the if(isTrans) cleanup code already in GDAL.close will operate.
Perhaps I could be more helpful if you explained what you thought my suggested change might break?
This last one (the existing RGDAL_CloseDataset followed by an additional RGDAL_CloseHandle) should be no worse than the current code. Is that at all clearer? Oliver On Mon, Nov 4, 2013 at 1:03 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
Which Windows? XP, Vista, 7, 8, 8.1? 32 and 64 bit? If Vista/7/8, run as
administrator or not? I agree that the code in those parts of rgdal is not
well-designed - it was well-designed, but has been modified so that it
works for most people cross-platform, and has had to accommodate changes
that have taken place in GDAL over more than 10 years, not least the
error-handler.
The simple solution to your practical problem is for you to use a
larger temporary drive under Windows, or change to an operating system that
does not have these side-effects.
Assisting you is not just a matter of doing what you think works for
you, but making sure it doesn't break anything else for anybody else
cross-platform.
Your script does not check for other files in tempdir, so I prepended a
listing of prior content:
pc <- dir(tempdir())
and dropped them from the list for unlinking:
now <- dir(tempdir())
unlink(paste(tempdir(), now[!(now %in% pc)], sep=.Platform$file.sep))
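The snapshot-then-clean-up step above can be wrapped into two small helpers. This is only a sketch: the names snapshot_dir and remove_new_files are made up for illustration and are not part of rgdal or raster. It records the directory listing before a run, then unlinks only the files that appeared afterwards, and returns whatever could not be removed (which, on Windows, would be the files still held open).

```r
# Hypothetical helpers, not part of rgdal: record what is already in a
# directory, then later remove only the files created since the snapshot.
snapshot_dir <- function(dir = tempdir()) {
  list.files(dir)
}

remove_new_files <- function(before, dir = tempdir()) {
  now <- list.files(dir)
  new_files <- file.path(dir, setdiff(now, before))
  unlink(new_files)
  # Anything still present could not be removed (e.g. it is still open)
  invisible(new_files[file.exists(new_files)])
}
```

Usage would be pc <- snapshot_dir() at the top of the script and leftover <- remove_new_files(pc) at the end; a non-empty leftover then lists the orphaned files.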
I do not see how your script exercises the problem. It creates a new
transient file, but does not close it, which was the behaviour you are
unhappy with. If I add
GDAL.close(r3)
on Linux, the transient dataset is removed. On Windows 7 64-bit with the
CRAN rgdal binary run as user, temporary files are left in tempdir for r1,
r2, and r3. The same three temporary files are left when run as
administrator.
The earliest version of GDAL.close was:
GDAL.close <- function(dataset) {
.setCollectorFun(slot(dataset, 'handle'), NULL)
.Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
invisible()
}
with a version in 2007 in the THK branch calling a closeDataset method,
containing:
handle <- slot(dataset, "handle")
unreg.finalizer(handle)
.Call("RGDAL_DeleteHandle", handle, PACKAGE="rgdal")
with:
unreg.finalizer <- function(obj) reg.finalizer(obj, function(x) x)
and by 2010 was:
GDAL.close <- function(dataset) {
.setCollectorFun(slot(dataset, 'handle'), NULL)
.Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
invisible(gc())
}
Special handling of GDALTransientDataset was added in revision 433 in
January 2013, and modified in revision 462 in April 2013.
It has seemed IIRC that Windows can treat arbitrary files as open. It is
also possible that there is an interaction between Windows and
rgdal:::.setCollectorFun(), which does what it should, when given the NULL
argument, setting:
.setCollectorFun <- function(object, fun) {
if (is.null(fun)) fun <- function(obj) obj
reg.finalizer(object, fun, onexit=TRUE)
}
so incorporating the THK branch logic. It could possibly also vary
across drivers, so finding a robust fix means setting up a test rig with
multiple Windows machines and testing for multiple drivers to see why some
temporary files are being treated as open when other operating systems don't
have problems in their removal. Windows users with too small temporary
directories. I welcome contributions from people who understand Windows and
can actually explain why we see the consequences we see.
One candidate may be to branch to .Call("RGDAL_DeleteHandle", handle,
PACKAGE="rgdal") for the GDALTransientDataset case; I'll report back once
the package has gone through win-builder.
Hope this doesn't muddle too much, clarification doesn't seem like the
right expression.
Roger
On Sun, 3 Nov 2013, Oliver Soong wrote:
I've been using the CRAN rgdal and raster. I apologize in advance for all
the linebreaks that will be broken. This code should highlight the problem
and the fix:
require(rgdal)
require(raster)
r1 <- raster(system.file("external/test.grd", package="raster"))
r2 <- as(r1, "SpatialGridDataFrame")
r2.dims <- gridparameters(r2)$cells.dim
r3 <- new("GDALTransientDataset", driver = new("GDALDriver", "GTiff"),
  rows = r2.dims[2], cols = r2.dims[1], bands = 1, type = "Float32",
  options = NULL, fname = file.path(tempdir(), "r3.tif"), handle = NULL)
print(dir(tempdir()))
writeRaster(r1, file.path(tempdir(), "r1.tif"))
writeGDAL(r2, file.path(tempdir(), "r2.tif"))
print(dir(tempdir()))
unlink(dir(tempdir(), full.names = TRUE))
print(dir(tempdir()))
leftover <- gsub("/", "\\\\", dir(tempdir(), full.names = TRUE))
invisible(lapply(paste("cmd /c del", leftover), system))
rm(r1, r2, r3)
gc()
unlink(dir(tempdir(), full.names = TRUE))
print(dir(tempdir()))
invisible(lapply(paste("cmd /c del", leftover), system))
Basically, I'm trying to write a standard raster package raster (r1) and an
sp package SpatialGridDataFrame (r2). Both of those end up calling
new("GDALTransientDataset"), hence r3. At the first print(dir(tempdir())),
only r3 has an open temporary file, which is expected. At the second, all
three have open temporary files, and r1 and r2 have their written final
outputs, which are closed. The temporary files for r1 and r2 should have
been closed at this point. None of the temporary files can be removed by
unlink, although the final outputs can, as shown at the third
print(dir(tempdir())). Windows can't remove them, either. However, if I
remove the GDALTransientDataset r3 and initiate gc(), R can remove that
temporary file, but this does not work for r1 and r2. After q(), the
tempdir() will not be removed by R, but it and the temporary files for r1
and r2 can now be removed.
It looks like GDAL.close is broken (again/as always), but the collector
function for GDALTransientDataset seems to at least close the handle.
GDAL.close relies on RGDAL_CloseDataset, whereas the GDALTransientDataset
collector just uses RGDAL_CloseHandle. With the handle closed, I think the
unlink code in GDAL.close will work (as an aside, I'd use the pattern
paste0("^[a-z]{3}", basen, "$") to be safer and the argument full.names
might be simpler than constructing flf separately). I believe
RGDAL_CloseDataset checks for NULL handles but just returns early, so it
should be the same to replace the .Call("RGDAL_CloseDataset", ...) with
.Call("RGDAL_CloseHandle", ...).
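The anchored-pattern idea in the aside above might look roughly like the following. This is a sketch under assumptions, not the code in GDAL.close: the function name find_transient_files is invented here, and instead of pasting basen into a regular expression (where the "." in ".tif" would match any character) it checks the three-letter prefix with a regex and compares the rest of the name literally.

```r
# Hypothetical helper, not part of rgdal: find temporary files named as
# three lower-case letters followed by exactly the final file's base name.
find_transient_files <- function(final_name, dir = tempdir()) {
  basen <- basename(final_name)
  candidates <- list.files(dir)
  # Prefix must be three letters; the remainder must equal basen exactly
  is_match <- grepl("^[a-z]{3}", candidates) &
    substring(candidates, 4) == basen
  file.path(dir, candidates[is_match])
}
```

The result is already a vector of full paths, so it could be handed straight to unlink() once the handle has been closed.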
Really, I think RGDAL_DeleteHandle needs to be fixed, but I don't know
enough about GDALDeleteDataset or the #ifndef OSGEO4W deleteFile business
or why RGDAL_CloseHandle is commented out to make any useful suggestions
there.
Cheers,
Oliver
On Fri, Nov 1, 2013 at 1:41 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Mon, 28 Oct 2013, Oliver Soong wrote:
I've had a long standing struggle with GDAL.close on Windows, and I
think I might finally have found a fix. I'm currently running rgdal
0.8.11, R 3.0.2, and 32-bit Windows 7.
Currently, writeRaster and writeGDAL create temporary files in the
tempdir() folder (the final filename prefixed with 3 random [a-z]
letters). On my system, these files get left open and orphaned. When
doing heavy processing, this can lead to the drive hosting the
tempdir() folder to become full, even if the data is being ultimately
written to a much larger drive. This also means that R cannot clean
up these files or the tempdir() folder when it closes, causing similar
bloat in my %TEMP%.
I haven't tested this on other platforms, but I think it might help to
insert an extra line into GDAL.close:
.setCollectorFun(slot(dataset, "handle"), NULL)
.Call("RGDAL_CloseHandle", dataset@handle, PACKAGE = "rgdal")
.Call("RGDAL_CloseDataset", dataset, PACKAGE = "rgdal")
For whatever reason, RGDAL_CloseDataset doesn't seem to actually close
the C file handle, but it doesn't seem to mind if the file handle was
closed beforehand.
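The close-before-delete ordering at issue here can be illustrated with base R connections alone. This is only an analogy: it uses file() and unlink() as stand-ins, calls nothing in rgdal or GDAL, and the function name write_close_delete is made up for the example. On Windows, skipping the close() step would be expected to make unlink() fail, because the open handle locks the file.

```r
# Illustration only, using a base R connection in place of a GDAL handle.
write_close_delete <- function(path) {
  con <- file(path, open = "wb")
  writeBin(as.raw(1:10), con)
  close(con)               # release the file handle first
  status <- unlink(path)   # 0 means success, 1 means failure
  list(status = status, still_there = file.exists(path))
}
```

Called as write_close_delete(tempfile(fileext = ".bin")), the returned list shows whether the delete succeeded once the handle had been released.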
Could you please provide a working example? I have looked at this, but need a baseline to know whether I'm looking at the same thing. I'm very unsure that this is a robust solution, and need an instrumented example, including listings of the temporary directory during the process, to see the consequences.
Thanks for looking into this, but I'd prefer to be sure that a Windows-specific fix doesn't make things worse for others too. Please also report on the source of your Windows rgdal binary - is it from CRAN or locally built dynamically linking your own GDAL?
Best wishes,
Roger
Cheers,
Oliver
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo --
Roger Bivand Department of Economics, NHH Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no