Oliver
On Tue, Nov 5, 2013 at 11:50 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Tue, 5 Nov 2013, Oliver Soong wrote:
Hi Roger,
I forgot that RGDAL_CloseDataset nils dataset@handle, which means the
subsequent RGDAL_CloseHandle can't find the handle to close. It also means
that GDAL.close should replace RGDAL_CloseDataset with RGDAL_CloseHandle.
On Linux, this will cause temporary datasets to be removed by the isTrans
unlink code instead of GDALDeleteDataset, but it should make things work on
Windows. I don't know if there are drivers that might create files that
don't match the unlink pattern.
I do think it would be better to ultimately fix RGDAL_DeleteHandle. Under
normal circumstances, I would call RGDAL_CloseHandle (or equivalently call
GDALClose and R_ClearExternalPtr directly) and then GDALDeleteDataset. I'm
still uncertain why the ordering is reversed or what the (commented out)
#ifndef OSGEO4W deleteFile business is about.
Thanks to Even Rouault on gdal-dev, I think that:
http://win-builder.r-project.org/jJIzeDT1C100
contains a fixed version, with RGDAL_DeleteHandle now closing the dataset
before deleting it. This worked on Linux and Windows in a rough cut, and
this one after removing the unneeded unlink() call in GDAL.close() now
works on Linux, and should also work on Windows. Behaviour on Linux and
Windows with respect to transient datasets being deleted is now the same.
It was helpful that you stayed with this, it was far from simple, as Even
explained on the gdal-dev thread.
Roger
Cheers,
Oliver
On Tue, Nov 5, 2013 at 5:56 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Tue, 5 Nov 2013, Oliver Soong wrote:
Toggling on GDALTransientDataset in GDAL.close doesn't change anything
because RGDAL_CloseDataset already toggles on that. In either case,
RGDAL_DeleteHandle still doesn't work. Also, the gc at the end doesn't do
anything useful because .setCollectorFun has already trivialized the
finalizer and because dataset still exists at that point, as does the
source dataset given to GDAL.close in the first place.
The HFA driver has the exact same issue. From what I see in the code, I
don't think it's related to the specific driver.
Here is the next version:
http://win-builder.r-project.org/0X0318s8iW0C
with:
.setCollectorFun(slot(dataset, 'handle'), NULL)
.Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
.Call("RGDAL_CloseHandle", slot(dataset, 'handle'),
PACKAGE="rgdal")
Running under Windows, and using:
http://download.sysinternals.com/files/Handle.zip
to check, the three transient datasets are open:
3B8: File (RW-) C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\awlr2.tif
3C0: File (RW-) C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\tfur1.tif
498: File (RW-) C:\Users\rsb\AppData\Local\Temp\Rtmp8kZk0N\mxbr3.tif
I'm assuming that RW- means open for reading and writing; the third
character, when set, is D, which is probably directory.
Roger
Cheers,
Oliver
On Mon, Nov 4, 2013 at 3:05 PM, Oliver Soong <osoong+r at gmail.com> wrote:
Er, just saw your recent e-mail. I'll take a look.
Oliver
On Mon, Nov 4, 2013 at 3:05 PM, Oliver Soong <osoong+r at gmail.com> wrote:
I do think we've gotten a bit muddled. I'm probably not helping, but
I'll do my best.
Windows XP (32-bit), 7 (32-bit), and 2008 R2 (32-bit and 64-bit),
R-3.0.2, sp 1.0.13, rgdal 0.8.11, raster 2.1.49.
The main problem as seen by end-users is the orphaned temporary files that
you observed were left over for r1 and r2. They cannot be removed while R
is running. I seem not to have explained r3 very well, but suffice it to
say that GDAL.close(r3) creates similar orphaned temporary files,
indicating GDAL.close is not functioning properly on GDALTransientDataset
objects under Windows.
I think RGDAL_DeleteHandle (and hence RGDAL_CloseDataset) is the root of
the problem, and I think it's not properly closing the file handle before
trying and failing to delete the associated files. Windows automatically
locks open file handles, but Linux requires extra steps that are not always
done and are not always respected, which is probably why this isn't
apparent under Linux. The finalizer code and resetting seems appropriate.
I still hesitate to say much about RGDAL_DeleteHandle, but I will point out
that one is normally supposed to close the file handle before deleting the
file, and it seems backwards in RGDAL_DeleteHandle. I don't know if that is
intentional.
After thinking a little more, I think it's better to switch the calls to
RGDAL_CloseHandle and RGDAL_CloseDataset that I suggested originally for
GDAL.close. That means simply adding the call to RGDAL_CloseHandle after
the call to RGDAL_CloseDataset, rather than before. With this code, if
RGDAL_CloseDataset behaves as intended, RGDAL_CloseHandle will get a nil
pointer and won't do anything. However, if RGDAL_CloseDataset fails to
function properly, RGDAL_CloseHandle will close the open handle and the
if(isTrans) cleanup code already in GDAL.close will operate.
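The ordering argument above (the second close call is a harmless no-op when the first one works, and a safety net when it does not) can be sketched with a mock-up. Everything below is hypothetical: plain R environments stand in for the external pointer, and mock_close_dataset and mock_close_handle are illustrative stand-ins, not rgdal's C routines.

```r
# Hypothetical mock-up: an environment stands in for the external pointer.
# None of these functions are rgdal's; they only mimic the described behaviour.
new_handle <- function() {
  h <- new.env()
  h$ptr <- "open"   # non-NULL means the dataset is still open
  h
}

# Stand-in for RGDAL_CloseDataset: when it works, it closes and nils the
# pointer; works = FALSE simulates the failure reported on Windows.
mock_close_dataset <- function(h, works = TRUE) {
  if (works) h$ptr <- NULL
  invisible(works)
}

# Stand-in for RGDAL_CloseHandle: does nothing when the pointer is already nil.
mock_close_handle <- function(h) {
  if (is.null(h$ptr)) return(invisible(FALSE))
  h$ptr <- NULL
  invisible(TRUE)
}

# Case 1: the first call works, so the second has nothing left to do.
h1 <- new_handle()
mock_close_dataset(h1, works = TRUE)
second_needed_1 <- mock_close_handle(h1)

# Case 2: the first call fails, so the second closes the handle.
h2 <- new_handle()
mock_close_dataset(h2, works = FALSE)
second_needed_2 <- mock_close_handle(h2)

print(c(second_needed_1, second_needed_2))
```

In both cases the handle ends up closed, which is why appending the extra call should be no worse than the existing code.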
Perhaps I could be more helpful if you explained what you thought my
suggested change might break? This last one (the existing
RGDAL_CloseDataset followed by an additional RGDAL_CloseHandle) should be
no worse than the current code.
Is that at all clearer?
Oliver
On Mon, Nov 4, 2013 at 1:03 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
Which Windows? XP, Vista, 7, 8, 8.1? 32 and 64 bit? If Vista/7/8, run as
administrator or not? I agree that the code in those parts of rgdal is not
well-designed - it was well-designed, but has been modified so that it
works for most people cross-platform, and has had to accommodate changes
that have taken place in GDAL over more than 10 years, not least the
error-handler.
The simple solution to your practical problem is for you to use a larger
temporary drive under Windows, or change to an operating system that does
not have these side-effects.
Assisting you is not just a matter of doing what you think works for you,
but making sure it doesn't break anything else for anybody else
cross-platform.
Your script does not check for other files in tempdir, so I prepended a
listing of prior content:
pc <- dir(tempdir())
and dropped them from the list for unlinking:
now <- dir(tempdir())
unlink(paste(tempdir(), now[!(now %in% pc)], sep=.Platform$file.sep))
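The two fragments above (snapshot the directory first, then remove only what appeared afterwards) can be put together as a self-contained base-R sketch. The two demo files are hypothetical stand-ins for the outputs a script would write; setdiff() is used here in place of the %in% test, with the same effect.

```r
# Snapshot tempdir() before the work starts.
pc <- dir(tempdir())

# Hypothetical stand-ins for the files a script would create.
new_files <- tempfile(c("demo_a", "demo_b"))
invisible(file.create(new_files))

# Anything present now but not in the snapshot was created in between.
now <- dir(tempdir())
created <- setdiff(now, pc)

# Remove only the newly created files, leaving prior content alone.
unlink(file.path(tempdir(), created))
print(created)
```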
I do not see how your script exercises the problem. It creates a new
transient file, but does not close it, which was the behaviour you are
unhappy with. If I add
GDAL.close(r3)
on Linux, the transient dataset is removed. On Windows 7 64-bit with the
CRAN rgdal binary run as user, temporary files are left in tempdir for r1,
r2, and r3. The same three temporary files are left when run as
administrator.
The earliest version of GDAL.close was:
GDAL.close <- function(dataset) {
  .setCollectorFun(slot(dataset, 'handle'), NULL)
  .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
  invisible()
}
with a version in 2007 in the THK branch calling a closeDataset method,
containing:
handle <- slot(dataset, "handle")
unreg.finalizer(handle)
.Call("RGDAL_DeleteHandle", handle, PACKAGE="rgdal")
with:
unreg.finalizer <- function(obj) reg.finalizer(obj, function(x) x)
and by 2010 was:
GDAL.close <- function(dataset) {
  .setCollectorFun(slot(dataset, 'handle'), NULL)
  .Call('RGDAL_CloseDataset', dataset, PACKAGE="rgdal")
  invisible(gc())
}
Special handling of GDALTransientDataset was added in revision 433 in
January 2013, and modified in revision 462 in April 2013.
It has seemed IIRC that Windows can treat arbitrary files as open. It is
also possible that there is an interaction between Windows and
rgdal:::.setCollectorFun(), which does what it should, when given the NULL
argument, setting:
.setCollectorFun <- function(object, fun) {
  if (is.null(fun)) fun <- function(obj) obj
  reg.finalizer(object, fun, onexit=TRUE)
}
so incorporating the THK branch logic. It could possibly also vary across
drivers, so finding a robust fix means setting up a test rig with multiple
Windows machines and testing for multiple drivers to see why some temporary
files are being treated as open when other operating systems don't have
problems in their removal. Windows users with too small temporary
directories. I welcome contributions from people who understand Windows and
can actually explain why we see the consequences we see.
One candidate may be to branch to .Call("RGDAL_DeleteHandle", handle,
PACKAGE="rgdal") for the GDALTransientDataset case; I'll report back once
the package has gone through win-builder.
Hope this doesn't muddle too much, clarification doesn't seem like the
right expression.
Roger
On Sun, 3 Nov 2013, Oliver Soong wrote:
I've been using the CRAN rgdal and raster. I apologize in advance for all
the linebreaks that will be broken. This code should highlight the problem
and the fix:
require(rgdal)
require(raster)
r1 <- raster(system.file("external/test.grd", package="raster"))
r2 <- as(r1, "SpatialGridDataFrame")
r2.dims <- gridparameters(r2)$cells.dim
r3 <- new("GDALTransientDataset", driver = new("GDALDriver", "GTiff"),
          rows = r2.dims[2], cols = r2.dims[1], bands = 1, type = "Float32",
          options = NULL, fname = file.path(tempdir(), "r3.tif"), handle = NULL)
print(dir(tempdir()))
writeRaster(r1, file.path(tempdir(), "r1.tif"))
writeGDAL(r2, file.path(tempdir(), "r2.tif"))
print(dir(tempdir()))
unlink(dir(tempdir(), full.names = TRUE))
print(dir(tempdir()))
leftover <- gsub("/", "\\\\", dir(tempdir(), full.names = TRUE))
invisible(lapply(paste("cmd /c del", leftover), system))
rm(r1, r2, r3)
gc()
unlink(dir(tempdir(), full.names = TRUE))
print(dir(tempdir()))
invisible(lapply(paste("cmd /c del", leftover), system))
Basically, I'm trying to write a standard raster package raster (r1) and an
sp package SpatialGridDataFrame (r2). Both of those end up calling
new("GDALTransientDataset"), hence r3. At the first print(dir(tempdir())),
only r3 has an open temporary file, which is expected. At the second, all
three have open temporary files, and r1 and r2 have their written final
outputs, which are closed. The temporary files for r1 and r2 should have
been closed at this point. None of the temporary files can be removed by
unlink, although the final outputs can, as shown at the third
print(dir(tempdir())). Windows can't remove them, either. However, if I
remove the GDALTransientDataset r3 and initiate gc(), R can remove that
temporary file, but this does not work for r1 and r2. After q(), the
tempdir() will not be removed by R, but it and the temporary files for r1
and r2 can now be removed.
It looks like GDAL.close is broken (again/as always), but the collector
function for GDALTransientDataset seems to at least close the handle.
GDAL.close relies on RGDAL_CloseDataset, whereas the GDALTransientDataset
collector just uses RGDAL_CloseHandle. With the handle closed, I think the
unlink code in GDAL.close will work (as an aside, I'd use the pattern
paste0("^[a-z]{3}", basen, "$") to be safer and the argument full.names
might be simpler than constructing flf separately). I believe
RGDAL_CloseDataset checks for NULL handles but just returns early, so it
should be the same to replace the .Call("RGDAL_CloseDataset", ...) with
.Call("RGDAL_CloseHandle", ...).
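The anchored pattern mentioned in the aside above can be checked in base R. This is a small sketch with made-up file names; basen is assumed here to be the final output's base name, and the three-letter prefix follows the description of the temporary file naming elsewhere in this thread.

```r
# Hypothetical base name of the final output and some candidate file names.
basen <- "r1.tif"
files <- c("abcr1.tif", "xyzr1.tif", "r1.tif", "abcdr1.tif", "abcr1.tif.aux.xml")

# Anchor both ends: exactly three lower-case letters, then the base name.
anchored <- paste0("^[a-z]{3}", basen, "$")
hits <- grepl(anchored, files)
print(files[hits])
```

Only the names with exactly three leading letters followed by the base name are selected. Note that the "." in basen is still a regular-expression wildcard here, so escaping it would make the match stricter still.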
Really, I think RGDAL_DeleteHandle needs to be fixed, but I don't know
enough about GDALDeleteDataset or the #ifndef OSGEO4W deleteFile business
or why RGDAL_CloseHandle is commented out to make any useful suggestions
there.
Cheers,
Oliver
On Fri, Nov 1, 2013 at 1:41 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Mon, 28 Oct 2013, Oliver Soong wrote:
I've had a long standing struggle with GDAL.close on Windows, and I think I
might finally have found a fix. I'm currently running rgdal 0.8.11,
R 3.0.2, and 32-bit Windows 7.
Currently, writeRaster and writeGDAL create temporary files in the
tempdir() folder (the final filename prefixed with 3 random [a-z] letters).
On my system, these files get left open and orphaned. When doing heavy
processing, this can lead to the drive hosting the tempdir() folder
becoming full, even if the data is being ultimately written to a much
larger drive. This also means that R cannot clean up these files or the
tempdir() folder when it closes, causing similar bloat in my %TEMP%.
I haven't tested this on other platforms, but I think it might help to
insert an extra line into GDAL.close:
.setCollectorFun(slot(dataset, "handle"), NULL)
.Call("RGDAL_CloseHandle", dataset@handle, PACKAGE = "rgdal")
.Call("RGDAL_CloseDataset", dataset, PACKAGE = "rgdal")
For whatever reason, RGDAL_CloseDataset doesn't seem to actually close the
C file handle, but it doesn't seem to mind if the file handle was closed
beforehand.
Could you please provide a working example? I have looked at this, but need
a baseline to know whether I'm looking at the same thing. I'm very unsure
that this is a robust solution, and need an instrumented example, including
listings of the temporary directory during the process, to see the
consequences. Thanks for looking into this, but I'd prefer to be sure that
a Windows-specific fix doesn't make things worse for others too.
Please also report on the source of your Windows rgdal binary - is it from
CRAN or locally built dynamically linking your own GDAL?
Best wishes,
Roger
Cheers,
Oliver