
subdatasets in rgdal/raster

8 messages · Michael Sumner, Jonathan Greenberg, Roger Bivand

#
Hello,

As far as I know there's nothing in rgdal to determine existing
subdatasets (SDS) in HDF (or NetCDF for that matter).

In raster, raster(x) reads from the first candidate *variable* and
lists the others in a warning. A variable is analogous to an SDS, but
raster always provides this via ncdf/ncdf4 rather than via rgdal.

Is it true there's no way to find out SDS from HDF or NetCDF in rgdal?

I've not really explored this much before, since I had no way to
compile HDF. gdalUtils fills a gap, but only by running the installed
command line tools, which would be unnecessary with the right features
in rgdal/raster. I personally have always run the command line tools
and then jumped back to R (if I could).

(The vector analogue in rgdal to listing SDS would be ogrListLayers.)

This is a missing capability in the raster package, somewhat disguised
by the independent support for NetCDF. I guess hardly anyone is using
HDF? That's a shame, since these R tools really help. It's a pretty
complicated story all round and I keep discovering (and rediscovering)
interesting corners. If anyone is writing anything significant about
all this, I'd be keen to be involved.

Cheers, Mike.
#
On Wed, 15 Oct 2014, Michael Sumner wrote:

Because these formats are poorly organised (an XML tree describing the 
internal structure would be easy to parse, but self-descriptive metadata 
isn't usual here), GDALinfo() does not provide much.

If you start by identifying the incantation used by command line gdalinfo 
to report the SDS metadata, then look at its source code, it should be 
possible to make some progress. I don't use either format if I can help
it, and do not have the necessary trial set of sample files. You'd need to find a
robust route covering a wide range of files seen in the wild to allow for 
oddities in the ways things are organised. It is unfortunate to have to 
branch code on the selected driver, but here it is hard to avoid. Once 
GDALinfo is reporting the metadata in a predictable way, the other 
functions can be given appropriate argument values.
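To make the point about branching on the driver concrete: GDAL opens each subdataset by a name string that encodes the driver prefix, the file and the component, for example `NETCDF:"file.nc":sst`, `HDF4_EOS:EOS_GRID:"file.hdf":GridName:FieldName` or `HDF5:"file.h5"://group/var`. The Python sketch below splits such a name into those parts so that calling code can branch on the prefix. It is illustrative only. `SubdatasetName`, `split_subdataset_name` and `driver_family` are hypothetical helpers, not part of rgdal, raster or GDAL, and the sketch assumes the common form in which the file path is wrapped in double quotes.

```python
import re
from typing import NamedTuple


class SubdatasetName(NamedTuple):
    """Pieces of a GDAL subdataset name such as NETCDF:"f.nc":sst."""
    prefix: str      # driver-specific prefix, e.g. "NETCDF" or "HDF4_EOS:EOS_GRID"
    path: str        # file path, taken from inside the double quotes
    component: str   # variable, grid:field or HDF5 path after the file path


# Assumption: the name has the quoted form PREFIX:"path":component.
# Names that do not quote the path are rejected rather than guessed at.
_SDS_RE = re.compile(r'^(?P<prefix>[^"]+):"(?P<path>[^"]*)":(?P<component>.*)$')


def split_subdataset_name(name: str) -> SubdatasetName:
    """Split a quoted subdataset name; raise ValueError if it does not fit."""
    match = _SDS_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised subdataset name: {name!r}")
    return SubdatasetName(match["prefix"], match["path"], match["component"])


def driver_family(name: str) -> str:
    """Return the first prefix token (e.g. "HDF4_EOS"), the value to branch on."""
    return split_subdataset_name(name).prefix.split(":", 1)[0]
```

The regular expression deliberately keeps everything before the quoted path as the prefix, so multi-part prefixes such as `HDF4_EOS:EOS_GRID` stay intact, while colons inside the quotes (a Windows drive letter, say) do not confuse the split.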

Also consider looking at this from the point of view of rgdal2 in addition 
to rgdal; it could possibly be "closer" there and looking at both may 
make it easier to spot:

https://github.com/thk686/rgdal2

Contributions welcome!

Roger

#
Thanks Roger, I'm still hoping to be able to code at this level one day.

I went off half-cocked and confused things. I can get SDS from NetCDF
and HDF files with GDALinfo() with no problems; I just had to turn off
returnScaleOffset. I think that option confounds the number of bands
with subdatasets, since a data source with SDS has no bands, but I need
to explore this more to be sure.

Cheers, Mike.
On Wed, Oct 15, 2014 at 5:50 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:

#
Mike:

Have you taken a look at gdalUtils? It is a wrapper for all the GDAL
binaries (I am still working on adding the new tools that were
recently released with version 1.11.1, but it should be up to date as
of v1.10.1). The gdalinfo wrapper in gdalUtils gives you the full info
dump (more output than rgdal's GDALinfo will give you, if less clean),
which you could parse for the info you wanted. I also have a custom
function "get_subdatasets" which sounds a lot like what you are
describing.
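Parsing the gdalinfo dump for subdatasets can be sketched roughly as follows, here in Python rather than R. It assumes the installed gdalinfo is on the PATH and that its report lists subdatasets as `SUBDATASET_<n>_NAME=` and `SUBDATASET_<n>_DESC=` lines under a "Subdatasets:" heading, which is how the command-line tool reports them. `parse_subdatasets` and `list_subdatasets` are hypothetical names; this is not the code behind gdalUtils' get_subdatasets.

```python
import re
import subprocess
from typing import Dict, List, Tuple

# Matches report lines such as '  SUBDATASET_3_NAME=NETCDF:"f.nc":sst'.
_SDS_LINE = re.compile(r"^\s*SUBDATASET_(\d+)_(NAME|DESC)=(.*)$")


def parse_subdatasets(report: str) -> List[Tuple[str, str]]:
    """Return (name, description) pairs from gdalinfo text, ordered by index."""
    found: Dict[int, Dict[str, str]] = {}
    for line in report.splitlines():
        match = _SDS_LINE.match(line)
        if match:
            index, key, value = int(match.group(1)), match.group(2), match.group(3)
            found.setdefault(index, {})[key] = value
    # Sort numerically so SUBDATASET_10 comes after SUBDATASET_2.
    return [(found[i].get("NAME", ""), found[i].get("DESC", "")) for i in sorted(found)]


def list_subdatasets(path: str) -> List[Tuple[str, str]]:
    """Run the installed gdalinfo (assumed to be on PATH) and parse its report."""
    result = subprocess.run(["gdalinfo", path], capture_output=True, text=True, check=True)
    return parse_subdatasets(result.stdout)
```

Keeping the text parser separate from the subprocess call means the parsing can be checked against saved gdalinfo output without GDAL installed; a file with plain bands and no subdatasets simply yields an empty list.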

Give it a shot!  You will need to install GDAL yourself first -- check
the help for suggestions on which flavor of GDAL I recommend.

--j
On Wed, Oct 15, 2014 at 6:23 AM, Michael Sumner <mdsumner at gmail.com> wrote:

5 days later
#
I have, and it's good but it's not what I am after here.

I really want access to HDF4 via tight coupling. It does work well in
R when you build GDAL right, especially with raster over the top to
clean everything up. If you can compile those drivers into GDAL, it
raises the question of why have a wrapper like gdalUtils at all. (My
take on this is that Windows support stops at the CRAN binary, and
again this is why raster has its own NetCDF wrapper.)

Ultimately the logic for these rogue files belongs in GDAL, and
rgdal/raster should just reflect that. rgdal2 is essentially that, but
doesn't offer much more, and it's not clear how to "rasterize"
everything again.

:)
On Thu, Oct 16, 2014 at 2:27 AM, Jonathan Greenberg <jgrn at illinois.edu> wrote:

1 day later
#
I think Roger has mentioned a few times over the years his reluctance
to build the HDF4 drivers into the Windows GDAL binary (Roger, perhaps
you can explain why this is a problem). I actually built gdalUtils for
this exact purpose: the need to access MODIS and Landsat HDFs on
Windows.

One thing that may be worth considering, if Roger is open to it, is
having someone work on the Windows binaries that come with rgdal so
they support a wider range of formats (specifically, getting HDF4,
HDF5 and NetCDF working). Others have already solved this
satisfactorily (OSGeo4W supports all of the key formats, for
instance), so it should be doable.

Also, to respond to your question about why have gdalUtils at all: I
don't think every function of the base GDAL utilities is available in
rgdal (correct me if I'm wrong), so I wanted a way to bring them to R
users in a (relatively) easy-to-use fashion. I see gdalUtils as a
complementary package to rgdal.

--j
On Tue, Oct 21, 2014 at 2:17 AM, Michael Sumner <mdsumner at gmail.com> wrote:

1 day later
#
On Wed, 22 Oct 2014, Jonathan Greenberg wrote:

The CRAN Windows binaries are built by statically linking to both 32-bit 
and 64-bit builds of GDAL, PROJ.4, and a small selection of (partly 
platform dependent) other libraries. These GDAL etc. binaries are prepared 
by Brian Ripley and Uwe Ligges, as they have write access to the 
win-builder and CRAN servers (the same relationship exists for rgeos). 
Some years ago, I prepared 32-bit DLLs, which were then used in a CRAN 
binary that shipped with those libraries. The CRAN administrators found 
this much less satisfactory than a static build, among other reasons 
because of possible interference between multiple installed copies of 
DLLs on this platform. HDF and NetCDF are seen as substantially more complex 
external dependencies, so have not been seriously considered for inclusion 
in the GDAL binaries used in preparing rgdal for Windows.

This isn't impossible, but volunteering other people's time and insight 
seems unfortunate. If interested people would like to work up a route to 
including these extra libraries, in the knowledge that both have had 
portability issues (see clang), then in time it might be possible to make 
progress.

Another avenue is to support the OSGeo4W project in its attempts to 
provide R with rgdal and rgeos, and thus get the drivers that way.

Neither will happen without a longer term commitment.

Roger

#
Thanks Roger, I'm keen and I have support for the long term; I just
need to learn how the build actually gets done so I can do it myself.
I'm sorry this is such an ongoing topic, but I do have the resources;
I just lack some key skills.

Is the win-builder setup documented? If it is, I cannot find it. Prof.
Ripley publishes the sources that are used, but not the details of the
build, as far as I know. I have an early set of notes on
cross-compilation, which I have used to build GDAL for Windows on
Linux with HDF4/5, NetCDF, etc.; I just do not know how that gets
bundled together into the static Windows rgdal. (I may be missing
something really simple and obvious.)

I'm also concerned about NetCDF support in R generally: there are
file types that none of the 3 packages support, so that really needs
fixing. I'll get to it eventually, but if anyone reading this is keen,
please contact me. We would like to improve the support that is
already there and make sure it has longevity.

Cheers, Mike.
On Fri, Oct 24, 2014 at 6:45 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote: