Spatial data tower of babel

8 messages · Barry Rowlingson, Roger Bivand, Edzer Pebesma +2 more

#
Hi,

 Recently while teaching at SFU I hit the problem that infects R when
many people work on similar projects - the multitude of data formats
for similar data. The sp project was partly an attempt to give a
standard format for spatial data but its widespread non-use in older
packages causes trouble.

 So for example I taught the students all about 'sp' objects, and then
they had to use spatstat and splancs for some point-process stuff,
then geoR for some kriging, none of which use sp objects.

 So I figured maybe we need a whole load of 'as' functions that can
convert between the various spatial data formats (there are more in
CRAN, I am sure) to help us all out on this. Some of these functions
may already exist, indeed I just found something about converting
fairly raw x-y coordinate objects to SpatialPolygons hidden away in
the SpatialEpi package (polygons2spatial.polygons).

 Would it be a good idea to stick all the conversions we can think of
into a single package, "spBabel" say (or spConversion to avoid any
cultural reference), so people have a one-stop shop? And if we find
routines stuck in other packages (such as polygons2spatial.polygons)
we rip them out and bundle them?

 Yes, it's a matter of time and effort and we're all busy, but I'd like
to put it out as a proposal. It might make a nice intern or GSoC
project, but we're a bit late for that, so maybe if anyone has a PhD
student starting who needs to get up to speed with R packages and
spatial data it would be a good introduction for them. Once it's all
set up (on R-forge or similar) contributing shouldn't be a problem.

 Okay, that's my one crazy idea for the day done.

Barry
#
On Wed, 18 Aug 2010, Barry Rowlingson wrote:

Barry,

I think that spatstat is well provided for, mostly in maptools, but also 
in the spatstat vignette on using shapefiles. Of course, the available 
functionalities sp <-> spatstat classes could probably be documented more 
fully, and the coercion functions updated, but I think that they do most 
of what is needed.

I agree that there may be others out there, and like you come across them 
from time to time. Sometimes the CRAN reverse dependencies show who they 
might be. Since splancs is largely pre-S3 (right?) and doesn't use 
classes, coercion isn't an option, so documentation and a wrapper function 
or two might be sensible. I started on this, but only got as far as
spkernel2d(), which uses a call to GridTopology() to set up the output grid.

geoR does use S3 classes, so might be closer, and does depend on sp. There 
is a method for coercing a SpatialPointsDataFrame to "geodata". The 
borders component of a "geodata" object is harder to introduce. Taking the 
coordinates() of a SpatialPixels object to pass to locations= is OK, as 
is the subsetting of data frame columns for the trend.d= and trend.l=
arguments. I guess Paulo would need to move to a formula= data= interface 
to likfit(), krige.bayes() and krige.conv(), at least, to permit sp 
objects to be used "closer" to the actual core.
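As a rough sketch of what a formula= data= front end might involve, the helper below splits a formula and a data frame into the separate pieces (locations, response, trend matrix) that a geodata-style function takes. The name `split_spatial_formula()` and the assumption that coordinates sit in columns `x` and `y` are hypothetical; this is not geoR's API.

```r
# Hypothetical helper (not part of geoR): split formula + data into
# the separate pieces a geodata-style function expects
split_spatial_formula <- function(formula, data, coord_names = c("x", "y")) {
  mf <- model.frame(formula, data, na.action = na.fail)  # stop on missing values
  list(
    coords   = as.matrix(data[, coord_names]),           # locations
    response = model.response(mf),                       # left-hand side
    trend    = model.matrix(attr(mf, "terms"), mf)       # intercept + covariates
  )
}

d <- data.frame(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1),
                zinc = c(100, 200, 150, 300), dist = c(0.1, 0.5, 0.2, 0.8))
parts <- split_spatial_formula(log(zinc) ~ sqrt(dist), d)
colnames(parts$trend)   # "(Intercept)" "sqrt(dist)"
```

A wrapper around likfit() or krige.conv() could then pass these pieces on; how they map onto geoR's actual arguments is left open here.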

Probably a good deal could be done by documentation, and by communicating 
better about what already is there.

Useful topic!

Best wishes,

Roger

6 days later
#
Barry, what exactly did you try out before you posted?

Your claim is not completely true: geoR has a function
as.geodata.SpatialPointsDataFrame, so you can do, for instance:

library(sp)    # provides the meuse data and coordinates<-
library(geoR)
data(meuse) # from sp
coordinates(meuse) = ~x+y
krige.bayes(as.geodata(meuse, "zinc"))

and its locations argument can be a SpatialPoints object.

Best regards,
On 08/18/2010 09:05 PM, Roger Bivand wrote:

#
On Wed, Aug 25, 2010 at 1:53 PM, Edzer Pebesma
<edzer.pebesma at uni-muenster.de> wrote:
Well, I didn't claim these functions didn't exist, nor did I point out
that some are trivial - ie to get from a SpatialPointsDataFrame to a
set of locations for, say, splancs' K-function, you just do
coordinates(foo). What I was hoping for was that we could create a
single point where these conversions could be collected, which would
be an almost authoritative source of conversions.

 geoR has SpatialPointsDataFrame to geodata - but does it have the
other way round too? Or is that in sp? It doesn't matter too much,
since students will find them either way, but does
as.sp(as.geodata(meuse, "zinc")) get you back where you started?
That's what students may expect. Conversion is a big headache for new
users and anything that makes it easier is a plus. Imagine doing
vignette("spBabel") and getting a whole list of what formats can be
converted together with caveats and restrictions - sounds good to me.
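Whether a round trip "gets you back where you started" can be checked mechanically. In this sketch, `pattern` is a hypothetical rich class holding points plus a rectangular observation window, and `bare` is a hypothetical class holding only coordinates; the generics `as_bare()` and `as_pattern()` are likewise made up for illustration. Going rich to bare to rich has to guess the window, so the result differs from the input.

```r
# Hypothetical rich class: points plus a rectangular observation window
make_pattern <- function(x, y, xrange, yrange)
  structure(list(x = x, y = y, xrange = xrange, yrange = yrange),
            class = "pattern")

# Hypothetical S3 generics for the two directions
as_bare <- function(obj, ...) UseMethod("as_bare")
as_pattern <- function(obj, ...) UseMethod("as_pattern")

# pattern -> bare: keeps the coordinates, silently drops the window
as_bare.pattern <- function(obj, ...)
  structure(list(coords = cbind(x = obj$x, y = obj$y)), class = "bare")

# bare -> pattern: the window is unknown, so fall back to the bounding box
as_pattern.bare <- function(obj, ...)
  make_pattern(obj$coords[, "x"], obj$coords[, "y"],
               xrange = range(obj$coords[, "x"]),
               yrange = range(obj$coords[, "y"]))

p <- make_pattern(c(2, 5, 7), c(1, 4, 3), xrange = c(0, 10), yrange = c(0, 10))
back <- as_pattern(as_bare(p))
identical(p, back)   # FALSE: the window shrank to the bounding box
```

Documenting exactly this kind of loss is what the caveats in such a vignette would have to cover.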

 Obviously the problems are maintenance, keeping conversions up to
date with any changes to the formats in the main packages, and the
fact that this package would probably depend on all the other packages...

  Idle coffee-time thoughts...

Barry
#
I think that such a package would be very useful. It could have a
single function like

convert(x, 'AnotherClass')

The package would only need to depend on sp, all the other packages
would be "suggested" such that you do not need to install the packages
you do not use.
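One way to read "depend only on sp, suggest the rest" is a lookup table of converters in which each entry records the package it needs, and that package is only loaded, via `requireNamespace()`, when the conversion is actually requested. The registry, `register_converter()` and this plain-function `convert()` are hypothetical, and "stats" merely stands in for an optional package; S4 classes may need more than this, as discussed in the following message.

```r
# Hypothetical registry of converters, keyed by "FromClass->ToClass"
.converters <- new.env()

register_converter <- function(from, to, fun, pkg = NULL) {
  assign(paste(from, to, sep = "->"),
         list(fun = fun, pkg = pkg), envir = .converters)
  invisible(NULL)
}

convert <- function(x, to) {
  key <- paste(class(x)[1], to, sep = "->")
  if (!exists(key, envir = .converters, inherits = FALSE))
    stop("no converter registered for ", key)
  entry <- get(key, envir = .converters, inherits = FALSE)
  # only load the (suggested) package when this conversion is used
  if (!is.null(entry$pkg) && !requireNamespace(entry$pkg, quietly = TRUE))
    stop("package '", entry$pkg, "' is needed for ", key)
  entry$fun(x)
}

# "stats" stands in for a package listed under Suggests:
register_converter("matrix", "data.frame", function(x) as.data.frame(x),
                   pkg = "stats")
res <- convert(matrix(1:4, ncol = 2), "data.frame")
```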

Robert


On Wed, Aug 25, 2010 at 12:00 PM, Barry Rowlingson
<b.rowlingson at lancaster.ac.uk> wrote:
#
... and this convert function would then loop over all possible classes
of x, and for each possibility over all values for "AnotherClass"?
Sounds like the n-to-n solution we tried to avoid when we started sp.

Coercion is formally done in S4 by using as(), as in

as(x, "AnotherClass")

and this coercion is automatic when AnotherClass is a superclass for x,
and can otherwise be specified by setAs. Informally in S3, it's
typically done by functions like as.AnotherClass.ThisClass, which is
called when, in

as.AnotherClass(x)

x is of class "ThisClass".
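Both mechanisms can be tried in base R. In this sketch the S4 classes `Base`, `Derived` and `Other` are hypothetical; `Base` is a superclass of `Derived`, so `as()` needs no extra code there, while the unrelated `Other` needs a `setAs()` registration. The S3 part reuses the placeholder names AnotherClass and ThisClass from the message.

```r
library(methods)

# S4: coercion to a superclass is automatic
setClass("Base", slots = c(id = "character"))
setClass("Derived", contains = "Base", slots = c(value = "numeric"))
d <- new("Derived", id = "a", value = 1)
b <- as(d, "Base")              # no setAs() needed

# S4: coercion between unrelated classes is registered with setAs()
setClass("Other", slots = c(label = "character"))
setAs("Base", "Other", function(from) new("Other", label = from@id))
o <- as(b, "Other")

# S3: as.AnotherClass(x) dispatches to as.AnotherClass.ThisClass()
as.AnotherClass <- function(x, ...) UseMethod("as.AnotherClass")
as.AnotherClass.ThisClass <- function(x, ...)
  structure(list(payload = unclass(x)), class = "AnotherClass")
x <- structure(list(v = 1:3), class = "ThisClass")
y <- as.AnotherClass(x)
```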

Problems I see with having a package that provides all these functions
is authorship: will each class author of package X update this package
each time she/he changes a class definition (S4) or the assumptions
implicitly made about it (S3)? Also, for S4 classes I believe listing the
other packages under "Suggests:" alone will not do.

I would rather ask package authors to call for explicit coercion, e.g.
the first line in krige.bayes (geoR) should be

if (!inherits(data, "geodata"))
  data = as.geodata(data)

so that anyone passing it data of a new class will only have to provide
an as.geodata.MyNewClass function to make this work (provided that
package is loaded, which seems reasonable - some function will need to
create the MyNewClass objects).
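The pattern can be sketched with stand-ins: `as.gdata()` plays the role of as.geodata() and `mean_value()` the role of an analysis function such as krige.bayes(). Both, and the `gdata` class, are hypothetical and much simpler than geoR's code. The point is that supporting `MyNewClass` only takes one extra method; the analysis function itself is untouched.

```r
# Hypothetical target class "gdata" and its coercion generic
as.gdata <- function(x, ...) UseMethod("as.gdata")
as.gdata.data.frame <- function(x, ...)
  structure(list(coords = as.matrix(x[, c("x", "y")]), data = x$z),
            class = "gdata")

# Hypothetical analysis function: coerce first, then work on "gdata" only
mean_value <- function(data) {
  if (!inherits(data, "gdata"))
    data <- as.gdata(data)
  mean(data$data)
}

# A package introducing a new class only has to add this one method
as.gdata.MyNewClass <- function(x, ...)
  structure(list(coords = x$xy, data = x$values), class = "gdata")

obj <- structure(list(xy = cbind(x = 1:3, y = 3:1), values = c(2, 4, 6)),
                 class = "MyNewClass")
mean_value(obj)                                          # 4
mean_value(data.frame(x = 1:2, y = 1:2, z = c(1, 3)))    # 2
```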

Not dissimilar to Barry's 10-year-old idea that coordinates(x) should
return the spatial coordinates of object x, whatever x is.

Why does

library(sp)
data(meuse)
coordinates(meuse) = ~x+y
plot(log(zinc) ~ sqrt(dist), meuse)

work? sp doesn't provide the plot method used here, and this method
neither knows about nor imports the Spatial* classes. Somewhere meuse gets
transformed to a data.frame, for which sp indeed provides methods.
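One likely place where that happens is `model.frame()`, which the formula method of plot() and functions such as lm() call: its default method applies `as.data.frame()` to a `data` argument that is not a data frame but carries a class attribute. The sketch below uses a hypothetical S3 class `pts` to show that supplying an `as.data.frame()` method is enough for such formula interfaces to accept the object.

```r
# Hypothetical S3 class: coordinates plus an attribute table
make_pts <- function(x, y, attrs)
  structure(list(coords = cbind(x = x, y = y), attrs = attrs), class = "pts")

# An as.data.frame() method is all the formula interfaces need here
as.data.frame.pts <- function(x, row.names = NULL, optional = FALSE, ...)
  cbind(as.data.frame(x$coords), x$attrs)

p <- make_pts(1:5, 6:10, data.frame(z = c(2, 4, 6, 8, 10)))
mf <- model.frame(log(z) ~ sqrt(x), p)   # p is coerced to a data.frame here
fit <- lm(z ~ x, data = p)
coef(fit)                                # intercept ~0, slope 2
```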
On 08/25/2010 09:10 PM, Robert J. Hijmans wrote:

#
Rather something like this in the simplest form; i.e. using an S4
method for inheritance, and passing it on to other packages as much as
possible.

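setGeneric('convert', function(x, class, ...) standardGeneric('convert'))

setMethod('convert', signature(x='ANY', class='character'),
function(x, class, ...) {
       # try the registered as() coercion; fail with a clearer message
       y <- try( as(x, class), silent=TRUE )
       if (inherits(y, 'try-error')) stop('sorry, cannot convert to ', class)
       y
} )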

And adding more methods for classes that do not have as methods. Most
objects could be coerced into an sp object, and then into whatever is
requested. Perhaps there is a lot of ugly nitty-gritty there. Perhaps
you are right about dependencies and S4.

Still, I think this could be a step forward from the current situation
where many non-standard coercion functions exist that might be hard to
find or remember.
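"Coerce into an sp object, and then into whatever is requested" is a hub-and-spoke design: each format only needs coercions to and from the hub (roughly 2n rather than n(n-1)), which is one answer to the n-to-n concern. In this sketch `Hub`, `FormatA` and `FormatB` are hypothetical S4 classes, with `Hub` standing in for an sp class, and `convert_via_hub()` is a made-up helper that tries a direct `as()` first and falls back to two steps.

```r
library(methods)

# Hypothetical classes; "Hub" plays the role of an sp class
setClass("Hub", slots = c(x = "numeric", y = "numeric"))
setClass("FormatA", slots = c(xy = "matrix"))
setClass("FormatB", slots = c(xs = "numeric", ys = "numeric"))

# Each format registers coercions only to and from the hub
setAs("FormatA", "Hub",
      function(from) new("Hub", x = from@xy[, 1], y = from@xy[, 2]))
setAs("Hub", "FormatB",
      function(from) new("FormatB", xs = from@x, ys = from@y))

convert_via_hub <- function(x, to, hub = "Hub") {
  direct <- tryCatch(as(x, to), error = function(e) NULL)
  if (!is.null(direct)) return(direct)
  # no direct coercion: go x -> hub -> target
  as(as(x, hub), to)
}

a <- new("FormatA", xy = cbind(c(1, 2), c(3, 4)))
b <- convert_via_hub(a, "FormatB")
```

The cost, as noted in the thread, is that anything the hub class cannot represent is lost on the way through.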

Robert


On Wed, Aug 25, 2010 at 1:03 PM, Edzer Pebesma
<edzer.pebesma at uni-muenster.de> wrote:
4 days later
#
Hi,
But the problem is that this is not always possible. ppp objects (in
spatstat) store not only the coordinates (and marks) but also the
boundary. So, they are a mix of SpatialPoints and SpatialPolygons...
Yes, that would be handy.
I tend to agree with Edzer regarding a spBabel package. I would prefer to
have all these sp<->other_format in the package that provides the new
classes. The main reason is that the developer of the package will be
responsible for them, so that if any change is made to the S4 classes
the conversion functions will be updated and it will not break
compatibility with other packages/spBabel.

Well, just my two cents...

Virgilio