Hi,

Recently while teaching at SFU I hit the problem that infects R when many people work on similar projects: the multitude of data formats for similar data. The sp project was partly an attempt to give a standard format for spatial data, but its widespread non-use in older packages causes trouble. For example, I taught the students all about 'sp' objects, and then they had to use spatstat and splancs for some point-process work, then geoR for some kriging, none of which use sp objects.

So I figured maybe we need a whole load of 'as' functions that can convert between the various spatial data formats (there are more on CRAN, I am sure) to help us all out on this. Some of these functions may already exist; indeed, I just found something for converting fairly raw x-y coordinate objects to SpatialPolygons hidden away in the SpatialEpi package (polygons2spatial.polygons).

Would it be a good idea to stick all the conversions we can think of into a single package, "spBabel" say (or spConversion to avoid any cultural reference), so people have a one-stop shop? And if we find routines stuck in other packages (such as polygons2spatial.polygons), we rip them out and bundle them?

Yes, it's a matter of time and effort and we're all busy, but I'd like to put it out as a proposal. It might make a nice intern or GSoC project, but we're a bit late for that, so maybe if anyone has a PhD student starting who needs to get up to speed with R packages and spatial data, it would be a good introduction for them. Once it's all set up (on R-Forge or similar), contributing shouldn't be a problem.

Okay, that's my one crazy idea for the day done.

Barry
Spatial data tower of babel
8 messages · Barry Rowlingson, Roger Bivand, Edzer Pebesma +2 more
On Wed, 18 Aug 2010, Barry Rowlingson wrote:
> Hi, Recently while teaching at SFU I hit the problem that infects R when many people work on similar projects - the multitude of data formats for similar data. The sp project was partly an attempt to give a standard format for spatial data but its widespread non-use in older packages causes trouble. So for example I taught the students all about 'sp' objects, and then they had to use spatstat and splancs for some point-process stuff, then geoR for some kriging, none of which use sp objects. So I figured maybe we need a whole load of 'as' functions that can convert between the various spatial data formats (there are more in CRAN, I am sure) to help us all out on this. Some of these functions may already exist, indeed I just found something about converting fairly raw x-y coordinate objects to SpatialPolygons hidden away in the SpatialEpi package (polygons2spatial.polygons).
Barry,

I think that spatstat is well provided for, mostly in maptools, but also in the spatstat vignette on using shapefiles. Of course, the available sp <-> spatstat class coercions could probably be documented more fully, and the coercion functions updated, but I think that they do most of what is needed.

I agree that there may be others out there, and like you I come across them from time to time. Sometimes the CRAN reverse dependencies show who they might be.

Since splancs is largely pre-S3 (right?) and doesn't use classes, coercion isn't an option, so documentation and a wrapper function or two might be sensible. I started on this, but only got as far as spkernel2d(), which uses a call to GridTopology() to set up the output grid.

geoR does use S3 classes, so might be closer, and does depend on sp. There is a method for coercing a SpatialPointsDataFrame to "geodata". The borders component of a "geodata" object is harder to introduce. Taking the coordinates() of a SpatialPixels object to pass to locations= is OK, as is subsetting data frame columns for the trend.d= and trend.l= arguments. I guess Paulo would need to move to a formula= data= interface to likfit(), krige.bayes() and krige.conv(), at least, to permit sp objects to be used "closer" to the actual core.

Probably a good deal could be done by documentation, and by communicating better about what already is there. Useful topic!

Best wishes,

Roger
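Roger's suggestion of "a wrapper function or two" for class-less packages such as splancs could look roughly like the following. This is a minimal base-R sketch, not splancs or sp code: `count_in_box()` is a hypothetical stand-in for a matrix-only, splancs-style routine, `xy()` is a hypothetical generic playing the role of sp's coordinates(), and `toyPoints` is a made-up S3 class.

```r
# Hypothetical stand-in for a class-less, splancs-style routine: it only
# accepts a two-column numeric matrix of x/y coordinates.
count_in_box <- function(pts, xlim, ylim) {
  stopifnot(is.matrix(pts), ncol(pts) == 2)
  sum(pts[, 1] >= xlim[1] & pts[, 1] <= xlim[2] &
      pts[, 2] >= ylim[1] & pts[, 2] <= ylim[2])
}

# Hypothetical generic returning a coordinate matrix for "anything",
# the role sp's coordinates() plays for Spatial* objects.
xy <- function(x, ...) UseMethod("xy")
xy.default   <- function(x, ...) as.matrix(x)       # matrix or data frame
xy.toyPoints <- function(x, ...) cbind(x$x, x$y)    # made-up S3 class

# The wrapper: all class knowledge lives in the xy() methods.
count_in_box_any <- function(obj, xlim, ylim) {
  count_in_box(xy(obj), xlim, ylim)
}

pts <- structure(list(x = c(0.1, 0.5, 0.9), y = c(0.2, 0.6, 1.5)),
                 class = "toyPoints")
count_in_box_any(pts, xlim = c(0, 1), ylim = c(0, 1))   # 2
```

The class-less routine stays untouched; supporting a new format means adding one small xy() method rather than editing the routine.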
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
Barry, what exactly did you try out before you posted? Your claim is not completely true: geoR has a function as.geodata.SpatialPointsDataFrame, so you can do, for instance:

  library(geoR)
  data(meuse) # from sp
  coordinates(meuse) = ~x+y
  krige.bayes(as.geodata(meuse, "zinc"))

and its locations argument can be a SpatialPoints object.

Best regards,
On 08/18/2010 09:05 PM, Roger Bivand wrote:
Edzer Pebesma, Institute for Geoinformatics (ifgi), University of Münster, Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251 8333081, Fax: +49 251 8339763 http://ifgi.uni-muenster.de http://www.52north.org/geostatistics e.pebesma at wwu.de
On Wed, Aug 25, 2010 at 1:53 PM, Edzer Pebesma <edzer.pebesma at uni-muenster.de> wrote:
> Barry, what exactly did you try out before you posted? Your claim is not completely true: geoR has a function as.geodata.SpatialPointsDataFrame, so you can do, for instance:
>
>   library(geoR)
>   data(meuse) # from sp
>   coordinates(meuse) = ~x+y
>   krige.bayes(as.geodata(meuse, "zinc"))
>
> and its locations argument can be a SpatialPoints object.
Well, I didn't claim these functions didn't exist, nor did I point out
that some are trivial, e.g. to get from a SpatialPointsDataFrame to a
set of locations for, say, splancs' K-function, you just do
coordinates(foo). What I was hoping for was that we could create a
single place where these conversions could be collected, which would
be an almost authoritative source of conversions.
geoR has SpatialPointsDataFrame to geodata, but does it have the
other way round too? Or is that in sp? It doesn't matter too much,
since students will find them either way, but does
as.sp(as.geodata(meuse, "zinc")) get you back where you started?
That's what students may expect. Conversion is a big headache for new
users and anything that makes it easier is a plus. Imagine doing
vignette("spBabel") and getting a whole list of what formats can be
converted, together with caveats and restrictions. Sounds good to me.
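Barry's question, whether converting there and back gets you back where you started, can be checked mechanically for any pair of classes with coercions in both directions. A minimal sketch using two hypothetical S4 classes, `PtsA` and `PtsB` (not from any package), and a hypothetical helper `round_trips()`; a shared package of conversions could in principle run checks like this as tests.

```r
library(methods)

# Two hypothetical classes storing the same points differently.
setClass("PtsA", slots = c(coords = "matrix", value = "numeric"))
setClass("PtsB", slots = c(x = "numeric", y = "numeric", value = "numeric"))

# Coercions in both directions.
setAs("PtsA", "PtsB", function(from)
  new("PtsB", x = from@coords[, 1], y = from@coords[, 2], value = from@value))
setAs("PtsB", "PtsA", function(from)
  new("PtsA", coords = cbind(from@x, from@y), value = from@value))

# Does obj -> via -> back to the class of obj reproduce obj exactly?
round_trips <- function(obj, via) {
  back <- as(as(obj, via), class(obj)[[1]])
  identical(obj, back)
}

a <- new("PtsA", coords = cbind(c(1, 2), c(3, 4)), value = c(10, 20))
round_trips(a, "PtsB")   # TRUE
```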
Obviously the problems are in maintenance and keeping conversions up
to date with any changes in the format in the main package, as well as
the fact that this package would probably depend on all the other packages...
Idle coffee-time thoughts...
Barry
I think that such a package would be very useful. It could have a single function like convert(x, 'AnotherClass'). The package would only need to depend on sp; all the other packages would be "suggested", so that you do not need to install the packages you do not use.

Robert

On Wed, Aug 25, 2010 at 12:00 PM, Barry Rowlingson <b.rowlingson at lancaster.ac.uk> wrote:
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
... and this convert function would then loop over all possible classes of x, and for each possibility over all values of "AnotherClass"? That sounds like the n-to-n solution we tried to avoid when we started sp.

Coercion is formally done in S4 by using as(), as in

  as(x, "AnotherClass")

and this coercion is automatic when AnotherClass is a superclass of the class of x; otherwise it can be specified with setAs(). Informally in S3, it's typically done by functions like as.AnotherClass.ThisClass, which is called by as.AnotherClass(x) when x is of class "ThisClass".

The problem I see with having a package that provides all these functions is authorship: will the author of each class in package X update this package each time she/he changes a class definition (S4) or the assumptions implicitly made about it (S3)? Also, for S4 classes I believe "Suggests:" alone will not do.

I would rather ask package authors to call for explicit coercion, e.g. the first line in krige.bayes (geoR) should be

  if (!inherits(data, "geodata")) data = as.geodata(data)

so that anyone passing it data of a new class will only have to provide an as.geodata.MyNewClass function to make this work (provided that package is loaded, which seems reasonable: some function will need to create the MyNewClass objects). This is not dissimilar to Barry's ten-year-old idea that coordinates(x) should return the spatial coordinates of object x, whatever x is.

Why does

  library(sp)
  data(meuse)
  coordinates(meuse) = ~x+y
  plot(log(zinc) ~ sqrt(dist), meuse)

work? sp doesn't provide the plot method used here, and this method neither knows about nor imports the Spatial* classes. Somewhere meuse gets converted to a data.frame, a coercion for which sp does provide methods.
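The coercion mechanisms described above can be sketched in base R with the methods package. All names here (`Base`, `Derived`, `Other`, `mygeo`, `as.mygeo()`, `fit_model()`) are hypothetical: `as.mygeo()` only plays the role of geoR's as.geodata(), and `fit_model()` the role of krige.bayes() with the proposed explicit-coercion first line; nothing here describes geoR's actual internals.

```r
library(methods)

## S4: coercion to a superclass is automatic ...
setClass("Base", slots = c(x = "numeric"))
setClass("Derived", contains = "Base", slots = c(label = "character"))
d <- new("Derived", x = c(1, 2, 3), label = "a")
b <- as(d, "Base")   # no setAs() needed: "Base" is a superclass of "Derived"

## ... while coercion between unrelated classes is registered with setAs().
setClass("Other", slots = c(values = "numeric"))
setAs("Base", "Other", function(from) new("Other", values = from@x))
o <- as(b, "Other")

## S3: as.mygeo(x) dispatches to as.mygeo.<class of x>().
as.mygeo <- function(x, ...) UseMethod("as.mygeo")
as.mygeo.mygeo <- function(x, ...) x
as.mygeo.data.frame <- function(x, ...) {
  structure(list(coords = as.matrix(x[, 1:2]), data = x[[3]]), class = "mygeo")
}

## Explicit coercion on entry, as proposed for krige.bayes().
fit_model <- function(data, ...) {
  if (!inherits(data, "mygeo")) data <- as.mygeo(data)
  mean(data$data)   # placeholder for the real computation
}

fit_model(data.frame(x = c(0, 1), y = c(0, 1), z = c(2, 4)))   # 3

## A "new class" only needs its own method; fit_model() is unchanged.
as.mygeo.matrix <- function(x, ...) {
  structure(list(coords = x[, 1:2, drop = FALSE], data = x[, 3]), class = "mygeo")
}
fit_model(cbind(c(0, 1), c(0, 1), c(2, 4)))                     # 3
```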
On 08/25/2010 09:10 PM, Robert J. Hijmans wrote:
Edzer Pebesma, Institute for Geoinformatics (ifgi), University of Münster, Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251 8333081, Fax: +49 251 8339763 http://ifgi.uni-muenster.de http://www.52north.org/geostatistics e.pebesma at wwu.de
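The plot() question above has a fairly concrete answer: for a classed `data` argument that is not a data frame, stats::model.frame() calls as.data.frame() on it (this is from my reading of model.frame.default(); worth verifying in your R version), and both the formula method of plot() and lm() go through model.frame(). So any class with an as.data.frame() method works with these formula interfaces. A minimal sketch with a hypothetical S3 class `toyPointsDF` and made-up data; it is not how sp itself is implemented.

```r
# A hypothetical S3 class: point coordinates plus attribute data.
pts <- structure(
  list(coords = cbind(x = c(1, 2, 3, 4), y = c(4, 3, 2, 1)),
       data   = data.frame(zinc = c(100, 200, 400, 800),
                           dist = c(0.01, 0.04, 0.09, 0.16))),
  class = "toyPointsDF")

# Coercion to data.frame: attribute columns plus the coordinates.
as.data.frame.toyPointsDF <- function(x, row.names = NULL, optional = FALSE, ...) {
  cbind(x$data, as.data.frame(x$coords))
}

# Neither model.frame() nor lm() knows about "toyPointsDF"; model.frame()
# coerces it with as.data.frame(), which dispatches to the method above.
mf  <- model.frame(log(zinc) ~ sqrt(dist), data = pts)
fit <- lm(log(zinc) ~ sqrt(dist), data = pts)
nrow(mf)   # 4
# plot(log(zinc) ~ sqrt(dist), data = pts) goes the same route.
```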
Rather, something like this in its simplest form, i.e. using an S4
method for inheritance, and passing it on to other packages as much as
possible:

setGeneric('convert', function(x, class, ...) standardGeneric('convert'))

setMethod('convert', signature(x='ANY', class='character'),
  function(x, class, ...) {
    # delegate to whatever coercion is registered for as() (e.g. via setAs)
    y <- try( as(x, class), silent=TRUE )
    if (inherits(y, 'try-error')) stop('cannot convert to ', class) else y
  } )
And add more methods for classes that do not have as() methods. Most
objects could be coerced into an sp object, and then into whatever is
requested. Perhaps there is a lot of ugly nitty-gritty there. Perhaps
you are right about dependencies and S4.
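Routing conversions through an sp class keeps the number of coercions at roughly two per format instead of one per pair of formats, which also speaks to the n-to-n concern. A minimal sketch, assuming hypothetical S4 classes `Hub` (standing in for an sp class), `FmtA` and `FmtB`; convert() here is a variation on the code above with the second argument renamed `to`, and it tries a direct as() coercion before falling back to the hub.

```r
library(methods)

# Hypothetical classes: "Hub" stands in for an sp class, "FmtA" and "FmtB"
# for two package-specific formats with no direct coercion between them.
setClass("Hub",  slots = c(coords = "matrix"))
setClass("FmtA", slots = c(xy = "matrix"))
setClass("FmtB", slots = c(x = "numeric", y = "numeric"))

# Each format registers coercions only to and from the hub.
setAs("FmtA", "Hub", function(from) new("Hub", coords = from@xy))
setAs("Hub", "FmtA", function(from) new("FmtA", xy = from@coords))
setAs("FmtB", "Hub", function(from) new("Hub", coords = cbind(from@x, from@y)))
setAs("Hub", "FmtB", function(from)
  new("FmtB", x = from@coords[, 1], y = from@coords[, 2]))

setGeneric("convert", function(x, to, ...) standardGeneric("convert"))

setMethod("convert", signature(x = "ANY", to = "character"),
  function(x, to, ...) {
    # 1. use a direct coercion if one is registered
    y <- try(as(x, to), silent = TRUE)
    if (!inherits(y, "try-error")) return(y)
    # 2. otherwise go through the hub class
    hub <- try(as(x, "Hub"), silent = TRUE)
    if (inherits(hub, "try-error"))
      stop("no coercion from ", class(x)[1], " to ", to)
    as(hub, to)
  })

a <- new("FmtA", xy = cbind(c(1, 2), c(3, 4)))
b <- convert(a, "FmtB")   # FmtA -> Hub -> FmtB
b@y                       # c(3, 4)
```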
Still, I think this could be a step forward from the current situation,
where many non-standard coercion functions exist that can be hard to
find or remember.
Robert
On Wed, Aug 25, 2010 at 1:03 PM, Edzer Pebesma <edzer.pebesma at uni-muenster.de> wrote:
Hi,
> geoR has SpatialPointsDataFrame to geodata - but does it have the
> other way round too? Or is that in sp? It doesn't matter too much,
> since students will find them either way, but does
> as.sp(as.geodata(meuse, "zinc")) get you back where you started?
But the problem is that this is not always possible. Objects of class ppp (in spatstat) store not only the coordinates (and marks) but also the observation window (the boundary), so they are a mix of SpatialPoints and SpatialPolygons...
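The point here is that some conversions are lossy. A minimal sketch with a hypothetical S3 class `toyPPP`, loosely modelled on the idea of a point pattern with an observation window (it is not spatstat's actual ppp structure): dropping to bare coordinates loses the window, so the way back must either be given the window or guess one, here the bounding box of the points.

```r
# Hypothetical point pattern: coordinates plus a rectangular window.
toy_ppp <- function(x, y, xrange, yrange) {
  structure(list(x = x, y = y,
                 window = list(xrange = xrange, yrange = yrange)),
            class = "toyPPP")
}

# To bare coordinates: the window is silently dropped.
as_points <- function(p) cbind(x = p$x, y = p$y)

# Back again: unless a window is supplied, guess the bounding box.
as_toy_ppp <- function(m, window = NULL) {
  if (is.null(window))
    window <- list(xrange = range(m[, "x"]), yrange = range(m[, "y"]))
  toy_ppp(m[, "x"], m[, "y"], window$xrange, window$yrange)
}

p <- toy_ppp(c(0.2, 0.7), c(0.3, 0.9), xrange = c(0, 1), yrange = c(0, 1))
identical(as_toy_ppp(as_points(p)), p)                     # FALSE: window lost
identical(as_toy_ppp(as_points(p), window = p$window), p)  # TRUE
```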
> That's what students may expect. Conversion is a big headache for new users and anything that makes it easier is a plus. Imagine doing vignette(spBabel) and getting a whole list of what formats can be converted together with caveats and restrictions - sounds good to me.
Yes, that would be handy.
> Obviously the problems are in maintenance and keeping conversions up to date with any changes in the format in the main package, as well as that this package would probably depend on all the other packages...
I tend to side with Edzer regarding an spBabel package. I would prefer to have all these sp <-> other_format conversions in the package that provides the new classes. The main reason is that the developer of that package would be responsible for them, so that if any change is made to the S4 classes, the conversion functions would be updated as well and compatibility with other packages (or spBabel) would not break.

Well, just my two cents...

Virgilio