
Better print method for Spatial*DataFrames?

11 messages · Barry Rowlingson, Edzer Pebesma, Etienne Bellemare Racine +2 more

#
Currently if I print a spatial polygon data frame I get the list
representation, which almost always scrolls way off the screen as giant
lists of lists of coordinates whizz past. It's nearly always useless,
and luckily ESS lets me C-c C-o and zap the output. For a
SpatialPointsDataFrame you get:

        coordinates letters LETTERS
1    (1, 0.0486677)       a       A
2     (2, 0.520911)       b       B
3     (3, 0.207873)       c       C
4     (4, 0.466571)       d       D

So for spatial polygons and lines, would it be better to have such a
compact representation as the default print? I'd rather use the word
'geometry' and have it print as a (truncated) pseudo-WKT, something
like:

     geometry letters LETTERS
1   POINT(1  0.0486677)       a       A
2   POINT(2  0.520911)       b       B

 for points, and:

     geometry letters LETTERS
1   LINESTRING(...)       a       A

 for lines, and:

     geometry letters LETTERS
1  POLYGON(...)       a       A

 for polygons. Or MULTIPOLYGON, whichever is appropriate. I think it
should literally print dot-dot-dot, since for anything other than
points it's going to be voluminous.
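A minimal base-R sketch of this layout, assuming the point coordinates sit in a two-column matrix and the attributes in an ordinary data.frame. `geometry_labels()` is a hypothetical helper written for illustration; it is not part of sp.

```r
# Hypothetical helper (not part of sp): one pseudo-WKT label per feature.
geometry_labels <- function(type, n, coords = NULL) {
  if (type == "POINT") {
    # Points are short, so print their coordinates in full: "POINT(x y)"
    paste0("POINT(", coords[, 1], " ", coords[, 2], ")")
  } else {
    # Lines and polygons are voluminous, so print literal dots instead
    rep(paste0(type, "(...)"), n)
  }
}

coords <- cbind(x = 1:4, y = c(0.0486677, 0.520911, 0.207873, 0.466571))
attrs  <- data.frame(letters = letters[1:4], LETTERS = LETTERS[1:4])

# Prepend the labels as a "geometry" column and let print.data.frame align it
print(data.frame(geometry = geometry_labels("POINT", nrow(coords), coords), attrs))
print(data.frame(geometry = geometry_labels("LINESTRING", 1), letters = "a", LETTERS = "A"))
```

Reusing print.data.frame means column alignment comes for free; only the geometry column needs custom formatting.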

Today I am a random idea factory...

Barry
8 days later
#
Nice suggestion! I did this for points (committed to CVS), as an option in
print(), and get:
              geometry cadmium copper lead zinc  elev
1 POINT(333611 181072)    11.7     85  299 1022 7.909
2 POINT(333558 181025)     8.6     81  277 1141 6.983
3 POINT(333537 181165)     6.5     68  199  640 7.800
        dist   om ffreq soil lime landuse dist.m
1 0.00135803 13.6     1    1    1      Ah     50
2 0.01222430 14.0     1    1    1      Ah     30
3 0.10302900 13.0     1    1    1      Ah    150

For (multi)lines / polygons, would it be useful to print the first
coordinate followed by ..., so that some kind of identification is possible?
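One way to try the first-coordinate idea, sketched in base R. `first_coord_label()` is hypothetical; it takes one feature's vertices as a two-column matrix, and its output is only pseudo-WKT (real WKT polygons use double parentheses).

```r
# Hypothetical helper: identify a line or polygon by its first vertex only.
# coords is a two-column matrix holding the vertices of one feature.
first_coord_label <- function(type, coords) {
  paste0(type, "(", coords[1, 1], " ", coords[1, 2], ", ...)")
}

# A closed ring built from the first meuse point locations shown above
ring <- rbind(c(333611, 181072), c(333558, 181025),
              c(333537, 181165), c(333611, 181072))
first_coord_label("POLYGON", ring)
```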
On 05/18/2010 05:04 PM, Barry Rowlingson wrote:

  
    
1 day later
#
I thought I could add my two cents.
I agree!
I don't know what WKT is, but the following output is the kind of
printing I would like by default. Sometimes I make the mistake of
printing a spatial polygon data frame and it can take literally 5
minutes to output. So if it could just be the default, I'd be happy.
I think it's a good idea, but long outputs are always a pain to read, so
I suggest something compact. Maybe there could be a kind of offset
printed before the values. So if you had something like

POINT(349600.8 5387597)
POINT(349597.0 5387597)
POINT(349590.4 5387595)
POINT(349569.9 5387591)
POINT(349557.1 5387586)
POINT(349548.5 5387581)
POINT(349542.9 5387575)
...
Maybe it could print the coordinates as
349000+  5387500+
POINT(600.8 97)
POINT(597.0 97)
POINT(590.4 95)
POINT(569.9 91)
POINT(557.1 86)
POINT(548.5 81)
POINT(542.9 75)
...
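A rough base-R sketch of the offset display, assuming the offsets are picked by hand (here the 349000 and 5387500 from the example); `offset_labels()` is a hypothetical helper, and choosing the offsets automatically is left open.

```r
# Hypothetical helper: print coordinates relative to a hand-picked offset,
# so only the digits that vary between features remain.
offset_labels <- function(x, y, x0, y0) {
  list(
    header = sprintf("%.0f+  %.0f+", x0, y0),
    # format() gives all values a common number of decimals ("597.0")
    labels = paste0("POINT(", format(x - x0), " ", format(y - y0), ")")
  )
}

x <- c(349600.8, 349597.0, 349590.4)
y <- c(5387597, 5387597, 5387595)
res <- offset_labels(x, y, x0 = 349000, y0 = 5387500)
cat(res$header, res$labels, sep = "\n")
```

The trade-off is that full coordinates can only be recovered by adding the header back in.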
Maybe the coordinate to display should be the "labpt" slot? For
identification purposes, I think something compact is much more useful.
Talking about compactness: as I don't know of any way to put several
geometry types in one Spatial*DataFrame, is it necessary to repeat
POINT, or (MULTI)LINE, or POLYGON? Would it be possible to display only
(random thought here) P, M, L, Y? Or S for surface? I don't know. I
like compactness!

Also, is it possible to add the same identifier (coordinate) to View()?

Etienne
#
On Fri, 28 May 2010, Etienne Bellemare Racine wrote:

            
No. Only for SpatialPointsDataFrame objects, which is what it does already.
Please understand that str() is a *much* better choice in effectively all
cases where summary() isn't used. For the Spatial* objects, set
max.level=2 or similar, and you can *see* what is in it. The proposed
print() method for a big multiband raster will also run away with you. Use
str(), not print()!!!

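library(maptools)
# Read the North Carolina SIDS polygons shipped with maptools
xx <- readShapeSpatial(system.file("shapes/sids.shp",
  package="maptools")[1], IDvar="FIPSNO",
  proj4string=CRS("+proj=longlat +ellps=clrk66"))
summary(xx)            # compact overview: class, bounding box, CRS, attributes
str(xx, max.level=2)   # structure, only two levels deep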

To avoid having to remember to write max.level=2, could someone contribute 
a generic str() for S4 Spatial*?

Roger

  
    
#
On Fri, May 28, 2010 at 5:18 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
I'm not sure what you're saying 'No' to here, Roger. Neither str(xx)
nor summary(xx) presents the object as a data frame. Conceptually it's a
data frame where one of the columns is a geometry, and seeing it print
as such is a good thing (imho). I'd like to never have to use xx@data
again!

I'm not sure trying to truncate the coordinates for nice formatting
is a good idea though, but some indication, when printing a
Spatial*DataFrame, that it's a data frame with geometries seems a good
idea.

Barry
#
On Fri, 28 May 2010, Barry Rowlingson wrote:

            
Just pragmatics, since things which have rushed off the top of my screen 
really are not much help, I find.

I use as(xx, "data.frame") when needed, but most often subset both
observations and variables with "[". I'm not sure what displaying all the
data gets you for more than a trivial number of observations and
variables, though. The output will still swamp the console/terminal
buffer. I'm thinking of a multi-band raster, but even standard 
show(meuse.grid) as a data.frame only leaves rows 2605-3103 on screen for 
a standard gnome-terminal. The data editor I see doesn't have a scroll 
bar, so to scroll, one would need an external viewer, I think.

In other software systems (octave, Stata, ...), one can turn on and off a 
more/less screen-by-screen displayer (not scrolling upwards, just 
chunking), but I'm not aware of an equivalent in R/S. I'm not sure how 
head() and tail() work in R, and personally use str() by default. If I 
need to access the coordinates of a particular line or polygon, I print() 
just that list element (Line or Polygon object).

I can see what you mean, but feel that users will benefit much more by 
using str(), which is a real gem!

Roger

  
    
#
On Sat, May 29, 2010 at 10:06 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:

            
They don't seem to work very well at all for Spatial*DataFrames. If I
add coordinates to meuse to get a SpatialPointsDataFrame and
head(that) I get all the 'rows' but with only the cadmium
measurements. It's slicing it the wrong way. Odd.
str is great if you need to know the str-ucture of an R object. But
it doesn't even align the values so you can see across rows of your
data, which is what I'd like print to do (by analogy with
print.data.frame).

 Currently if I print a SpatialPolygonsDataFrame I get the structure.
Print methods should do better than that - you're almost suggesting
not having, for example, a print method for data frames and that we'd
be better off having what print.default(anyDataFrame) gives us.

 So my proposal is that print of a SpatialPolygonsDataFrame class
should print like a data frame but with some indicator of the geometry
at the start of the row, such as POLYGON(...) - literally with dots,
there's no need to spell it out. Similarly for Lines.

Another suggestion is for head() and tail() methods on
Spatial*DataFrame objects; I think just subscripting [1:n,] from the
object and returning it would do it. I think currently head and tail
treat these objects as lists and the results are not pretty.
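A self-contained sketch of head() and tail() by row subscripting. So that it runs without sp, it defines a hypothetical toy S4 class, `ToyPointsDF`, holding a coordinate matrix and an attribute data.frame; real Spatial*DataFrame classes have more slots, so treat this only as an illustration of the [1:n,] idea.

```r
library(methods)

# Hypothetical toy class (not sp's): one coordinate row per feature plus
# an attribute table with one row per feature.
setClass("ToyPointsDF", slots = c(coords = "matrix", data = "data.frame"))

# Row subscripting keeps geometry and attributes aligned.
setMethod("[", "ToyPointsDF", function(x, i, j, ..., drop = TRUE) {
  new("ToyPointsDF",
      coords = x@coords[i, , drop = FALSE],
      data   = x@data[i, , drop = FALSE])
})

# head() and tail() simply subscript the first or last n rows.
head.ToyPointsDF <- function(x, n = 6L, ...) {
  x[seq_len(min(n, nrow(x@data))), ]
}
tail.ToyPointsDF <- function(x, n = 6L, ...) {
  m <- nrow(x@data)
  x[seq.int(to = m, length.out = min(n, m)), ]
}

pts <- new("ToyPointsDF",
           coords = cbind(x = 1:10, y = 11:20),
           data   = data.frame(id = 1:10))
head(pts, 3)@data
tail(pts, 2)@data
```

Because both methods go through "[", any class that already subsets rows correctly gets sensible head() and tail() behaviour with no extra logic.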

Barry
#
On Sat, 29 May 2010, Barry Rowlingson wrote:

            
Right, because they see S4 objects as lists with no components, only with 
attributes. str() does have support for S4 objects. They would need to be 
wrapped around an S4 show/print method, with the output captured, as in 
capture.output(). Would it make sense to have the default print/show for 
Spatial* be str() with max.level= set, and for Spatial*DataFrame be the 
print method for the data slot prepended with some text (perhaps POINT, 
MULTILINESTRING, MULTIPOLYGON, PIXEL, CELL, or better an abbreviation)?

One would do this by cbind()ing the text in front of the as(, 
"data.frame"), I think, as a "geometry" variable.

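A sketch of that split using the methods package. The classes `ToyGeom` and `ToyGeomDF` are hypothetical stand-ins for Spatial* and Spatial*DataFrame, defined only so the example runs without sp. The geometry-only class shows str() with max.level = 2, and the class with attributes cbind()s a "geometry" text column in front of its data.frame. capture.output() then grabs either display as text.

```r
library(methods)

# Hypothetical stand-ins for Spatial* and Spatial*DataFrame (not sp's classes).
setClass("ToyGeom", slots = c(coords = "matrix"))
setClass("ToyGeomDF", contains = "ToyGeom", slots = c(data = "data.frame"))

# Geometry only: show the structure, but no deeper than two levels.
setMethod("show", "ToyGeom", function(object) str(object, max.level = 2))

# Geometry plus attributes: cbind() a "geometry" text column in front of the
# data.frame, then use the ordinary data.frame print method.
setMethod("show", "ToyGeomDF", function(object) {
  geometry <- paste0("POINT(", object@coords[, 1], " ", object@coords[, 2], ")")
  print(cbind(geometry = geometry, object@data))
})

g <- new("ToyGeomDF", coords = cbind(c(333611, 333558), c(181072, 181025)),
         data = data.frame(zinc = c(1022, 1141)))
g                               # auto-printing calls show()
txt <- capture.output(show(g))  # the same display as a character vector
```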
Roger

  
    
#
Hello,

I am attempting to use the sample code in "Applied Spatial Data Analysis
with R" but cannot get it to work; I get this error:

 > nc = readShapePoly(system.file("shapes/sids.shp", package="maptools")[1],
+ IDvar="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
Error in read.dbf(filen) : unable to open DBF file

Any ideas?

Thanks,

Pete
#
On Sat, 29 May 2010, Peter Larson wrote:

            
Please update your installed packages - this looks like a mismatch between 
foreign and maptools.

Roger

  
    
#
On 05/29/2010 11:47 AM, Roger Bivand wrote:

            
In the following example:

require(maptools)
nc = readShapePoly(system.file("shapes/sids.shp", package = "maptools")[1],
  IDvar = "FIPSNO", proj4string = CRS("+proj=longlat +ellps=clrk66"))
str(as(nc, "SpatialPolygons"))   # output of str()
as(nc, "SpatialPolygons")         # output of the current default show method

I personally find the output of the (current) print method much
easier to read than that of str. Partly because I've grown
accustomed to it, but also partly because I have never liked the output
of str. I tend to use the current default show method used for
SpatialLines* and SpatialPolygons* (the generic show for S4 objects) to
figure out what the structure of the data is, not how to use it. So I
guess for those who want to use the data without bothering about the
deeper structure, these print methods (both: current show.S4 and str)
are not so useful. If you disagree with this: please respond!

As for Barry's proposal, I find it a bit repetitive (and space
consuming) to have POINT(1 1) instead of the current (1,1) (which,
credit where credit is due, comes from a package Barry wrote that
preceded sp). I can very well understand that many people will not know
how to read WKT [1], as it again is something that programmers tend to
find useful, not users; to be correct, we would need the awfully long
words MULTILINESTRING and MULTIPOLYGON to represent the sp classes, and
then couldn't write the whole string but would need to abbreviate. I
agree with Barry that a representation as much as possible like a
data.frame is most useful.

I suggest the following. For points:

  geometry attr1 attr2 attr3
PT(234 45)   333   xxy  22.5
PT(455 68)   221   xxx  13.2

for polygons: use PN(3;2335) to express that this MULTIPOLYGON consists
of 3 POLYGONS, and has 2335 coordinates (in total)

  geometry attr1 attr2 attr3
PN(3;2335)   333   xxy  22.5
PN(45;345)   221   xxx  13.2

for lines:

  geometry attr1 attr2 attr3
LI(3;2335)   333   xxy  22.5
 LI(5;345)   221   xxx  13.2

for pixels: use points, replace PT with PX

for grids: don't print all the values, but a very short summary.
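A small base-R sketch of the PN/LI counts, assuming each feature is held as a list of two-column coordinate matrices (one per polygon or line); `part_label()` is a hypothetical helper, not an sp function.

```r
# Hypothetical helper: compact label for a multi-part feature, e.g. "PN(3;2335)"
# = 3 parts with 2335 coordinates in total.
# parts is a list of two-column coordinate matrices, one per polygon or line.
part_label <- function(prefix, parts) {
  n_coords <- sum(vapply(parts, nrow, integer(1)))
  sprintf("%s(%d;%d)", prefix, length(parts), n_coords)
}

square   <- cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0))  # closed ring, 5 vertices
triangle <- cbind(c(2, 3, 2, 2), c(2, 2, 3, 2))        # closed ring, 4 vertices
features <- list(list(square, triangle), list(square))

data.frame(geometry = vapply(features, function(p) part_label("PN", p), character(1)),
           attr1 = c(333, 221))
```

Counting the repeated closing vertex of each ring, as here, is one possible convention; the same helper gives LI labels for lines.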

To really educate users that we "glue" data.frame attribute tables to
geometries, they need to see this, and therefore I want to print a
SpatialPoints object as:

  geometry
PT(234 45)
PT(455 68)

and do the same for SpatialLines and SpatialPolygons:

  geometry
PN(3;2335)
PN(45;345)

  geometry
LI(3;2335)
 LI(5;345)

what head and tail should do is then obvious.

Next, developers and programmers need to find out how to print
all the gory details; they will need to use str(nc) or show(unclass(nc)).

For those from Europe: thank you for all the points in the song contest.
We also like Lena a lot, here at home.

[1] http://en.wikipedia.org/wiki/Well-known_text