Map digitization and classification
6 messages · Barry Rowlingson, Tim Sippel, Michael Sumner
On Fri, Mar 18, 2011 at 9:44 PM, tsippel <tsippel at gmail.com> wrote:
I have a series of scanned global maps (from a bound atlas) of oceanographic sampling effort that I would like to classify. On a 1 x 1 degree lat/lon grid are symbols that represent sampling density. I need to read in these scanned maps, classify the symbols (squares are classified as 1, triangles as 2, etc.) and ideally store the results in raster grid files (.asc) for analysis. After some trawling through CRAN looking for packages for this, it is not yet apparent to me which is best. I've considered using a GIS for this (e.g. GRASS), but I'm not sure whether that is a more complicated solution than the problem needs. I need to process a directory of these images, so scripting the process to loop over the directory would be ideal.
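A minimal sketch of the batch step described above: loop over a directory of scans and write one ESRI ASCII grid (.asc) per map. It is written in Python rather than R purely for illustration. The function names, the `*.png` pattern and the `classify_map` placeholder are all assumptions, not anything from the thread. The symbol recognition itself is the open problem and is not attempted here.

```python
from pathlib import Path

NODATA = -9999  # assumed NODATA code for cells with no symbol


def write_ascii_grid(path, grid, xll=-180.0, yll=-90.0, cellsize=1.0, nodata=NODATA):
    """Write a 2-D list of integer class codes as an ESRI ASCII grid.

    grid[0] is the northernmost row, which is the order the format expects.
    """
    nrows, ncols = len(grid), len(grid[0])
    header = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    rows = [" ".join(str(v) for v in row) for row in grid]
    Path(path).write_text("\n".join(header + rows) + "\n")


def classify_map(image_path):
    """Hypothetical placeholder: return a 180 x 360 grid of class codes.

    A real implementation would read the scan and recognise the symbols;
    this stub marks every cell as NODATA.
    """
    return [[NODATA] * 360 for _ in range(180)]


def process_directory(in_dir, out_dir, pattern="*.png"):
    """Classify every scan in in_dir and write one .asc file per scan."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(Path(in_dir).glob(pattern)):
        out_path = out_dir / (image_path.stem + ".asc")
        write_ascii_grid(out_path, classify_map(image_path))
        written.append(out_path)
    return written
```

Calling `process_directory("scans", "grids")` would then produce one `.asc` file per scan, with squares stored as 1, triangles as 2, and so on once `classify_map` is replaced by something real.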
So you've got some high-resolution page images and want to recognise the shapes in the grid cells? Depending on how much noise there is, it might be easy or difficult... Any chance you can get us a sample image, or a section of one? How many different symbols are there, and how big is the grid? I guess global means 360x180, but what's the image resolution? Is there also a map outline background to confuse things? It might be easier to attack this with an image processing toolbox like ImageJ... Barry
Hi Tim, wow, that looks like it could be rather difficult to automate. It's probably easiest just to display each scan in a map plot and then use locator() to record the location of each symbol by hand. That would not be too difficult, and there are a few ways to do it. Is the data not also published in the atlas in tabular form? If I had to do this myself I'd probably use an interactive GIS like Manifold, but it certainly could be done in R fairly simply with some manual handling. The major problem is probably the overall accuracy when you try to georegister the scans. Cheers, Mike.
On Sat, Mar 19, 2011 at 11:28 AM, tsippel <tsippel at gmail.com> wrote:
Here is an example of one of these map sets. This is the original as it was sent to me, but I would crop each map individually. https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B0d3zfSSPFQsY2MxODEyZWEtZTRkZC00OTk2LTgwY2YtYTZkYzcwZGYxZDll&hl=en&authkey=CLmWvWc In my original post I asked about classifying squares and triangles for the sake of simplicity, but as you can see, the symbols used for this (published in the 1970s) aren't that easy to distinguish. Luckily, I don't think there is too much background noise, but maybe the 10 x 10 degree grid lines will be problematic? Since this was scanned from the pages of a bound book, the scan is also curved, so the borders aren't perfectly square. We might be able to resolve this by scanning more carefully, but your thoughts on this are very welcome too. Thanks, Tim
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Michael Sumner Institute for Marine and Antarctic Studies, University of Tasmania Hobart, Australia e-mail: mdsumner at gmail.com
I had another look, and the georegistration should be pretty accurate since there are so many grid lines. I missed that the first time. If the images need significant warping to get them regular again, you could use control points with the GDAL command line tools, which is probably easier than doing that in R. You could still use R to generate the control points with locator(). The scans might even be good enough with very simple registration (corner point and pixel scale). Cheers, Mike.
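The "very simple registration" case (one corner point plus a pixel scale) can be sketched as below, in Python rather than R for illustration only. The function and parameter names are hypothetical. The sketch assumes an unrotated, unwarped scan with row 0 at the top, so it does not cover the curved pages that would need control points.

```python
import math


def pixel_to_lonlat(col, row, ul_lon, ul_lat, deg_per_px_x, deg_per_px_y):
    """Map a pixel (col, row) to lon/lat from one corner and a pixel scale.

    ul_lon, ul_lat: coordinates of the upper-left corner of the image.
    deg_per_px_x, deg_per_px_y: degrees covered by one pixel in each direction.
    """
    lon = ul_lon + col * deg_per_px_x
    lat = ul_lat - row * deg_per_px_y  # rows count downwards, latitude upwards
    return lon, lat


def lonlat_to_cell(lon, lat, cellsize=1.0, west=-180.0, north=90.0):
    """Return (grid_row, grid_col) of the grid cell containing lon/lat.

    grid_row 0 is the northernmost row, matching the .asc row order.
    """
    grid_col = math.floor((lon - west) / cellsize)
    grid_row = math.floor((north - lat) / cellsize)
    return grid_row, grid_col
```

With a global scan at 4 pixels per degree, for example, `pixel_to_lonlat(720, 360, -180.0, 90.0, 0.25, 0.25)` lands on the equator at the prime meridian, and `lonlat_to_cell` then gives the 1 x 1 degree cell that a symbol found at that pixel belongs to.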
On Sat, Mar 19, 2011 at 12:44 AM, Michael Sumner <mdsumner at gmail.com> wrote:
I had another look and the georegistration should be pretty accurate since there are so many grid lines.
There are at least two obstacles to successful classification here:

1. Finding the data grid. This is tricky because the warping of each image is different. There's a lot of curvature and I think you'd need a lot of control points. Alternatively you could look for things that look like + signs and infer the best grid from them, but that could be hard. There might even be a patent on it: http://www.freepatentsonline.com/7479969.html

2. Detecting the feature. Also tricky. Some bits of the coastline will look exactly like vertical bars. Where symbols partly clash with the coastline they'll look different too. Zooming right in on the image (1400% or so) shows each pen line to be about four pixels wide, and either black or white (was it scanned in mono?), so despeckling and thresholding might help shape detection. Scanning in grayscale might be better.

I still think ImageJ might be a handy tool to start working on this. I believe it has feature detection algorithms.

Another idea would be to chop it up into the 10x10 grids and create a job on Amazon's Mechanical Turk system, so real live human beings would get paid for doing the classification.

How many pages have you got? You might have to ask yourself whether the effort of coding something to do this would be more than the effort of typing it all in manually. I guess we assume you've tried to find the original authors in order for them to dig out the punch cards that this data was probably stored on...

Barry
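The thresholding, despeckling and chopping steps mentioned above can be sketched in plain Python on an image held as rows of 0-255 grey values. This is a crude illustration, not ImageJ's algorithms. The function names, the cutoff of 128 and the neighbour rule are all assumptions.

```python
def threshold(gray, cutoff=128):
    """Convert a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = paper."""
    return [[1 if v < cutoff else 0 for v in row] for row in gray]


def despeckle(binary, min_neighbours=2):
    """Clear ink pixels with fewer than min_neighbours ink pixels around them.

    Looks at the 8 surrounding pixels; isolated specks are removed while
    pen lines a few pixels wide survive.
    """
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for r in range(h):
        for c in range(w):
            if not binary[r][c]:
                continue
            n = sum(
                binary[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
                if (rr, cc) != (r, c)
            )
            if n < min_neighbours:
                out[r][c] = 0
    return out


def chop(image, tile_h, tile_w):
    """Split an image into non-overlapping tiles keyed by (tile_row, tile_col)."""
    tiles = {}
    for top in range(0, len(image), tile_h):
        for left in range(0, len(image[0]), tile_w):
            tiles[(top // tile_h, left // tile_w)] = [
                row[left:left + tile_w] for row in image[top:top + tile_h]
            ]
    return tiles
```

Tiles produced by `chop` could be classified one at a time by code, or exported as small images for people to classify by hand. Either route assumes the grid has already been located, which is the first of the two problems above.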