Map digitization and classification

I have a series of scanned global maps (from a bound atlas) of oceanographic
sampling effort that I would like to classify. On a 1 x 1 degree lat/lon grid
are symbols that represent sampling density. I need to read in these scanned
maps, classify the symbols (squares as 1, triangles as 2, etc.) and ideally
store the results in raster grid files (.asc) for analysis.

After some trawling through CRAN looking for packages for this, it is not
yet apparent to me which is best. I've considered using a GIS for this (e.g.
GRASS), but I'm not sure whether that is a complicated solution to a simple
problem. I need to process a directory of these images, so scripting the
process to loop over the directory would be ideal.
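
For what it's worth, the output end of that workflow (loop over a directory, write one .asc per scan) can be sketched independently of how the symbols get recognised. This is only a rough sketch in Python rather than R: `classify_scan` is a hypothetical placeholder that returns an empty grid, and the "*.png" pattern, the global 1-degree extent and the -9999 NODATA value are assumptions, not anything taken from the atlas.

```python
from pathlib import Path


def write_ascii_grid(path, grid, xllcorner=-180.0, yllcorner=-90.0,
                     cellsize=1.0, nodata=-9999):
    """Write a 2-D list of class codes as an ESRI ASCII grid.

    grid[0] is the northernmost row; None marks cells with no symbol.
    """
    lines = [
        f"ncols {len(grid[0])}",
        f"nrows {len(grid)}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    for row in grid:
        lines.append(" ".join(str(nodata if v is None else v) for v in row))
    Path(path).write_text("\n".join(lines) + "\n")


def classify_scan(image_path):
    """Hypothetical placeholder: the real symbol recognition would go here.

    Returns an empty 180 x 360 grid (every cell None).
    """
    return [[None] * 360 for _ in range(180)]


def process_directory(in_dir, out_dir, pattern="*.png"):
    """Classify every matching scan in in_dir and write one .asc per scan."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(Path(in_dir).glob(pattern)):
        out_path = out_dir / (image_path.stem + ".asc")
        write_ascii_grid(out_path, classify_scan(image_path))
        written.append(out_path)
    return written
```

Calling `process_directory("scans", "grids")` would then produce one grid file per image, with the hard part still hidden inside `classify_scan`.
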
So you've got some high-resolution page images and want to do
recognition of the shapes in the grid cells?

Depending on how much noise there is, it might be easy or difficult...
Any chance you can get us a sample image, or a section of one? How
many different symbols are there, and how big is the grid? I guess
global means 360x180, but what's the image resolution? Is there also a
map outline background to confuse things? It might be easier to attack
this with an image processing toolbox like ImageJ...

Barry
Hi Tim, wow, that looks like it could be rather difficult to automate. It's
probably easiest just to display each scan in a map plot and then use
locator() to record the location of each symbol. That would not be
too difficult, but there are a few options.
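
As a rough illustration of what happens after the clicking: once each symbol has a recorded position and a class, the points only need to be snapped to their grid cell. The sketch below is in Python rather than R and assumes the clicked positions are already in lon/lat and that the grid is a global 1-degree grid with row 0 at the north; the function names are made up for the example.

```python
import math


def cell_index(lon, lat, cellsize=1.0, west=-180.0, north=90.0):
    """Return (row, col) of the cell containing (lon, lat); row 0 is northernmost."""
    col = math.floor((lon - west) / cellsize)
    row = math.floor((north - lat) / cellsize)
    return row, col


def points_to_grid(points, nrows=180, ncols=360, cellsize=1.0):
    """Build a grid of class codes from (lon, lat, code) tuples.

    Cells without a point stay None; points outside the grid are ignored.
    """
    grid = [[None] * ncols for _ in range(nrows)]
    for lon, lat, code in points:
        row, col = cell_index(lon, lat, cellsize)
        if 0 <= row < nrows and 0 <= col < ncols:
            grid[row][col] = code
    return grid
```

Because each click is snapped to a whole cell, the clicks do not have to be very precise, only inside the right 1-degree square.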

Is the data not also published in the atlas in tabular form?

If I had to do this myself I'd probably use an interactive GIS like
Manifold, but it certainly could be done in R fairly simply with some
manual handling. The major problem is probably the overall accuracy
when you try to georegister the scans.

Cheers, Mike.
Here is an example of one of these map sets. This is the original as it was
sent to me, but I would crop each map individually.

https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B0d3zfSSPFQsY2MxODEyZWEtZTRkZC00OTk2LTgwY2YtYTZkYzcwZGYxZDll&hl=en&authkey=CLmWvWc

In my original post I asked about classifying squares and triangles for the
sake of simplicity, but as you can see the symbols used for this (published
in the 1970s) aren't that easy to distinguish. Luckily, I don't think there
is too much background noise, but maybe the 10 x 10 grid lines will be
problematic?

Since this was scanned from the pages of a bound book, the scanned copy is
also curved, so the borders aren't perfectly square. We might be able to
resolve this by scanning things more carefully, but your thoughts on this
are very welcome too.

Thanks,

Tim

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsumner at gmail.com
I had another look and the georegistration should be pretty accurate
since there are so many grid lines.

I missed that the first time. If the images need significant warping
to make them regular again, you could use control points with the GDAL
command-line tools, which is probably easier than doing that in R. You
could still use R to generate the control points with locator().
The images might even be good enough with a very simple registration
(corner point and pixel scale).
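
To make the "corner point and pixel scale" case concrete, here is a minimal sketch in Python. It assumes the map frame is unrotated and unwarped, and that the pixel positions of two opposite frame corners and their lon/lat values are known; the function name and the example numbers are hypothetical. It does not attempt the control-point warping, which is what the GDAL tools would be for.

```python
def fit_simple_registration(px_ul, lonlat_ul, px_lr, lonlat_lr):
    """Return a function mapping (col, row) pixels to (lon, lat).

    px_ul / px_lr are (col, row) of the upper-left and lower-right frame
    corners; lonlat_ul / lonlat_lr are their (lon, lat) coordinates.
    """
    (c0, r0), (lon0, lat0) = px_ul, lonlat_ul
    (c1, r1), (lon1, lat1) = px_lr, lonlat_lr
    deg_per_col = (lon1 - lon0) / (c1 - c0)
    deg_per_row = (lat1 - lat0) / (r1 - r0)  # negative when north is up

    def to_lonlat(col, row):
        return lon0 + (col - c0) * deg_per_col, lat0 + (row - r0) * deg_per_row

    return to_lonlat


# Example with made-up corner pixels for a global map frame.
to_lonlat = fit_simple_registration((100, 50), (-180.0, 90.0),
                                    (3700, 1850), (180.0, -90.0))
lon, lat = to_lonlat(1900, 950)  # pixel at the centre of the frame
```

If the residuals at a few known grid-line crossings are larger than half a cell, that would be a sign the simple registration is not enough and the control-point route is needed.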

Cheers, Mike.

There are at least two obstacles to successful classification here:

1. Finding the data grid - tricky because the warping of each image is
different. There's a lot of curvature and I think you'd need a lot of
control points. Alternatively, you could look for things that look
like + signs and infer the best grid from them, but that could be
hard.

Might even be a patent on it: http://www.freepatentsonline.com/7479969.html

2. Detecting the feature. Also tricky. Some bits of the coastline will
look exactly like vertical bars. Where symbols partly clash with the
coastline they'll look different too.

Zooming right in on the image (1400% or so) shows each pen line to be
about four pixels wide, and every pixel either black or white (was it
scanned in mono?), so despeckling and thresholding might help shape
detection. Scanning in grayscale might be better.
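
A minimal sketch of what thresholding and despeckling could look like, in plain Python on a list-of-lists grayscale image (0 = black, 255 = white). The cutoff of 128 and the minimum blob size of 4 pixels are assumptions to be tuned against real scans, and dropping small connected blobs is just one simple way to despeckle, not necessarily what ImageJ does.

```python
def threshold(gray, cutoff=128):
    """Return a binary image: 1 for ink (darker than cutoff), 0 for paper."""
    return [[1 if v < cutoff else 0 for v in row] for row in gray]


def despeckle(binary, min_size=4):
    """Return a copy with ink blobs smaller than min_size pixels removed.

    Blobs are 4-connected groups of ink pixels, found by flood fill.
    """
    nrows, ncols = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Collect every pixel of the blob that contains (r, c).
            stack, blob = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < nrows and 0 <= nx < ncols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(blob) < min_size:
                for y, x in blob:
                    out[y][x] = 0
    return out
```

With pen lines about four pixels wide, anything much smaller than a stroke is probably scanner noise, which is the reasoning behind choosing a small `min_size`.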

I still think ImageJ might be a handy tool to start working on this. I
believe it has feature detection algorithms.

Another idea would be to chop it up into the 10x10 grids and create a
job on Amazon's Mechanical Turk system, so real live human beings
would get paid for doing the classification.
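
The chopping-up step is simple enough to sketch, in Python and on a list-of-lists image. It assumes the scan has already been straightened so that the 10x10 blocks fall on a regular pixel spacing; the tile size in pixels is whatever that spacing turns out to be, and submitting the tiles to Mechanical Turk is not shown.

```python
def tile_boxes(width, height, tile_w, tile_h):
    """Return (left, top, right, bottom) pixel boxes that cover the image.

    Boxes run row by row from the top-left; edge tiles are clipped.
    """
    boxes = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            boxes.append((left, top,
                          min(left + tile_w, width),
                          min(top + tile_h, height)))
    return boxes


def crop(pixels, box):
    """Cut one tile out of a list-of-lists image."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]
```

Each tile could then be saved as its own small image, with the box kept alongside it so the answers can be placed back into the full grid.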

How many pages have you got? You might have to ask yourself if the
effort of coding something to do this would be more than the effort of
typing it all in manually.

I guess we assume you've tried to find the original authors in order
for them to dig out the punch cards that this data was probably stored
on...

Barry