Map digitization and classification

6 messages · Barry Rowlingson, Tim Sippel, Michael Sumner

#
On Fri, Mar 18, 2011 at 9:44 PM, tsippel <tsippel at gmail.com> wrote:
So you've got some high resolution page images and want to do
recognition of the shapes in the grid cells?

Depending on how much noise there is it might be easy or difficult...
Any chance you can get us a sample image, or a section of one? How
many different symbols are there, and how big is the grid? I guess
global means 360x180, but what's the image resolution? Is there also a
map outline background to confuse things? Might be easier to attack
this with an image processing toolbox like ImageJ...

Barry
#
Hi Tim, wow, that looks like it could be rather difficult to automate. It's
probably easiest just to visualize them in a map plot and then use
locator() to recreate the locations for each symbol. That would not be
too difficult, but there are a few options.

Is the data not also published in the atlas in tabular form?

If I had to do this myself I'd probably use an interactive GIS like
Manifold, but it certainly could be done in R fairly simply with some
manual handling. The major problem is probably the overall accuracy
when you try to georegister the scans.

Cheers, Mike.
On Sat, Mar 19, 2011 at 11:28 AM, tsippel <tsippel at gmail.com> wrote:
#
I had another look and the georegistration should be pretty accurate
since there are so many grid lines.

I missed that the first time. If the images need significant warping
to get them regular again you could use control points with the GDAL
command line tools, which is probably easier than doing that in R. You
could still use R to generate the control points with locator(), though.
The images might be good enough with just a very simple registration
(corner point and pixel scale).
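
To make the "corner point and pixel scale" idea concrete, here is a
minimal sketch in plain Python. It assumes an unwarped, north-up scan
whose top-left pixel corner sits at a known longitude/latitude; the
function name, the 1440 x 720 image size and the 0.25 degree pixel size
are made up for illustration and are not taken from the actual scans.

```python
def make_pixel_to_lonlat(corner_lon, corner_lat, deg_per_px_x, deg_per_px_y):
    """Build a (col, row) -> (lon, lat) mapping from a corner point and pixel scale.

    Assumes a north-up image with no rotation or warping, so longitude grows
    with the column index and latitude shrinks with the row index.
    """
    def pixel_to_lonlat(col, row):
        lon = corner_lon + col * deg_per_px_x
        lat = corner_lat - row * deg_per_px_y
        return lon, lat
    return pixel_to_lonlat


# Hypothetical global scan: 1440 x 720 pixels, 0.25 degrees per pixel,
# top-left corner at 180W, 90N.
to_lonlat = make_pixel_to_lonlat(-180.0, 90.0, 0.25, 0.25)
centre = to_lonlat(720, 360)
```

If the scans turn out to be curved, this two-parameter mapping will
drift away from the grid lines, and that is the case where control
points are needed instead.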

Cheers, Mike.
On Sat, Mar 19, 2011 at 11:40 AM, Michael Sumner <mdsumner at gmail.com> wrote:
#
On Sat, Mar 19, 2011 at 12:44 AM, Michael Sumner <mdsumner at gmail.com> wrote:
There are at least two obstacles to successful classification here:

1. Finding the data grid - tricky because the warping of each image is
different. There's a lot of curvature and I think you'd need a lot of
control points. Alternatively you could look for things that looked
like + signs and infer the best grid from them, but that could be
hard.

There might even be a patent on it: http://www.freepatentsonline.com/7479969.html

2. Detecting the feature. Also tricky. Some bits of the coastline will
look exactly like vertical bars. Where symbols partly clash with the
coastline they'll look different too.

Zooming right in on the image (1400% or so) shows each pen line to be
about four pixels wide, and either black or white (was it scanned in
mono?), so despeckling and thresholding might help shape detection.
Scanning in grayscale might be better.
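
As a rough illustration of what thresholding and despeckling mean here,
the following is a minimal pure-Python sketch working on a grayscale
image held as a list of rows. The cutoff of 128, the 8-neighbour rule
and the tiny test image are all assumptions for the example; a real
toolbox would offer more robust filters.

```python
def threshold(image, cutoff=128):
    """Binarize a grayscale image (list of rows): 1 = ink (dark), 0 = paper."""
    return [[1 if px < cutoff else 0 for px in row] for row in image]


def despeckle(binary, min_neighbours=1):
    """Clear ink pixels that have fewer than min_neighbours ink pixels
    among their 8 surrounding pixels (isolated specks)."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    out = [row[:] for row in binary]
    for r in range(height):
        for c in range(width):
            if binary[r][c] == 0:
                continue
            count = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < height and 0 <= cc < width:
                        count += binary[rr][cc]
            if count < min_neighbours:
                out[r][c] = 0
    return out


# Made-up 4 x 5 image: a vertical pen stroke in column 1 plus one stray speck.
gray = [
    [255, 0, 255, 255, 255],
    [255, 0, 255, 255, 255],
    [255, 0, 255, 255, 0],
    [255, 0, 255, 255, 255],
]
cleaned = despeckle(threshold(gray))
```

The stroke survives because every ink pixel in it touches another ink
pixel, while the lone speck has no ink neighbours and is cleared.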

I still think ImageJ might be a handy tool to start working on this. I
believe it has feature detection algorithms.

Another idea would be to chop it up into the 10x10 grids and create a
job on Amazon's Mechanical Turk system, so real live human beings
would get paid for doing the classification.
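
The chopping-up step could be sketched as below, assuming the page has
already been straightened so that the blocks fall on a regular pixel
grid. The function name, image size and tile size are hypothetical; the
boxes are (left, top, right, bottom) pixel bounds that could be passed
to whatever image library does the actual cropping.

```python
def tile_boxes(width, height, tile_w, tile_h):
    """Return (left, top, right, bottom) pixel boxes covering the image,
    row by row. Tiles on the right and bottom edges are clipped to the image."""
    boxes = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            right = min(left + tile_w, width)
            bottom = min(top + tile_h, height)
            boxes.append((left, top, right, bottom))
    return boxes


# Hypothetical 1440 x 720 pixel page cut into tiles of at most 400 x 400 pixels.
boxes = tile_boxes(1440, 720, 400, 400)
```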

How many pages have you got? You might have to ask yourself if the
effort of coding something to do this would be more than the effort of
typing it all in manually.

I guess we assume you've tried to find the original authors in order
for them to dig out the punch cards that this data was probably stored
on...

Barry