Map digitization and classification
On Sat, Mar 19, 2011 at 12:44 AM, Michael Sumner <mdsumner at gmail.com> wrote:
I had another look and the georegistration should be pretty accurate since there are so many grid lines.
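Every grid-line crossing with a known coordinate can serve as a control point, so the registration amounts to a least-squares fit from pixel positions to map coordinates. The sketch below is a minimal illustration under assumptions that are not in the original: the control points have already been picked, and the function names `fit_transform` and `apply_transform` and the made-up 1-degree grid are hypothetical. `order=1` is an affine fit. `order=2` adds quadratic terms, which is one simple way to absorb some page curvature; a strongly warped scan may need a denser, local method.

```python
import numpy as np


def design_matrix(px, py, order=1):
    """Polynomial terms in pixel coordinates: order 1 is affine, order 2 adds curvature terms."""
    cols = [np.ones_like(px), px, py]
    if order >= 2:
        cols += [px * px, px * py, py * py]
    return np.column_stack(cols)


def fit_transform(pixels, world, order=1):
    """Least-squares fit from pixel (col, row) to world (x, y); returns coefficients and RMS residual."""
    pixels = np.asarray(pixels, dtype=float)
    world = np.asarray(world, dtype=float)
    A = design_matrix(pixels[:, 0], pixels[:, 1], order)
    coef, *_ = np.linalg.lstsq(A, world, rcond=None)
    resid = A @ coef - world
    rms = float(np.sqrt(np.mean(np.sum(resid ** 2, axis=1))))
    return coef, rms


def apply_transform(coef, pixels, order=1):
    """Map pixel (col, row) positions to world coordinates with fitted coefficients."""
    pixels = np.asarray(pixels, dtype=float)
    return design_matrix(pixels[:, 0], pixels[:, 1], order) @ coef


# Hypothetical control points: grid crossings every 100 pixels,
# mapped onto a made-up 1-degree lon/lat grid (rows increase southwards).
cols, rows = np.meshgrid(np.arange(0, 1000, 100), np.arange(0, 800, 100))
pix = np.column_stack([cols.ravel(), rows.ravel()])
lonlat = np.column_stack([140 + pix[:, 0] / 100.0, -40 - pix[:, 1] / 100.0])
coef, rms = fit_transform(pix, lonlat, order=1)

# Add some made-up curvature and compare the two polynomial orders.
bent = lonlat.copy()
bent[:, 1] += 1e-6 * pix[:, 0] ** 2
_, rms1 = fit_transform(pix, bent, order=1)
_, rms2 = fit_transform(pix, bent, order=2)
```

On real control points, comparing the RMS residual of the two orders gives a quick indication of whether a single global polynomial is good enough for a given sheet.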
There are at least two obstacles to successful classification:

1. Finding the data grid. This is tricky because each image is warped differently. There's a lot of curvature, so I think you'd need a lot of control points. Alternatively you could look for things that look like + signs and infer the best grid from them, but that could be hard. There might even be a patent on it: http://www.freepatentsonline.com/7479969.html

2. Detecting the features. Also tricky. Some bits of the coastline will look exactly like vertical bars, and where symbols partly clash with the coastline they'll look different too. Zooming right in on the image (1400% or so) shows each pen line to be about four pixels wide, with every pixel either black or white (was it scanned in mono?), so despeckling and thresholding might help shape detection. Scanning in grayscale might be better.

I still think ImageJ might be a handy tool to start working on this; I believe it has feature-detection algorithms.

Another idea would be to chop it up into the 10x10 grids and create a job on Amazon's Mechanical Turk system, so that real live human beings get paid to do the classification. How many pages have you got? You might have to ask yourself whether the effort of coding something to do this would be more than the effort of typing it all in manually.

I guess we assume you've tried to find the original authors so they can dig out the punch cards this data was probably stored on...

Barry
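The despeckling and thresholding idea can be sketched in a few lines of NumPy. This is a minimal sketch, assuming the scan has already been loaded as an 8-bit grayscale array (loading is not shown). Otsu's method is a swapped-in choice for picking the threshold, a 3x3 median (majority) filter stands in for despeckling, and the function names are hypothetical. With pen lines about four pixels wide, a 3x3 window removes isolated single-pixel specks while keeping the lines, although it also rounds off their corners.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def otsu_threshold(gray):
    """Otsu's method on an 8-bit grayscale array: pick the cut that maximises between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(hist)  # pixel count at or below each candidate cut
    w1 = total - w0       # pixel count above each candidate cut
    cum_mean = np.cumsum(hist * levels)
    mu0 = np.divide(cum_mean, w0, out=np.zeros(256), where=w0 > 0)
    mu1 = np.divide(cum_mean[-1] - cum_mean, w1, out=np.zeros(256), where=w1 > 0)
    between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalised; argmax is unaffected
    return int(np.argmax(between))


def binarize(gray):
    """True where the pixel counts as ink (at or below the Otsu cut)."""
    return gray <= otsu_threshold(gray)


def despeckle(mask, size=3):
    """Median (majority) filter on a boolean mask; edges are padded by repeating border pixels."""
    pad = size // 2
    padded = np.pad(mask.astype(np.uint8), pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1)) > 0.5


# Made-up test image: white page, one 4-pixel-wide vertical pen line, one stray speck.
img = np.full((20, 20), 255, dtype=np.uint8)
img[2:18, 8:12] = 0
img[5, 2] = 0
ink = despeckle(binarize(img))
```

After filtering, the speck is gone and the middle of the bar survives at its full four-pixel width. Shape detection (bars, + signs and so on) would then work on `ink` rather than on the raw scan.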