Re: [Qgis-user] OCR

From: Mohlomi Moloi <mmoloi(at)khulisa(dot)com>
To: Ramon Andinach <custard(at)westnet(dot)com(dot)au>, ALT SHN <i(dot)geografica(at)alt-shn(dot)org>
Cc: grass-user(at)lists(dot)osgeo(dot)org, qgis-user(at)lists(dot)osgeo(dot)org, pgsql-novice(at)postgresql(dot)org
Subject: Re: [Qgis-user] OCR
Date: 2011-05-04 11:20:56
Message-ID: 46107202.6411304508056816.JavaMail.root@scalix.khulisa.com
Lists: pgsql-novice

----- Original Message -----
From: "Ramon Andinach" <custard(at)westnet(dot)com(dot)au>
Sent: Wed, 5/4/2011 12:04pm
To: "ALT SHN" <i(dot)geografica(at)alt-shn(dot)org>
Cc: grass-user(at)lists(dot)osgeo(dot)org ; qgis-user(at)lists(dot)osgeo(dot)org ; pgsql-novice(at)postgresql(dot)org
Subject: Re: [NOVICE] [Qgis-user] OCR

On 04/05/2011, at 17:18 , ALT SHN wrote:

> Hello,
>
> This might seem a little off topic, but maybe someone here can help me.
>
> I need to extract toponymic data from old digitized paper maps. I wish to explore optical character recognition (OCR).
>
> Does anyone have a suggestion or experience with this kind of challenge?
>
> Thank you,
>
> André Mano

I'd argue about the "little". I've spent a lot of the last few weeks arguing with OCR software about tables of sample data from old reports, which become point data to plot. Perfectly relevant :)

I've never tried getting data from digitized maps, but I'll offer the following generalisations in case it helps. Generally, I have OCR programmes pass me the results as plain text. I lose the formatting, but I don't have to fix stupid guesses about the formatting.

This is from my experience, so yours may differ.

1. OCR loves paragraphs.
2. Different OCR programmes handle column text differently. Some understand columns, some just read left to right straight across both columns.
3. OCR does not get along with handwritten anything. (Unless the person was extra-extra neat and consistent in their writing, and even then it's a maybe.)
4. OCR on tabular data works best if the data is lined up in columns, and doesn't have random big gaps.
5. OCR will almost certainly be confused if there is a line on your map running through or near a word.
5a. Actually, lines could confuse it quite a bit - I remember one programme that tried to recreate an in-line sketch map out of ASCII characters. Quite amusing.
6. You *will* need to check the results.
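
To illustrate points 4 and 6, here is a minimal sketch of how plain-text OCR output for a table might be split into columns and screened for rows that need a manual check. Everything in it is an assumption made for illustration: the sample text, the three-column layout (sample id, easting, northing), the function name parse_ocr_table, and the rule that columns are separated by two or more spaces. It does not run any OCR itself and is not tied to a particular OCR programme.

```python
import re

# Hypothetical plain-text OCR output for a three-column table
# (sample id, easting, northing). The values are invented for illustration.
OCR_TEXT = """\
S001   512340   6421100
S002   51Z355   6421180
S003   512410
S004   512466   6421305
"""


def parse_ocr_table(text, expected_columns=3, numeric_columns=(1, 2)):
    """Split OCR text into rows and separate clean rows from ones to review."""
    clean, review = [], []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        # Assumption: columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        problems = []
        if len(fields) != expected_columns:
            problems.append(
                f"expected {expected_columns} columns, got {len(fields)}"
            )
        for index in numeric_columns:
            # Only check columns that are actually present on this line.
            if index < len(fields) and not fields[index].isdigit():
                problems.append(
                    f"column {index} is not numeric: {fields[index]!r}"
                )
        if problems:
            review.append((line_number, line, problems))
        else:
            clean.append(fields)
    return clean, review


clean_rows, review_rows = parse_ocr_table(OCR_TEXT)
for line_number, line, problems in review_rows:
    print(f"line {line_number}: {line!r} -> {'; '.join(problems)}")
```

With the sample above, the row containing "51Z355" (a letter read in place of a digit) and the row with a missing column are set aside for review, while the other two rows pass. A check like this only catches structural problems; a misread digit that is still a digit will pass, so the results still need checking by eye.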

I'd love to hear what OCR makes of maps. Very curious.

-ramon.
--
Sent via pgsql-novice mailing list (pgsql-novice(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice

Hi List,

I have used the TeleForm application (not cheap to buy) to capture digitized information, especially from large surveys. There is a process to follow to get the captured data into a database (I load mine into PostgreSQL), and you will need to convert whatever forms you have into TeleForm forms. I believe a geodatabase can be created via the TeleForm capture application.
