Re: extracting location info from string

From: Andrej <andrej(dot)groups(at)gmail(dot)com>
To: Tarlika Elisabeth Schmitz <postgresql3(at)numerixtechnology(dot)de>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: extracting location info from string
Date: 2011-05-25 22:15:50
Message-ID: BANLkTimBKGkhdzYCZqBR=pxa9bNhUBXWLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On 26 May 2011 09:13, Tarlika Elisabeth Schmitz
<postgresql3(at)numerixtechnology(dot)de> wrote:
> On Wed, 25 May 2011 09:25:48 -0600
> Rob Sargent <robjsargent(at)gmail(dot)com> wrote:
>
>>
>>
>>On 05/24/2011 10:57 AM, Lew wrote:
>>> Tarlika Elisabeth Schmitz wrote:
>>>
>>>> CREATE TABLE person
>>>> (
>>>> id integer NOT NULL,
>>>> "name" character varying(256) NOT NULL,
>>>> "location" character varying(256),
>>>> CONSTRAINT person_pkey PRIMARY KEY (id)
>>>> );
>>>>
>>>> this was just a TEMPORARY table I created for quick analysis
>>>> of my CSV data (now renamed to temp_person).
>
> CREATE TABLE country
> (
>  id character varying(3) NOT NULL, -- alpha-3 code
>  "name" character varying(50) NOT NULL,
>  CONSTRAINT country_pkey PRIMARY KEY (id)
> );
>
>
>>To minimize the ultimately quite necessary human adjudication, one
>>might make good use of what is often termed "crowd sourcing":  Keep
>>all the distinct "hand entered" values and a map to the final human
>>assessment.
>
> I was wondering how to do just that. I don't think it would be a good
> idea to hard code this into the clean-up script. Take, for instance,
> variations of COUNTRY.NAME spelling. Where would I store these?
>
> I could do with a concept for this problem, which applies to a lot of
> string-type info.

I'd start w/ downloading a list as mentioned here:
http://answers.google.com/answers/threadview?id=596822

And run it through a wee perl script using
http://search.cpan.org/~maurice/Text-DoubleMetaphone-0.07/DoubleMetaphone.pm
to make phonetic matches ...

Then I'd run your own data through DoubleMetaphone, and clean up
matches if not too many false positives show up.

Cheers,
Andrej

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Ozer, Pam 2011-05-25 22:18:20 Re: Sorting Issue
Previous Message Rob Sargent 2011-05-25 22:07:30 Re: extracting location info from string