Re: How to find double entries

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Vivek Khera <vivek(at)khera(dot)org>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: How to find double entries
Date: 2008-04-16 15:49:23
Message-ID: 48062003.3050409@postnewspapers.com.au
Lists: pgsql-sql

Vivek Khera wrote:
>
> On Apr 15, 2008, at 11:23 PM, Tom Lane wrote:
>> What's really a duplicate sounds like a judgment call here, so you
>> probably shouldn't even think of automating it completely.
>
> I did a consulting gig about 10 years ago for a company that made
> software to normalize street addresses and names. Literally dozens of
> people worked there, and that was their primary software product. It is
> definitely not a trivial task, as the rules can be extremely complex.

From what little I've personally seen of others' address handling,
some (many/most?) people who blindly advocate full normalisation of
addresses either:

(a) only care about a rather restricted set of address types ("ordinary
residential addresses in <my country>", though that can be bad enough);
or
(b) don't know how horrible addressing is .... yet ... and are going to
find out soon when their highly normalized addressing schema proves
incapable of representing some address they've just been presented with.

with most probably falling into the second category.

Overly strict addressing, without the associated fairly extreme
development effort to get it even vaguely right, seems to lead to users
working around the broken addressing schema by entering bogus data.

Personally I'm content to provide lots of space for user-formatted
addresses, only breaking out separate fields for the postcode
(Australian only), the city/suburb, the state, and the country - all
stored as strings. The only DB-level validation is a rule preventing the
entry of invalid or undefined postcodes for Australian addresses, and
preventing the entry of invalid Australian states. The app is used
almost entirely with Australian addresses, and there's a definitive,
up-to-date list of Australian postcodes available from the postal
services, so it's worth a little more checking to protect against basic
typos and misunderstandings.
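
To make the shape of that DB-level check concrete, here is a rough
sketch. It is not my actual schema: it uses Python's sqlite3 module and
an SQLite trigger as a stand-in for a PostgreSQL rule, and the table
names (au_postcode, address), the column names, the helper functions
and the hard-coded state list are all assumptions made up for
illustration.

```python
import sqlite3

# Hypothetical schema: free-form address text plus a few broken-out
# string fields. All names here are illustrative assumptions.
SCHEMA = """
CREATE TABLE au_postcode (
    postcode TEXT NOT NULL,
    suburb   TEXT NOT NULL,
    state    TEXT NOT NULL,
    PRIMARY KEY (postcode, suburb)
);

CREATE TABLE address (
    id        INTEGER PRIMARY KEY,
    free_text TEXT NOT NULL,
    postcode  TEXT,
    city      TEXT,
    state     TEXT,
    country   TEXT NOT NULL
);

-- Only rows with country = 'Australia' are checked.
-- Inserts only: a fuller version would also cover UPDATE.
CREATE TRIGGER address_au_check
BEFORE INSERT ON address
WHEN NEW.country = 'Australia'
BEGIN
    SELECT RAISE(ABORT, 'unknown Australian postcode')
    WHERE NOT EXISTS (
        SELECT 1 FROM au_postcode WHERE postcode = NEW.postcode
    );
    SELECT RAISE(ABORT, 'invalid Australian state')
    WHERE NEW.state IS NULL
       OR NEW.state NOT IN ('NSW','VIC','QLD','SA','WA','TAS','NT','ACT');
END;
"""


def make_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, free_text, postcode, city, state, country):
    """Return True if the row was accepted, False if it was rejected."""
    try:
        conn.execute(
            "INSERT INTO address (free_text, postcode, city, state, country) "
            "VALUES (?, ?, ?, ?, ?)",
            (free_text, postcode, city, state, country),
        )
        return True
    except sqlite3.DatabaseError:
        # The trigger's RAISE(ABORT, ...) surfaces as a DatabaseError subclass.
        return False
```

With a sample row such as ('6000', 'Perth', 'WA') loaded into
au_postcode, an Australian address with postcode 6000 and state WA goes
in, one with an unlisted postcode or a made-up state is refused, and a
non-Australian address is stored without any checking at all.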

The app provides some more help at the UI level for users, such as
automatically filling in the state and suburb if an Australian postcode
is entered. It'll warn you if you enter an unknown suburb/city for an
address in Australia. For everything else I leave it to the user and to
possible later validation and reporting.
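
The UI-level help could look roughly like the following sketch. The
lookup table, its two sample rows and the function names (autofill,
suburb_warning) are hypothetical, and the choice to fill in the suburb
only when a postcode maps to exactly one suburb is my assumption here,
not a description of the real app. Note that the suburb check only
warns; it never blocks the entry.

```python
# Illustrative sample rows only. A real app would load the full list
# published by the postal service.
POSTCODES = {
    "6000": ("WA", ["Perth"]),
    "2000": ("NSW", ["Sydney", "The Rocks"]),
}


def autofill(postcode):
    """Suggest state and suburb for an Australian postcode.

    Returns None for an unknown postcode. The suburb is None when the
    postcode covers several suburbs and the user has to pick one.
    """
    entry = POSTCODES.get(postcode.strip())
    if entry is None:
        return None
    state, suburbs = entry
    suburb = suburbs[0] if len(suburbs) == 1 else None
    return {"state": state, "suburb": suburb}


def suburb_warning(suburb, country):
    """Return a warning string for an unknown Australian suburb, else None."""
    if country.strip().lower() != "australia":
        # Non-Australian addresses are left entirely to the user.
        return None
    known = {s.lower() for _, subs in POSTCODES.values() for s in subs}
    if suburb.strip().lower() in known:
        return None
    return f"Unknown Australian suburb/city: {suburb!r}"
```

The form would call autofill() when the postcode field loses focus and
show whatever suburb_warning() returns next to the suburb field, still
letting the user save the address as typed.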

I've had good results with this policy when working with other apps that
need to handle addressing information, and I've had some truly horrible
experiences with apps that try to be too strict in their address checking.

--
Craig Ringer
