Removing duplicates

From: Matthew Hagerty <matthew(at)brwholesale(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Removing duplicates
Date: 2002-02-26 15:10:12
Message-ID: 5.1.0.14.2.20020226095955.00b17f10@imap.brwholesale.com
Lists: pgsql-sql

Greetings,

I have a customer database (name, address1, address2, city, state, zip) and
I need a query (or two) that will give me a mailing list with as few
duplicates as possible. I know that precise matching is not possible, e.g.
"P.O. Box 123" will never match "PO Box 123" without some data massaging,
but if I can isolate even 50% of the duplicates, that would help greatly.
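The "data massaging" mentioned above usually means reducing each address to a canonical key before comparing. The sketch below is one hypothetical way to do that, not anything from the original post: it upper-cases the text, drops punctuation, collapses whitespace and applies a small abbreviation table (`ABBREVIATIONS`, `normalize_address` and `duplicate_groups` are illustrative names, and the table is an assumption to be extended for real data). With it, "P.O. Box 123" and "PO Box 123" collapse to the same key.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; extend it to suit the actual data.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "SUITE": "STE"}


def normalize_address(addr: str) -> str:
    """Reduce an address line to a canonical comparison key (hypothetical helper)."""
    # Upper-case, then drop every character that is not A-Z, 0-9 or a space,
    # so "P.O." becomes "PO". Non-ASCII letters are dropped too (an assumption
    # that suits US addresses).
    s = re.sub(r"[^A-Z0-9 ]", "", addr.upper())
    # split() also collapses runs of whitespace; map long forms to abbreviations.
    words = [ABBREVIATIONS.get(w, w) for w in s.split()]
    return " ".join(words)


def duplicate_groups(addresses):
    """Group raw address strings by normalized key; keep only keys seen 2+ times."""
    groups = defaultdict(list)
    for a in addresses:
        groups[normalize_address(a)].append(a)
    return {key: raw for key, raw in groups.items() if len(raw) > 1}


if __name__ == "__main__":
    print(duplicate_groups(["P.O. Box 123", "PO Box 123", "12 Main St."]))
```

Only exact matches after normalization are caught, so variants such as "P O Box 123" would still slip through; that trade-off fits the "isolate even 50%" goal rather than perfect matching.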

Also, any suggestions on which fields to check for duplicates? My first
thought was to make sure that no two identical addresses share the same zip
code. Any insight (or examples) would be greatly appreciated.
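The "same address within the same zip code" idea maps naturally onto a GROUP BY on zip plus a normalized address expression. The following is a minimal sketch under stated assumptions: it runs against an in-memory SQLite database from Python's standard library rather than PostgreSQL, it assumes a hypothetical `customer` table with an added integer `id` key (not in the original schema), and the normalization only strips '.' and ',' and upper-cases. `UPPER`, `REPLACE` and `TRIM` also exist in PostgreSQL, where `SELECT DISTINCT ON (...)` is another way to keep one row per group.

```python
import sqlite3

# Normalized key for address1: remove '.' and ',', upper-case, trim blanks.
NORM = "TRIM(UPPER(REPLACE(REPLACE(address1, '.', ''), ',', '')))"


def find_duplicates(conn):
    """Return (zip, normalized address1, count) for keys shared by 2+ rows."""
    sql = f"""
        SELECT zip, {NORM} AS addr_key, COUNT(*) AS n
        FROM customer
        GROUP BY zip, addr_key
        HAVING COUNT(*) > 1
        ORDER BY zip, addr_key
    """
    return conn.execute(sql).fetchall()


def mailing_list(conn):
    """One row per (zip, normalized address1): keep the lowest id in each group."""
    sql = f"""
        SELECT name, address1, city, state, zip
        FROM customer
        WHERE id IN (SELECT MIN(id) FROM customer GROUP BY zip, {NORM})
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


def demo_connection():
    """Build an in-memory table with made-up sample rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address1 TEXT,"
        " address2 TEXT, city TEXT, state TEXT, zip TEXT)"
    )
    rows = [
        ("Jane Doe", "P.O. Box 123", "", "Springfield", "IL", "62701"),
        ("J. Doe", "PO Box 123", "", "Springfield", "IL", "62701"),
        ("Bob Roe", "12 Main St.", "", "Springfield", "IL", "62702"),
    ]
    conn.executemany(
        "INSERT INTO customer (name, address1, address2, city, state, zip)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(find_duplicates(conn))  # [('62701', 'PO BOX 123', 2)]
    print(mailing_list(conn))
```

Running `find_duplicates` first lets the candidate groups be reviewed by hand before `mailing_list` silently keeps one row per group; which row to keep (here the lowest `id`) is an arbitrary choice.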

Thank you,
Matthew
