Re: How to find double entries

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas <maps(dot)on(at)gmx(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: How to find double entries
Date: 2008-04-16 03:23:32
Message-ID: 21481.1208316212@sss.pgh.pa.us
Lists: pgsql-sql

Andreas <maps(dot)on(at)gmx(dot)net> writes:
> I'd like to identify and then merge records of e.g. 'google', 'gogle',
> 'guugle'

> Then I want to match abbreviations like 'A-Company Ltd.', 'a company
> ltd.', 'A-Company Limited'

> Is there a way to do this?
> It would be OK just to list candidates to be manually checked afterwards.

There are some functions in contrib/fuzzystrmatch that seem like they'd
help you find candidate duplicates. contrib/pg_trgm and text search
might also offer promising tools.
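
To make the "candidate duplicates" idea concrete, here is a minimal
sketch in Python of the edit-distance approach. It is only an
illustration of the kind of comparison that the levenshtein() function
in contrib/fuzzystrmatch performs, not that module's implementation.
The helper names (normalize, candidate_pairs) and the max_distance
threshold of 2 are assumptions made up for this example.

```python
import re
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def normalize(name: str) -> str:
    """Hypothetical cleanup: lower-case, punctuation to single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def candidate_pairs(names, max_distance=2):
    """Return (a, b, distance) for pairs close enough to review by hand."""
    pairs = []
    for a, b in combinations(names, 2):
        d = levenshtein(normalize(a), normalize(b))
        if d <= max_distance:
            pairs.append((a, b, d))
    return pairs


if __name__ == "__main__":
    names = ["google", "gogle", "guugle",
             "A-Company Ltd.", "a company ltd.", "A-Company Limited"]
    for a, b, d in candidate_pairs(names):
        print(f"{d}  {a!r}  <->  {b!r}")
```

With this threshold the three 'google' spellings pair up with each
other, and 'A-Company Ltd.' matches 'a company ltd.' once case and
punctuation are normalized. 'A-Company Limited' is not caught, since
'ltd' and 'limited' are four edits apart; abbreviations like that
would need a synonym list or a looser measure, and the output is a
list to review rather than a set of rows to merge blindly.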

What's really a duplicate sounds like a judgment call here, so you
probably shouldn't even think of automating it completely.

regards, tom lane
