Re: A DISTINCT problem removing duplicates

From: Richard Huxton <dev(at)archonet(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL SQL List <pgsql-sql(at)postgresql(dot)org>
Subject: Re: A DISTINCT problem removing duplicates
Date: 2008-12-09 15:04:48
Message-ID: 493E8910.7010007@archonet.com
Lists: pgsql-sql

Tom Lane wrote:
> Richard Huxton <dev(at)archonet(dot)com> writes:
>> Anyone got anything more elegant?
>
> Seems to me that no document should have an empty dup_set. If it's not
> a match to any existing document, then immediately assign a new dup_set
> number to it.

That was my initial thought too, but it means that when I actually find a
duplicate I have to decide which "direction" to renumber them in. It
probably also means keeping a summary table with counts to show which
documents are duplicates, since the duplicates table is now the same size
as the documents table.
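
To make the renumbering question concrete, here is a minimal sketch of the
"every document gets a dup_set" scheme. Everything in it is an assumption for
illustration, not the actual schema: a hypothetical documents(id, dup_set)
table, SQLite (via Python's sqlite3 module) standing in for PostgreSQL, and an
arbitrary rule of always renumbering towards the smaller dup_set. The counts
are computed with GROUP BY here rather than kept in a summary table, which may
or may not be fast enough on a large documents table.

```python
import sqlite3


def merge_dup_sets(conn, doc_a, doc_b):
    """Put two documents found to be duplicates into the same dup_set.

    Direction rule (an assumption): keep the smaller dup_set number and
    renumber every document in the other set to it.
    """
    rows = conn.execute(
        "SELECT dup_set FROM documents WHERE id IN (?, ?)", (doc_a, doc_b)
    ).fetchall()
    sets = {row[0] for row in rows}
    keep = min(sets)
    for other in sets - {keep}:
        conn.execute(
            "UPDATE documents SET dup_set = ? WHERE dup_set = ?", (keep, other)
        )


def duplicate_sets(conn):
    """Return (dup_set, count) for every set with more than one document."""
    return conn.execute(
        "SELECT dup_set, COUNT(*) FROM documents "
        "GROUP BY dup_set HAVING COUNT(*) > 1 ORDER BY dup_set"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, dup_set INTEGER NOT NULL)"
)
# Each new document immediately gets its own dup_set (here simply its id).
conn.executemany(
    "INSERT INTO documents (id, dup_set) VALUES (?, ?)",
    [(i, i) for i in range(1, 6)],
)

merge_dup_sets(conn, 2, 4)  # documents 2 and 4 turn out to be duplicates
merge_dup_sets(conn, 4, 5)  # later, 5 matches 4, so it joins the same set
result = duplicate_sets(conn)
print(result)
```

With a fixed direction rule the merge is a single UPDATE per absorbed set, and
singleton sets simply drop out of the HAVING query, so the "same size as the
documents table" problem only affects storage, not the report.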

--
Richard Huxton
Archonet Ltd
