Re: Slow duplicate deletes

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: DrYSG <ygutfreund(at)draper(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Slow duplicate deletes
Date: 2012-03-06 22:04:06
Message-ID: CAHyXU0wD32XGpubZWVOmZE2PL0x2tpYiwSiCXpZf=6RoEz5Fcw@mail.gmail.com
Lists: pgsql-novice

On Mon, Mar 5, 2012 at 2:43 PM, DrYSG <ygutfreund(at)draper(dot)com> wrote:
> One point I might not have made clear. The reason I want to remove duplicates
> is that the column "data_object.unique_id" became non-unique (someone added
> duplicate rows). So I added the bigserial column (idx) to uniquely identify
> the rows, and I was using SELECT MIN(idx) with GROUP BY to pick just one of
> the rows that became duplicated.
>
> I am going to try out some of your excellent suggestions. I will report back
> on how they are working.
>
> One idea that was given to me was the following (what do you think Merlin?)
>
> CREATE TABLE portal.new_metadata AS
> SELECT DISTINCT ON (unique_id) * FROM portal.metadata;
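Spelled out (untested, and assuming the unique_id/idx columns from your
table; note that CREATE TABLE ... AS copies no indexes, constraints, or
permissions, so those have to be recreated before the new table is
swapped in):

    BEGIN;
    -- DISTINCT ON keeps the first row per unique_id; ordering by idx
    -- makes that the lowest-idx row, matching the MIN(idx) approach
    CREATE TABLE portal.new_metadata AS
      SELECT DISTINCT ON (unique_id) *
      FROM portal.metadata
      ORDER BY unique_id, idx;
    ALTER TABLE portal.metadata RENAME TO metadata_old;  -- name is arbitrary
    ALTER TABLE portal.new_metadata RENAME TO metadata;
    COMMIT;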

Sure, that will work, but as Michael noted it's not always practical to
do that. Also, if only a fairly small percentage of the records have to
be deleted, an in-place delete may end up being faster anyway. Modern
Postgres is pretty smart about optimizing 'where exists', so you should
get a decent plan.
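
The in-place version would be something like this (again untested, same
assumed columns; it deletes every row for which a lower-idx row with the
same unique_id exists, i.e. keeps MIN(idx) per unique_id):

    DELETE FROM portal.metadata m
    WHERE EXISTS (
        SELECT 1
        FROM portal.metadata d
        WHERE d.unique_id = m.unique_id
          AND d.idx < m.idx
    );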

merlin
