| From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
|---|---|
| To: | DrYSG <ygutfreund(at)draper(dot)com> |
| Cc: | pgsql-novice(at)postgresql(dot)org |
| Subject: | Re: Slow duplicate deletes |
| Date: | 2012-03-05 19:52:38 |
| Message-ID: | CAHyXU0ztqbzy3=cQiJDW+96mHzUz7WWKhqk00okvtp2jvQYC9A@mail.gmail.com |
| Lists: | pgsql-novice |
On Mon, Mar 5, 2012 at 9:17 AM, DrYSG <ygutfreund(at)draper(dot)com> wrote:
> I have a large table (20M records), but mostly short text fields. There are
> duplicates that I am trying to remove. I have a bigserial index that is
> unique, but there are duplicates in another field.
>
> I have an 8 core, 12GB memory computer with RAID disks.
>
> This request has been running for 70 hours (is it safe to kill it?).
yes...generally speaking, it's safe to kill just about any query in
postgres any time.
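if you'd rather do it from SQL than Ctrl-C in psql, something like this
works (the pid here is made up -- look up the real one in pg_stat_activity;
the column names can differ a bit between versions):
SELECT procpid, current_query FROM pg_stat_activity;  -- find the long-running DELETE
SELECT pg_cancel_backend(12345);  -- use that backend's pid; same cancel as Ctrl-C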
> How can I make this run faster? This is a one time processing task, but it
> is taking a long time.
>
> DELETE FROM portal.metadata
> WHERE idx NOT IN
> (
> SELECT MIN(idx)
> FROM portal.metadata
> GROUP BY "data_object.unique_id"
> );
compare the plan for that query (EXPLAIN) vs this one:
/* delete the records from m1 if there is another record with a lower
idx for the same unique_id */
DELETE FROM portal.metadata m1
WHERE EXISTS
(
SELECT 1 FROM portal.metadata m2
WHERE m1.unique_id = m2.unique_id
AND m2.idx < m1.idx
);
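EXPLAIN by itself only prints the estimated plan without running the
delete, so it's cheap to check both versions, for example:
EXPLAIN
DELETE FROM portal.metadata
WHERE idx NOT IN
(
    SELECT MIN(idx)
    FROM portal.metadata
    GROUP BY "data_object.unique_id"
);
with 20M rows the NOT IN form often ends up rescanning the subquery
result for every row, while the EXISTS form can usually be planned as a
join against the second copy of the table.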
also, if you don't already have one, consider making an index on at
least unique_id, or possibly (unique_id, idx); for example:
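CREATE INDEX metadata_unique_id_idx ON portal.metadata (unique_id, idx);
(that assumes the duplicated column really is named unique_id, and the
index name is just a suggestion -- substitute whatever the actual column
in portal.metadata is called.)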
back up your database before running this query :-).
merlin