Re: Deleting orphaned records (not exists is very slow)

From: Tim Uckun <timuckun(at)gmail(dot)com>
To: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
Cc: Michael Glaesemann <grzm(at)seespotcode(dot)net>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deleting orphaned records (not exists is very slow)
Date: 2010-09-28 02:08:21
Message-ID: AANLkTingK8S9ezuVzvOtEa1rhLZ6KDdCLAO=v0eW+23C@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>
> The other thing that just hit my mind, is that you mind need to bump
> up work_mem a bit, just for the session. so
> SET work_mem=32M
> <your query>
>
> and that should make it slightly faster.

I tried the method suggested. I created a table of IDs and a field
called "mark". I indexed both fields.

Then I did the following.

update to_be_deleted set mark = true where ctid = any (array( select
ctid from to_be_deleted limit 10));

Followed by ....

DELETE FROM table WHERE (id in (select id from to_be_deleted where
mark = true))

This query took an extremely long time. I stopped it after about
fifteen minutes which seems outrageous to me because it's only trying
to delete ten records.

In the end I wrote a ruby script that does this.

loop do
break if (to_be_deleted = ToBeDeleted.limit(10).map{|t| t.id}).size == 0
ids = to_be_deleted.join(',')
SearchResult.delete_all "id in (#{ids})"
ToBeDeleted.delete_all "id in (#{ids})"
@logger.debug "Deleted #{ids}"
end

This is running now. It's running reasonably fast. I presume it will
keep getting faster as the number or records on both tables keep
getting smaller.

Honestly there was no need for any of this. I can't believe I just
wasted a couple of hours trying to get this to go only to resort to
writing a ruby script.

The original query I had written was....

delete from
table_name
where id in
in
(SELECT id
FROM table_name tb
LEFT OUTER JOIN other_table ot ON tb.id = ot.table_name_id
WHERE ot.id Is Null)

This should have "just worked" but in this case I would estimate it
would take a couple of months given the number of records in the
database.

Thanks for the advice but man what a hassle.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Satoshi Nagayasu 2010-09-28 02:38:44 Re: ECPG - Some errno definitions don't match to the manual
Previous Message John R Pierce 2010-09-28 01:56:00 Re: Query to get the "next available" unique suffix for a name