Re: High cost of ... where ... not in (select ...)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Aaron Turner <synfinatic(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: High cost of ... where ... not in (select ...)
Date: 2009-06-17 00:30:02
Message-ID: 603c8f070906161730o4b577483o2a90c73549c3990e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turner<synfinatic(at)gmail(dot)com> wrote:
> On Tue, Jun 16, 2009 at 2:37 PM, Alvaro
> Herrera<alvherre(at)commandprompt(dot)com> wrote:
>> Aaron Turner escribió:
>>> I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze):
>>>
>>> musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
>>> NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);
>>
>> What PG version is this?
>
> Doh, just realized I didn't reply back to list.   It's version 8.3.3.
>
> Also, pcap_storeid is unique in pcap_store_log

Speaking as one who has dealt with this frustration more than once,
you can typically get better performance with something like:

DELETE FROM muapp.pcap_store AS x
FROM muapp.pcap_store a
LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid
WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL

This is emphatically lame, but there you have it. It's first of all
lame that we can't do a better job optimizing NOT-IN, at least when
the expression within the subselect is known to be not-null, and it's
secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN
without a self-JOIN.

</rant>

...Robert

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Aaron Turner 2009-06-17 01:23:45 Re: High cost of ... where ... not in (select ...)
Previous Message Aaron Turner 2009-06-16 23:39:15 Re: High cost of ... where ... not in (select ...)