Quick Links

Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets

From:	Tom Molesworth <tom(at)audioboundary(dot)com>
To:	Ole Tange <ole(at)tange(dot)dk>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets
Date:	2009-07-28 22:16:37
Message-ID:	4A6F78C5.9070607@audioboundary.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

Ole Tange wrote:
> On Tue, Jul 28, 2009 at 3:47 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> "Ole Tange" <postgresql(dot)org(at)tange(dot)dk> writes:
>>
>>> (modulo NULLs which seem to always cause problems in NOT INs).
>>>
>>> Because it can be rewritten, NOT IN should never be much slower than the
>>> rewritten solution, as PostgreSQL should simply rewrite NOT IN to the
>>> above.
>>>
>> Let's see, you understand that the rewrite violates the SQL standard
>> semantics of NOT IN, but you think we should do it anyway?
>>
>
> Thanks for your kind reply.
>
> Apparently my bug report was not quite clear. My rewrite example was
> simply to show a way that could do a marvelous speedup on medium to
> large sets. The correct dealing with NULL I am sure can be handled
> just as efficiently.
>

Just an observation - it seems that you're using NOT IN() and expecting
it to do the same as this:

SELECT foo FROM a LEFT JOIN b ON a.key = b.key WHERE b.key IS NULL

I find it's comparatively rare to actually want the NOT IN()
null-handling semantics, especially given your comment about nulls
'causing problems' (although I guess you could get the same behaviour in
that query as NOT IN, by adding 'AND a.key IS NOT NULL' to the where
clause - I have no idea whether this is something the optimiser could or
should be able to do).

Although it refers to TSQL, this page might help explain more about NOT IN:

http://stackoverflow.com/questions/129077/sql-not-in-constraint-and-null-values

Not sure if constructions such as 'not exists (select 1 from b where
b.key = a.key)' also have the same performance issues you describe -
should be faster due to the key? - but I find the left join approach
works well enough up to several million rows in the two tables. An
unindexed subquery like 'select key from a' just seems like a guaranteed
way to invoking a full (slow) table scan.

Tom

In response to

Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets at 2009-07-28 20:57:10 from Ole Tange

Responses

Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets at 2009-07-29 10:34:21 from Ole Tange

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Jim Michaels	2009-07-28 22:57:50	BUG #4951: installation dir wrong for libpq compilation
Previous Message	Ole Tange	2009-07-28 20:57:10	Re: BUG #4949: NOT IN is prohibitive slower than the rewrite for medium to large sets