Quick Links

Re: NOT IN vs. NOT EXISTS performance

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	david(dot)rowley(at)2ndquadrant(dot)com
Cc:	lswainemoore(at)gmail(dot)com, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Re: NOT IN vs. NOT EXISTS performance
Date:	2018-11-09 13:45:56
Message-ID:	CAHyXU0xqvzCt=cRc2+D4dTtWA1C4GmjvK=DPq3_db3FTOoGjcA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Thu, Nov 8, 2018 at 3:12 PM David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
> On 9 November 2018 at 08:35, Lincoln Swaine-Moore
> <lswainemoore(at)gmail(dot)com> wrote:
> > My primary question is: why is this approach only possible (for data too
> > large for memory) when using NOT EXISTS, and not when using NOT IN?
> >
> > I understand that there is a slight difference in the meaning of the two
> > expressions, in that NOT IN will produce NULL if there are any NULL values
> > in the right hand side (in this case there are none, and the queries should
> > return the same COUNT). But if anything, I would expect that to improve
> > performance of the NOT IN operation, since a single pass through that data
> > should reveal if there are any NULL values, at which point that information
> > could be used to short-circuit. So I am a bit baffled.
>
> The problem is that the planner makes the plan and would have to know
> beforehand that no NULLs could exist on either side of the join.

Yeah, the core issue is the SQL rules that define NOT IN behaves as:
postgres=# select 1 not in (select 2);
?column?
──────────
t
(1 row)

postgres=# select 1 not in (select 2 union all select null);
?column?
──────────

(1 row)

There's a certain logic to it but it's a death sentence for performance.

merlin

In response to

Re: NOT IN vs. NOT EXISTS performance at 2018-11-08 21:11:40 from David Rowley

Responses

Re: NOT IN vs. NOT EXISTS performance at 2018-11-10 00:06:15 from Lincoln Swaine-Moore

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Lincoln Swaine-Moore	2018-11-10 00:06:15	Re: NOT IN vs. NOT EXISTS performance
Previous Message	David Rowley	2018-11-08 21:11:40	Re: NOT IN vs. NOT EXISTS performance