Re: NOT IN subquery optimization

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Li, Zheng" <zhelli(at)amazon(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <riguo(at)pivotal(dot)io>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NOT IN subquery optimization
Date: 2019-03-01 23:16:26
Message-ID: CAKJS1f9DfW0PFYzf1hw_PzkkbEVDhzqDg1xJkgSyBd+0T79jHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 2 Mar 2019 at 12:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Li, Zheng" <zhelli(at)amazon(dot)com> writes:
> > Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join, it is always faster compared to the original plan.
>
> TBH, I am *really* skeptical of sweeping claims like that. The existing
> code will typically produce a hashed-subplan plan, which ought not be
> that awful as long as the subquery result doesn't blow out memory.
> It certainly is going to beat a naive nested loop.

It's pretty easy to show the claim is false using master and NOT EXISTS.

create table small(a int not null);
create table big (a int not null);
insert into small select generate_Series(1,1000);
insert into big select x%1000+1 from generate_Series(1,1000000) x;

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a);
Time: 178.575 ms

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a or s.a is null);
Time: 38049.969 ms (00:38.050)

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2019-03-01 23:19:33 Re: [HACKERS] Incomplete startup packet errors
Previous Message Tom Lane 2019-03-01 23:13:56 Re: NOT IN subquery optimization