| From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | "Li, Zheng" <zhelli(at)amazon(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <riguo(at)pivotal(dot)io>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: NOT IN subquery optimization | 
| Date: | 2019-03-01 23:16:26 | 
| Message-ID: | CAKJS1f9DfW0PFYzf1hw_PzkkbEVDhzqDg1xJkgSyBd+0T79jHg@mail.gmail.com | 
| Lists: | pgsql-hackers | 
On Sat, 2 Mar 2019 at 12:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Li, Zheng" <zhelli(at)amazon(dot)com> writes:
> > Although adding "or var is NULL" to the anti join condition forces the planner to choose nested loop anti join, it is always faster compared to the original plan.
>
> TBH, I am *really* skeptical of sweeping claims like that.  The existing
> code will typically produce a hashed-subplan plan, which ought not be
> that awful as long as the subquery result doesn't blow out memory.
> It certainly is going to beat a naive nested loop.

It's pretty easy to show the claim is false using master and NOT EXISTS.

create table small(a int not null);
create table big (a int not null);
insert into small select generate_series(1,1000);
insert into big select x%1000+1 from generate_series(1,1000000) x;

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a);
Time: 178.575 ms

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a or s.a is null);
Time: 38049.969 ms (00:38.050)

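
To see where the time goes, one could compare the plans for the two queries. This is only a sketch, assuming the small and big tables above exist. The analyze step and the (costs off) option are additions for illustration and are not part of the original test. Plan shapes vary with version and settings, so no output is shown. The expectation discussed in this thread is a hash anti join for the first query and a nested loop anti join for the second, because the OR'd condition is no longer a plain equality clause that a hash join can use.

```sql
-- Refresh statistics so the planner sees the real row counts
-- (assumption: not part of the original test).
analyze small;
analyze big;

-- Plain equality in the subquery: the planner is free to choose
-- a hash anti join here.
explain (costs off)
select count(*) from big b where not exists(select 1 from small s
where s.a = b.a);

-- Equality OR'd with an IS NULL test: the whole OR has to be evaluated
-- as a join filter, which rules out hashing on s.a = b.a.
explain (costs off)
select count(*) from big b where not exists(select 1 from small s
where s.a = b.a or s.a is null);
```

Running the same statements with explain (analyze) instead would also show per-node timings, at the cost of actually executing the slow query.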
-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services