Re: NOT IN subquery optimization

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: "Li, Zheng" <zhelli(at)amazon(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <riguo(at)pivotal(dot)io>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NOT IN subquery optimization
Date: 2019-03-02 00:11:46
Message-ID: CAKJS1f8q4S+5Z7WSRDWJd__SwqMr12JdWKXTDo35ptzneRvZnw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 2 Mar 2019 at 12:39, Li, Zheng <zhelli(at)amazon(dot)com> wrote:
> However, if s.a is nullable, we would do this transformation:
> select count(*) from big b where not exists(select 1 from small s
> where s.a = b.a or s.a is null);

I understand you're keen to make this work, but you're assuming again
that forcing the planner into a nested loop plan is going to be a win
over the current behaviour. It may well be in some cases, but it's
very simple to show cases where it's a significant regression.

Using the same tables from earlier, and again with master:

alter table small alter column a drop not null;
select * from big where a not in(select a from small);
Time: 430.283 ms

Here's what you're proposing:

select * from big b where not exists(select 1 from small s where s.a =
b.a or s.a is null);
Time: 37419.646 ms (00:37.420)

about 80 times slower. Making "small" a little less small would likely
see that gap grow even further.

I think you're fighting a losing battle here with adding OR quals to
the join condition. This transformation happens so early in planning
that you really can't cost it out either. I think the only way that
could be made to work satisfactorily would be with some execution
level support for it. Short of that, you're left with just adding
checks that either side of the join cannot produce NULL values...
That's what I've proposed in [1].

[1] https://www.postgresql.org/message-id/CAKJS1f_OA5VeZx8A8H8mkj3uqEgOtmHBGCUA6%2BxqgmUJ6JQURw%40mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-03-02 00:40:22 Re: Tighten error control for OpenTransientFile/CloseTransientFile
Previous Message Peter Geoghegan 2019-03-01 23:59:29 VACUUM can finish an interrupted nbtree page split -- is that okay?