Re: NOT IN subquery optimization

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Li, Zheng" <zhelli(at)amazon(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <riguo(at)pivotal(dot)io>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>
Subject: Re: NOT IN subquery optimization
Date: 2019-03-03 03:53:31
Message-ID: CAKJS1f9GndGjLQE5yqkwXXxHBei2=uFQnHa40q5iQA=pC9U4bA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 3 Mar 2019 at 05:25, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hmm ... thinking about the strictness angle some more: what we really
> need to optimize NOT IN, IIUC, is an assumption that the join operator
> will never return NULL. While not having NULL inputs is certainly a
> *necessary* condition for that (assuming a strict operator) it's not a
> *sufficient* condition. Any Postgres function/operator is capable
> of returning NULL whenever it feels like it. So checking strictness
> does not lead to a mathematically correct optimization.

That's something I didn't think of.

> My initial thought about plugging that admittedly-academic point is
> to insist that the join operator be both strict and a member of a
> btree opclass (hash might be OK too; less sure about other index types).
> The system already contains assumptions that btree comparators never
> return NULL. I doubt that this costs us any real-world applicability,
> because if the join operator can neither merge nor hash, we're screwed
> anyway for finding a join plan that's better than nested-loop.

Why strict? If both inputs are non-NULL, then what additional
guarantees does strict give us?

I implemented a btree opfamily check in my version of the patch. Not
so sure about hash, can you point me in the direction of a mention of
how this is guarantees for btree?

The attached v1.2 does this adds a regression test using the LINE
type. This has an operator named '=', but no btree opfamily. A few
other types are in this boat too, per:

select typname from pg_type t where not exists(select 1 from pg_amop
where amoplefttype = t.oid and amopmethod=403) and exists (select 1
from pg_operator where oprleft = t.oid and oprname = '=');

The list of builtin types that have a hash opfamily but no btree
opfamily that support NOT IN are not very exciting, so doing the same
for hash might not be worth the extra code.

select typname from pg_type t where exists(select 1 from pg_amop where
amoplefttype = t.oid and amopmethod=405) and exists (select 1 from
pg_operator where oprleft = t.oid and oprname = '=') and not
exists(select 1 from pg_amop where amoplefttype = t.oid and
amopmethod=403);
typname
---------
xid
cid
aclitem
(3 rows)

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
not_in_anti_join_v1.2.patch application/octet-stream 50.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-03 04:11:35 Re: NOT IN subquery optimization
Previous Message Tomas Vondra 2019-03-03 02:12:51 Re: Online verification of checksums