Re: NOT IN subquery optimization

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: "Li, Zheng" <zhelli(at)amazon(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Richard Guo <riguo(at)pivotal(dot)io>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>
Subject: Re: NOT IN subquery optimization
Date: 2019-03-03 04:11:35
Message-ID: 7771.1551586295@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> On Sun, 3 Mar 2019 at 05:25, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> My initial thought about plugging that admittedly-academic point is
>> to insist that the join operator be both strict and a member of a
>> btree opclass (hash might be OK too; less sure about other index types).

> Why strict? If both inputs are non-NULL, then what additional
> guarantees does strict give us?

Yeah, if we're verifying the inputs are non-null, I think that probably
doesn't matter.

> I implemented a btree opfamily check in my version of the patch. Not
> so sure about hash, can you point me in the direction of a mention of
> how this is guarantees for btree?

https://www.postgresql.org/docs/devel/btree-support-funcs.html
quoth

The comparison function must take two non-null values A and B and
return an int32 value that is < 0, 0, or > 0 when A < B, A = B, or A >
B, respectively. A null result is disallowed: all values of the data
type must be comparable.

(At the code level, this is implicit in the fact that the comparison
function will be called via FunctionCall2Coll or a sibling, and those
all throw an error if the called function returns NULL.)

Now, it doesn't say in so many words that the comparison operators
have to yield results consistent with the comparison support function,
but I think that's pretty obvious ...

For hash, the equivalent constraint is that the hash function has to
work for every possible input value. I suppose it's possible that
the associated equality operator would sometimes return null, but
it's hard to think of a practical reason for doing so.

I've not dug in the code, but I wouldn't be too surprised if
nodeMergejoin.c or nodeHashjoin.c, or the stuff related to hash
grouping or hash aggregation, also contain assumptions about
the equality operators not returning null.

> The list of builtin types that have a hash opfamily but no btree
> opfamily that support NOT IN are not very exciting, so doing the same
> for hash might not be worth the extra code.

Agreed for builtin types, but there might be some extensions out there
where this doesn't hold. It's not terribly hard to imagine a data type
that hasn't got a linear sort order but is amenable to hashing.
(The in-core xid type is an example, actually.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2019-03-03 04:29:26 Re: WIP: BRIN multi-range indexes
Previous Message David Rowley 2019-03-03 03:53:31 Re: NOT IN subquery optimization