Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)bowt(dot)ie>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Date: 2021-05-08 00:38:33
Message-ID: CAApHDvpp8MT9gT=ZUfh2o4sMKLdXkQV1jBjos3sV-NQhwYoNfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 8 May 2021 at 09:15, James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> On Sat, Apr 24, 2021 at 6:25 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > I'm a bit undecided if it's safe to set the opfuncid to the negator
> > function. If anything were to set that again based on the opno then
> > it would likely set it to the wrong thing. We can't go changing the
> > opno either because EXPLAIN would display the wrong thing.
>
> I don't personally see a reason why this is a problem. But I also
> don't know that I have enough knowledge of the codebase to say that
> definitively.

The reason for my concern is that if the opfuncid is set to
InvalidOid, set_sa_opfuncid() always sets the ScalarArrayOpExpr's
opfuncid to get_opcode(opexpr->opno) . I'm effectively setting the
opfuncid to get_opcode(get_negator(opexpr->opno)), if anything were to
reset the ScalarArrayOpExpr's opfuncid to InvalidOid, then
set_sa_opfuncid() would repopulate it with the wrong value.

Maybe the solution there is to teach set_sa_opfuncid() about our
hashing NOT IN trick and have it check if (!opexpr->useOr &&
OidIsValid(opexpr->hashfuncid)) and if that's true then do
opexpr->opfuncid = get_opcode(get_negator(opexpr->opno)). Then we
could just not bothing setting opfuncid in
convert_saop_to_hashed_saop_walker().

> > Anyway, I've attached what I ended up with after spending a few hours
> > looking at this.
>
> Overall I like this approach.
>
> One thing I think we could clean up:
>
> + bool useOr; /* use OR or AND semantics? */
> ...
> + /* useOr == true means an IN clause, useOr == false is NOT IN */
>
> I'm wondering if the intersection of these two lines implies that
> useOr isn't quite the right name here. Perhaps something like
> "negated"?

I'm not sure I want to go changing that. The whole IN() / NOT IN()
behaviour regarding NULLs all seems pretty weird until you mentally
replace a IN (1,2,3) with a = 1 OR a = 2 OR a = 3. And for the a NOT
IN(1,2,3) case, a <> 1 AND a <> 2 AND a <> 3. People can make a bit
more sense of the weirdness of NULLs with NOT IN when they mentally
convert their expression like that. I think having that in code is
useful too. Any optimisations that are added must match those
semantics.

> The other thing I got to thinking about was = ALL. It doesn't get
> turned into a hash op because the negator of = isn't hashable. I think
> it's correct that that's the determining factor, because I can't
> imagine what it would mean to hash <>. But...I wanted to confirm I
> wasn't missing something. We don't have explicit tests for that case,
> but I'm not sure it's necessary either.

It's important to think of other cases, I just don't think there's any
need to do anything for that one. Remember that we have the
restriction of requiring a set of Consts, so for that case to be met,
someone would have to write something like: col =
ALL('{1,1,1,1,1,1,1,1}'::int[]); I think if anyone comes along
complaining that a query containing that is not as fast as they'd like
then we might tell them that they should just use: col = 1. A sanity
checkup might not go amiss either.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-05-08 01:16:49 Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Previous Message Noah Misch 2021-05-08 00:14:18 Re: Anti-critical-section assertion failure in mcxt.c reached by walsender