Re: POC, WIP: OR-clause support for indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-11-27 21:03:50
Message-ID: CA+TgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 27, 2023 at 3:02 AM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> On 25/11/2023 08:23, Alexander Korotkov wrote:
> > I think patch certainly gets better in this aspect. One thing I can't
> > understand is why do we use home-grown code for resolving
> > hash-collisions. You can just define custom hash and match functions
> > in HASHCTL. Even if we need to avoid repeated JumbleExpr calls, we
> > still can save pre-calculated hash value into hash entry and use
> > custom hash and match. This doesn't imply us to write our own
> > collision-resolving code.
>
> Thanks, it was an insightful suggestion.
> I implemented it, and the code has become shorter (see attachment).

Neither the code comments nor the commit message really explain the
design idea here. That's unfortunate, principally because it makes
review difficult.

I'm very skeptical about the idea of using JumbleExpr for any part of
this. It seems fairly expensive, and it might produce false matches.
If expensive is OK, then why not just use equal()? If it's not, then
this probably isn't really OK either. But in any case there should be
comments explaining why this strategy was chosen.

The use of op_mergejoinable() seems pretty random to me. Why should we
care about that? If somebody writes a<1 or a<2 or a<3 or a<4, you can
transform that to a<any(array[1,2,3,4]) if you want. It might not be a
good idea, but I think it's a legal transformation. The reader
shouldn't be left to guess whether a rule like this was made for
reasons of correctness or for reasons of efficiency or something else.
Looking further, I see that the reason for this is likely that the
operator for the transformation result is constructing using
list_make1(makeString((char *) "=")), but trying to choose an operator
based on the operator name is, I think, pretty clearly unacceptable.
Tom has fired more than one hacker out of an airlock for such
transgressions, and this violation seems less principled than some.
The = operator that got chosen could be entirely unrelated to any
operator in the original, untransformed query. It could be part of no
operator class that was involved in the original query, in a different
schema than the operator in the original query, and owned by a
different user than the one who owned any operator referenced by the
original query. I suspect that's not only incorrect but an exploitable
security vulnerability.

I am extremely dubious about the use of select_common_type() here. Why
not do this only when the types already match exactly? Maybe the
concern is unknown literals, but perhaps that should be handled in
some other way. If you do this kind of thing, you need to justify why
it can't fail or produce wrong answers.

Honestly, it seems very hard to avoid the conclusion that this
transformation is being done at too early a stage. Parse analysis is
not the time to try to do query optimization. I can't really believe
that there's a way to produce a committable patch along these lines.
Ideally, a transformation like this should be done after we know what
plan shape we're using (or considering using), so that we can make
cost-based decisions about whether to transform or not. But at the
very least it should happen somewhere in the planner. There's really
no justification for parse analysis rewriting the SQL that the user
entered.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-11-27 21:09:10 Re: Missing docs on AT TIME ZONE precedence?
Previous Message Nathan Bossart 2023-11-27 21:00:30 Re: locked reads for atomics