Re: POC, WIP: OR-clause support for indexes

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-11-28 11:46:51
Message-ID: cfdc7fef-cf0b-437b-89fa-62ffd6677f7d@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28/11/2023 04:03, Robert Haas wrote:
> On Mon, Nov 27, 2023 at 3:02 AM Andrei Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> On 25/11/2023 08:23, Alexander Korotkov wrote:
>>> I think patch certainly gets better in this aspect. One thing I can't
>>> understand is why do we use home-grown code for resolving
>>> hash-collisions. You can just define custom hash and match functions
>>> in HASHCTL. Even if we need to avoid repeated JumbleExpr calls, we
>>> still can save pre-calculated hash value into hash entry and use
>>> custom hash and match. This doesn't imply us to write our own
>>> collision-resolving code.
>>
>> Thanks, it was an insightful suggestion.
>> I implemented it, and the code has become shorter (see attachment).
>
> Neither the code comments nor the commit message really explain the
> design idea here. That's unfortunate, principally because it makes
> review difficult.

Yeah, it is still an issue. We will think about how to improve this; any
suggestions are welcome.

> I'm very skeptical about the idea of using JumbleExpr for any part of
> this. It seems fairly expensive, and it might produce false matches.
> If expensive is OK, then why not just use equal()? If it's not, then
> this probably isn't really OK either. But in any case there should be
> comments explaining why this strategy was chosen.

We used the equal() routine without hashing in earlier versions. Hashing
resolves issues with many different OR clauses. Is it expensive? Maybe,
but we assume this transformation should be applied to simple enough
expressions.

> The use of op_mergejoinable() seems pretty random to me. Why should we
> care about that? If somebody writes a<1 or a<2 or a<3 or a<4, you can
> transform that to a<any(array[1,2,3,4]) if you want. It might not be a
> good idea, but I think it's a legal transformation.

You are right. The only reason was to obtain a working patch to
benchmark and look for corner cases. We would rewrite that place but
still live with the equivalence operator.

> The reader shouldn't be left to guess whether a rule like this was made for
> reasons of correctness or for reasons of efficiency or something else.
> Looking further, I see that the reason for this is likely that the
> operator for the transformation result is constructing using
> list_make1(makeString((char *) "=")), but trying to choose an operator
> based on the operator name is, I think, pretty clearly unacceptable.

Yes, it was a big mistake. It is fixed in the new version (I guess).

> I am extremely dubious about the use of select_common_type() here. Why
> not do this only when the types already match exactly? Maybe the
> concern is unknown literals, but perhaps that should be handled in
> some other way. If you do this kind of thing, you need to justify why
> it can't fail or produce wrong answers.

Perhaps. We implemented your approach in the next version. At least we
could see consequences.

> Honestly, it seems very hard to avoid the conclusion that this
> transformation is being done at too early a stage. Parse analysis is
> not the time to try to do query optimization. I can't really believe
> that there's a way to produce a committable patch along these lines.
> Ideally, a transformation like this should be done after we know what
> plan shape we're using (or considering using), so that we can make
> cost-based decisions about whether to transform or not. But at the
> very least it should happen somewhere in the planner. There's really
> no justification for parse analysis rewriting the SQL that the user
> entered.

Here, we assume that array operation is generally better than many ORs.
As a result, it should be more effective to make OR->ANY transformation
in the parser (it is a relatively lightweight operation here) and, as a
second phase, decompose that in the optimizer.
We implemented earlier prototypes in different places of the optimizer,
and I'm convinced that only this approach resolves the issues we found.
Does this approach look weird? Maybe. We can debate it in this thread.

--
regards,
Andrei Lepikhov
Postgres Professional

Attachment Content-Type Size
v13-0001-Transform-OR-clause-to-ANY-expressions.patch text/plain 53.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2023-11-28 12:06:28 Re: XID formatting and SLRU refactorings
Previous Message Amit Kapila 2023-11-28 11:32:03 Re: logical decoding and replication of sequences, take 2