Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-10-29 16:41:06
Message-ID: cc74aa7d-0911-4873-87d2-9405ece0ec32@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On 27.10.2023 00:04, Peter Geoghegan wrote:
> On Thu, Oct 26, 2023 at 12:59 PM Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>> Alexander's example seems to show that it's not that simple. If I'm
>> reading his example correctly, with things like aid = 1, the
>> transformation usually wins even if the number of things in the OR
>> expression is large, but with things like aid + 1 * bid = 1, the
>> transformation seems to lose at least with larger numbers of items. So
>> it's not JUST the number of OR elements but also what they contain,
>> unless I'm misunderstanding his point.
> Alexander said "Generally, I don't see why ANY could be executed
> slower than the equivalent OR clause". I understood that this was his
> way of expressing the following idea:
>
> "In principle, there is no reason to expect execution of ANY() to be
> slower than execution of an equivalent OR clause (except for
> noise-level differences). While it might not actually look that way
> for every single type of plan you can imagine right now, that doesn't
> argue for making a cost-based decision. It actually argues for fixing
> the underlying issue, which can't possibly be due to some kind of
> fundamental advantage enjoyed by expression evaluation with ORs".
>
> This is also what I think of all this.
>
> Alexander's partial index example had this quality to it. Obviously,
> the planner *could* be taught to do the right thing with such a case,
> with a little more work. The fact that it doesn't right now is
> definitely a problem, and should probably be treated as a blocker for
> this patch. But that doesn't really argue against the general idea
> behind the patch -- it just argues for fixing that one problem.
>
> There may also be a separate problem that comes from the added planner
> cycles required to do the transformation -- particularly in extreme or
> adversarial cases. We should worry about that, too. But, again, it
> doesn't change the basic fact, which is that having a
> standard/normalized representation of OR lists/DNF transformation is
> extremely useful in general, and *shouldn't* result in any real
> slowdowns at execution time if done well.
I think it would be more correct to finalize the current approach to
converting "OR" expressions to "ANY", since quite a few problems related
to this patch have already been found here, I think you can solve them
first, and then you can move on.

>> To me, this sort of output suggests that perhaps the transformation is
>> being done in the wrong place. I expect that we have to decide whether
>> to convert from OR to = ANY(...) at a very early stage of the planner,
>> before we have any idea what the selected path will ultimately be. But
>> this output suggests that we want the answer to depend on which kind
>> of path is going to be faster, which would seem to argue for doing
>> this sort of transformation as part of path generation for only those
>> paths that will benefit from it, rather than during earlier phases of
>> expression processing/simplification.
> I don't think that that's the right direction. They're semantically
> equivalent things. But a SAOP-based plan can be fundamentally better,
> since SAOPs enable passing down useful context to index AMs (at least
> nbtree). And because we can use a hash table for SAOP expression
> evaluation. It's a higher level, standardized, well optimized way of
> expressing exactly the same concept.
>
> I can come up with a case that'll be orders of magnitude more
> efficient with this patch, despite the transformation process only
> affecting a small OR list of 3 or 5 elements -- a 100x reduction in
> heap page accesses is quite possible. This is particularly likely to
> come up if you assume that the nbtree patch that I'm currently working
> on is also available. In general, I think that we totally over-rely on
> bitmap index scans, especially BitmapOrs.
>
>
Regarding the application of the transformation at an early stage, the
patch is almost ready, except for solving cases related to queries that
work slower. I haven't figured out how to exclude such requests without
comparing the cost or parameter by the number of OR elements yet. The
simplest option is not to process Expr types (already mentioned earlier)
in the queries that Alexander gave as an example, but as I already said,
I don't like this approach very much.

--
Regards,
Alena Rybakina
Postgres Professional

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-10-29 16:57:38 Re: pg_dump not dumping the run_as_owner setting from version 16?
Previous Message Dean Rasheed 2023-10-29 16:39:31 Re: Infinite Interval