Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-10-26 19:47:20
Message-ID: aca9dad7-27ef-4724-8a16-435a45640545@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi! Thank you for your feedback!

On 25.10.2023 22:54, Robert Haas wrote:
> On Sat, Oct 14, 2023 at 6:37 PM Alexander Korotkov<aekorotkov(at)gmail(dot)com> wrote:
>> Regarding the GUC parameter, I don't see we need a limit. It's not
>> yet clear whether a small number or a large number of OR clauses are
>> more favorable for transformation. I propose to have just a boolean
>> enable_or_transformation GUC.
> That's a poor solution. So is the GUC patch currently has
> (or_transform_limit). What you need is a heuristic that figures out
> fairly reliably whether the transformation is going to be better or
> worse. Or else, do the whole thing in some other way that is always
> same-or-better.
>
> In general, adding GUCs makes sense when the user knows something that
> we can't know. For example, shared_buffers makes some sense because,
> even if we discovered how much memory the machine has, we can't know
> how much of it the user wants to devote to PostgreSQL as opposed to
> anything else. And track_io_timing makes sense because we can't know
> whether the user wants to pay the price of gathering that additional
> data. But GUCs are a poor way of handling cases where the real problem
> is that we don't know what code to write. In this case, some queries
> will be better with enable_or_transformation=on, and some will be
> better with enable_or_transformation=off. Since we don't know which
> will work out better, we make the user figure it out and set the GUC,
> possibly differently for each query. That's terrible. It's the query
> optimizer's whole job to figure out which transformations will speed
> up the query. It shouldn't turn around and punt the decision back to
> the user.
>
> Notice that superficially-similar GUCs like enable_seqscan aren't
> really the same thing at all. That's just for developer testing and
> debugging. Nobody expects that you have to adjust that GUC on a
> production system - ever.

I noticed that the costs of expressions are different and it can help to
assess when it is worth leaving the conversion, when not.

With small amounts of "OR" elements, the cost of orexpr is lower than
with "ANY", on the contrary, higher.

postgres=# SET or_transform_limit = 500;
EXPLAIN (analyze)
SELECT oid,relname FROM pg_class
WHERE
  oid = 13779 AND (oid = 2 OR oid = 4 OR oid = 5)
;
SET
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Index Scan using pg_class_oid_index on pg_class  (*cost=0.27..8.30*
rows=1 width=68) (actual time=0.105..0.106 rows=0 loops=1)
   Index Cond: (oid = '13779'::oid)
   Filter: ((oid = '2'::oid) OR (oid = '4'::oid) OR (oid = '5'::oid))
 Planning Time: 0.323 ms
 Execution Time: 0.160 ms

(5 rows)

postgres=# SET or_transform_limit = 0;
EXPLAIN (analyze)
SELECT oid,relname FROM pg_class
WHERE
  oid = 13779 AND (oid = 2 OR oid = 4 OR oid = 5)
;
SET
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using pg_class_oid_index on pg_class  (*cost=0.27..16.86*
rows=1 width=68) (actual time=0.160..0.161 rows=0 loops=1)
   Index Cond: ((oid = ANY (ARRAY['2'::oid, '4'::oid, '5'::oid])) AND
(oid = '13779'::oid))
 Planning Time: 4.515 ms
 Execution Time: 0.313 ms
(4 rows)

Index Scan using pg_class_oid_index on pg_class  (*cost=0.27..2859.42*
rows=414 width=68) (actual time=1.504..34.183 rows=260 loops=1)
   Index Cond: (oid = ANY (ARRAY['1'::oid, '2'::oid, '3'::oid,
'4'::oid, '5'::oid, '6'::oid, '7'::oid,

Bitmap Heap Scan on pg_class  (*cost=43835.00..54202.14* rows=414
width=68) (actual time=39.958..41.293 rows=260 loops=1)
   Recheck Cond: ((oid = '1'::oid) OR (oid = '2'::oid) OR (oid =
'3'::oid) OR (oid = '4'::oid) OR (oid =

I think we could see which value is lower, and if lower with expressions
converted to ANY, then work with it further, otherwise work with the
original "OR" expressions. But we still need to make this conversion to
find out its cost.

In addition, I will definitely have to postpone the transformation of
"OR" to "ANY" at the stage of creating indexes (?) or maybe a little
earlier so that I have to count only the cost of the transformed
expression.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-10-26 19:51:11 Re: visibility of open cursors in pg_stat_activity
Previous Message Bruce Momjian 2023-10-26 19:43:29 Re: Partial aggregates pushdown