Re: POC, WIP: OR-clause support for indexes

From: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-11-30 08:00:28
Message-ID: 59e67a40-95a8-4d74-ae4e-027ea0f59084@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

>
>> Honestly, it seems very hard to avoid the conclusion that this
>> transformation is being done at too early a stage. Parse analysis is
>> not the time to try to do query optimization. I can't really believe
>> that there's a way to produce a committable patch along these lines.
>> Ideally, a transformation like this should be done after we know what
>> plan shape we're using (or considering using), so that we can make
>> cost-based decisions about whether to transform or not. But at the
>> very least it should happen somewhere in the planner. There's really
>> no justification for parse analysis rewriting the SQL that the user
>> entered.
>
> Here, we assume that array operation is generally better than many ORs.
> As a result, it should be more effective to make OR->ANY
> transformation in the parser (it is a relatively lightweight operation
> here) and, as a second phase, decompose that in the optimizer.
> We implemented earlier prototypes in different places of the
> optimizer, and I'm convinced that only this approach resolves the
> issues we found.
> Does this approach look weird? Maybe. We can debate it in this thread.

I think this is incorrect, and the example of A. Korotkov confirms this.
If we perform the conversion at the parsing stage, we will skip the more
important conversion using OR expressions. I'll show you in the example
below.

First of all, I will describe my idea to combine two approaches to
obtaining plans with OR to ANY transformation and ANY to OR
transformation. I think they are both good, and we can't work with just
one of them, we should consider both the option of OR expressions, and
with ANY.

I did this by creating a RelOptInfo with which has references from the
original RelOptInfo, for which conversion is possible either from
ANY->OR, or vice versa. After obtaining the necessary transformation, I
started the procedure for obtaining the seq and index paths for both
relations and then calculated their cost. The relation with the lowest
cost is considered the best.

I'm not sure if this is the best approach, but it's less complicated.

I noticed that I got a lower cost for not the best plan, but I think
this corresponds to another topic related to the wrong estimate calculation.

1. The first patch is a mixture of the original patch (when we perform
the conversion of OR to ANY at the parsing stage), and when we perform
the conversion at the index creation stage with the conversion to an OR
expression. We can see that the query proposed by A.Korotkov did not
have the best plan with ANY expression at all, and even despite
receiving a query with OR expressions, we cannot get anything better
than SeqScan, due to the lack of effective logical transformations that
would have been performed if we had left the OR expressions.

So, I got query plans using enable_or_transformation if it is enabled:

postgres=# create table test as (select (random()*10)::int x,
(random()*1000) y
from generate_series(1,1000000) i);
create index test_x_1_y on test (y) where x = 1;
create index test_x_2_y on test (y) where x = 2;
vacuum analyze test;
SELECT 1000000
CREATE INDEX
CREATE INDEX
VACUUM
postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
WARNING:  cost with original approach: - 20440.000000
WARNING:  cost with OR to ANY applied transfomation: - 15440.000000
                                QUERY PLAN
--------------------------------------------------------------------------
 Gather  (cost=1000.00..12690.10 rows=1 width=12)
   Workers Planned: 2
   ->  Parallel Seq Scan on test  (cost=0.00..11690.00 rows=1 width=12)
         Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision))
(4 rows)

and if it is off:

postgres=# set enable_or_transformation =off;
SET
postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
   Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y =
'100'::double precision) AND (x = 2)))
   ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
         ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1
width=0)
               Index Cond: (y = '100'::double precision)
         ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1
width=0)
               Index Cond: (y = '100'::double precision)
(7 rows)

2. The second patch is my patch version when I moved the OR
transformation in the s index formation stage:

So, I got the best query plan despite the possible OR to ANY transformation:

postgres=# create table test as (select (random()*10)::int x,
(random()*1000) y
from generate_series(1,1000000) i);
create index test_x_1_y on test (y) where x = 1;
create index test_x_2_y on test (y) where x = 2;
vacuum analyze test;
SELECT 1000000
CREATE INDEX
CREATE INDEX
VACUUM
postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
WARNING:  cost with original approach: - 12.618000
WARNING:  cost with OR to ANY applied transfomation: - 15440.000000
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
   Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y =
'100'::double precision) AND (x = 2)))
   ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
         ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1
width=0)
               Index Cond: (y = '100'::double precision)
         ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1
width=0)
               Index Cond: (y = '100'::double precision)
(7 rows)

--
Regards,
Alena Rybakina
Postgres Professional

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alena Rybakina 2023-11-30 08:05:31 Re: POC, WIP: OR-clause support for indexes
Previous Message Peter Smith 2023-11-30 07:39:32 Re: GUC names in messages