Re: POC, WIP: OR-clause support for indexes

From: "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-09-26 09:21:10
Message-ID: d6e3fc3c-6f1c-2586-3fa2-ef793045047c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm sorry I didn't write for a long time, but I really had a very
difficult month, now I'm fully back to work.

*I was able to implement the patches to the end and moved the
transformation of "OR" expressions to ANY.* I haven't seen a big
difference between them yet, one has a transformation before calculating
selectivity (v7.1-Replace-OR-clause-to-ANY.patch), the other after
(v7.2-Replace-OR-clause-to-ANY.patch). Regression tests are passing, I
don't see any problems with selectivity, nothing has fallen into the
coredump, but I found some incorrect transformations. What is the reason
for these inaccuracies, I have not found, but, to be honest, they look
unusual). Gave the error below.

In the patch, I don't like that I had to drag three libraries from
parsing until I found a way around it.The advantage of this approach
compared to the other ([1]) is that at this stage all possible or
transformations are performed, compared to the patch, where the
transformation was done at the parsing stage. That is, here, for
example, there are such optimizations in the transformation:

I took the common element out of the bracket and the rest is converted
to ANY, while, as noted by Peter Geoghegan, we did not have several
bitmapscans, but only one scan through the array.

postgres=# explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 AND prolang=1 OR prolang = 13 AND prolang = 2 OR
prolang = 13 AND prolang = 3;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..151.66 rows=1 width=68) (actual
time=1.167..1.168 rows=0 loops=1)
   Filter: ((prolang = '13'::oid) AND (prolang = ANY (ARRAY['1'::oid,
'2'::oid, '3'::oid])))
   Rows Removed by Filter: 3302
 Planning Time: 0.146 ms
 Execution Time: 1.191 ms
(5 rows)

*While I was testing, I found some transformations that don't work,
although in my opinion, they should:**
**
**1. First case:*
explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 AND prolang=1 OR prolang = 2 AND prolang = 2 OR
prolang = 13 AND prolang = 13;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..180.55 rows=2 width=68) (actual
time=2.959..3.335 rows=89 loops=1)
   Filter: (((prolang = '13'::oid) AND (prolang = '1'::oid)) OR
((prolang = '2'::oid) AND (prolang = '2'::oid)) OR ((prolang =
'13'::oid) AND (prolang = '13'::oid)))
   Rows Removed by Filter: 3213
 Planning Time: 1.278 ms
 Execution Time: 3.486 ms
(5 rows)

Should have left only prolang = '13'::oid:

                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..139.28 rows=1 width=68) (actual
time=2.034..2.034 rows=0 loops=1)
   Filter: ((prolang = '13'::oid ))
   Rows Removed by Filter: 3302
 Planning Time: 0.181 ms
 Execution Time: 2.079 ms
(5 rows)

*2. Also does not work:*
postgres=# explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..164.04 rows=176 width=68) (actual
time=2.422..2.686 rows=89 loops=1)
   Filter: ((prolang = '13'::oid) OR ((prolang = '2'::oid) AND (prolang
= '2'::oid)) OR (prolang = '13'::oid))
   Rows Removed by Filter: 3213
 Planning Time: 1.370 ms
 Execution Time: 2.799 ms
(5 rows)

Should have left:
Filter: ((prolang = '13'::oid) OR (prolang = '2'::oid))

*3. Or another:*

explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang=13 OR prolang = 2 AND prolang = 2;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..164.04 rows=176 width=68) (actual
time=2.350..2.566 rows=89 loops=1)
   Filter: ((prolang = '13'::oid) OR (prolang = '13'::oid) OR ((prolang
= '2'::oid) AND (prolang = '2'::oid)))
   Rows Removed by Filter: 3213
 Planning Time: 0.215 ms
 Execution Time: 2.624 ms
(5 rows)

Should have left:
Filter: ((prolang = '13'::oid) OR (prolang = '2'::oid))

*Falls into coredump at me:*
explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;

explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang=13 OR prolang = 2 AND prolang = 2;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_proc p1  (cost=0.00..164.04 rows=176 width=68) (actual
time=2.350..2.566 rows=89 loops=1)
   Filter: ((prolang = '13'::oid) OR (prolang = '13'::oid) OR ((prolang
= '2'::oid) AND (prolang = '2'::oid)))
   Rows Removed by Filter: 3213
 Planning Time: 0.215 ms
 Execution Time: 2.624 ms

(5 rows)

I remind that initially the task was to find an opportunity to optimize
the case of processing a large number of "or" expressions to optimize
memory consumption. The FlameGraph for executing 50,000 "or"
expressionshas grown 1.4Gb and remains in this state until exiting the
psql session (flamegraph1.png) and it sagged a lot in execution time. If
this case is converted to ANY, the query is executed much faster and
memory is optimized (flamegraph2.png). It may be necessary to use this
approach if there is no support for the framework to process ANY, IN
expressions.

Peter Geoghegan also noticed some development of this patch in terms of
preparing some transformations to optimize the query at the stage of its
execution [0].

[0]
https://www.postgresql.org/message-id/CAH2-Wz%3D9N_4%2BEyhtyFqYQRx4OgVbP%2B1aoYU2JQPVogCir61ZEQ%40mail.gmail.com

[1]
https://www.postgresql.org/message-id/attachment/149105/v7-Replace-OR-clause-to-ANY-expressions.patch

Attachment Content-Type Size
flamegraph1.png image/png 106.0 KB
flamegraph2.png image/png 159.5 KB
v7.1-Replace-OR-clause-to-ANY.patch text/x-patch 9.0 KB
v7.2-Replace-OR-clause-to-ANY.patch text/x-patch 10.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message a.rybakina 2023-09-26 09:39:02 Re: POC, WIP: OR-clause support for indexes
Previous Message a.rybakina 2023-09-26 09:13:29 Re: POC, WIP: OR-clause support for indexes