From: | "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru> |
---|---|
To: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Subject: | Re: POC, WIP: OR-clause support for indexes |
Date: | 2023-09-26 09:13:29 |
Message-ID: | 279cfb85-17ee-11b9-4074-cf002bb0f46c@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I'm sorry I didn't write for a long time, but I really had a very
difficult month, now I'm fully back to work.
*I was able to implement the patches to the end and moved the
transformation of "OR" expressions to ANY.* I haven't seen a big
difference between them yet, one has a transformation before calculating
selectivity (v7.1-Replace-OR-clause-to-ANY.patch), the other after
(v7.2-Replace-OR-clause-to-ANY.patch). Regression tests are passing, I
don't see any problems with selectivity, nothing has fallen into the
coredump, but I found some incorrect transformations. What is the reason
for these inaccuracies, I have not found, but, to be honest, they look
unusual). Gave the error below.
In the patch, I don't like that I had to drag three libraries from
parsing until I found a way around it.The advantage of this approach
compared to the other (v7.0-Replace-OR-clause-to-ANY.patch) is that at
this stage all possible or transformations are performed, compared to
the patch, where the transformation was done at the parsing stage. That
is, here, for example, there are such optimizations in the transformation:
I took the common element out of the bracket and the rest is converted
to ANY, while, as noted by Peter Geoghegan, we did not have several
bitmapscans, but only one scan through the array.
postgres=# explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 AND prolang=1 OR prolang = 13 AND prolang = 2 OR
prolang = 13 AND prolang = 3;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on pg_proc p1 (cost=0.00..151.66 rows=1 width=68) (actual
time=1.167..1.168 rows=0 loops=1)
Filter: ((prolang = '13'::oid) AND (prolang = ANY (ARRAY['1'::oid,
'2'::oid, '3'::oid])))
Rows Removed by Filter: 3302
Planning Time: 0.146 ms
Execution Time: 1.191 ms
(5 rows)
*While I was testing, I found some transformations that don't work,
although in my opinion, they should:**
**
**1. First case:*
explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 AND prolang=1 OR prolang = 2 AND prolang = 2 OR
prolang = 13 AND prolang = 13;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on pg_proc p1 (cost=0.00..180.55 rows=2 width=68) (actual
time=2.959..3.335 rows=89 loops=1)
Filter: (((prolang = '13'::oid) AND (prolang = '1'::oid)) OR
((prolang = '2'::oid) AND (prolang = '2'::oid)) OR ((prolang =
'13'::oid) AND (prolang = '13'::oid)))
Rows Removed by Filter: 3213
Planning Time: 1.278 ms
Execution Time: 3.486 ms
(5 rows)
Should have left only prolang = '13'::oid:
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on pg_proc p1 (cost=0.00..139.28 rows=1 width=68) (actual
time=2.034..2.034 rows=0 loops=1)
Filter: ((prolang = '13'::oid ))
Rows Removed by Filter: 3302
Planning Time: 0.181 ms
Execution Time: 2.079 ms
(5 rows)
*2. Also does not work:*
postgres=# explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on pg_proc p1 (cost=0.00..164.04 rows=176 width=68) (actual
time=2.422..2.686 rows=89 loops=1)
Filter: ((prolang = '13'::oid) OR ((prolang = '2'::oid) AND (prolang
= '2'::oid)) OR (prolang = '13'::oid))
Rows Removed by Filter: 3213
Planning Time: 1.370 ms
Execution Time: 2.799 ms
(5 rows)
Should have left:
Filter: ((prolang = '13'::oid) OR (prolang = '2'::oid))
*3. Or another:*
explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang=13 OR prolang = 2 AND prolang = 2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on pg_proc p1 (cost=0.00..164.04 rows=176 width=68) (actual
time=2.350..2.566 rows=89 loops=1)
Filter: ((prolang = '13'::oid) OR (prolang = '13'::oid) OR ((prolang
= '2'::oid) AND (prolang = '2'::oid)))
Rows Removed by Filter: 3213
Planning Time: 0.215 ms
Execution Time: 2.624 ms
(5 rows)
Should have left:
Filter: ((prolang = '13'::oid) OR (prolang = '2'::oid))
*Falls into coredump at me:*
explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;
explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang=13 OR prolang = 2 AND prolang = 2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on pg_proc p1 (cost=0.00..164.04 rows=176 width=68) (actual
time=2.350..2.566 rows=89 loops=1)
Filter: ((prolang = '13'::oid) OR (prolang = '13'::oid) OR ((prolang
= '2'::oid) AND (prolang = '2'::oid)))
Rows Removed by Filter: 3213
Planning Time: 0.215 ms
Execution Time: 2.624 ms
(5 rows)
I remind that initially the task was to find an opportunity to optimize
the case of processing a large number of "or" expressions to optimize
memory consumption. The FlameGraph for executing 50,000 "or"
expressionshas grown 1.4Gb and remains in this state until exiting the
psql session (flamegraph1.png) and it sagged a lot in execution time. If
this case is converted to ANY, the query is executed much faster and
memory is optimized (flamegraph2.png). It may be necessary to use this
approach if there is no support for the framework to process ANY, IN
expressions.
Peter Geoghegan also noticed some development of this patch in terms of
preparing some transformations to optimize the query at the stage of its
execution [0].
Attachment | Content-Type | Size |
---|---|---|
flamegraph1.png | image/png | 106.0 KB |
flamegraph2.png | image/png | 159.5 KB |
v7.0-Replace-OR-clause-to-ANY.patch | text/x-patch | 32.8 KB |
v7.1-Replace-OR-clause-to-ANY.patch | text/x-patch | 9.0 KB |
v7.2-Replace-OR-clause-to-ANY.patch | text/x-patch | 10.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | a.rybakina | 2023-09-26 09:21:10 | Re: POC, WIP: OR-clause support for indexes |
Previous Message | a.rybakina | 2023-09-26 09:08:52 | Re: POC, WIP: OR-clause support for indexes |