Re: POC, WIP: OR-clause support for indexes

From: "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-09-26 09:39:02
Message-ID: 055cb3c5-38a5-1c6e-cf97-ebc9b56dae56@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry for the duplicates, I received a letter that my letter did not
reach the addressee, I thought the design was incorrect.

On 26.09.2023 12:21, a.rybakina wrote:
>
> I'm sorry I didn't write for a long time, but I really had a very
> difficult month, now I'm fully back to work.
>
> *I was able to implement the patches to the end and moved the
> transformation of "OR" expressions to ANY.* I haven't seen a big
> difference between them yet, one has a transformation before
> calculating selectivity (v7.1-Replace-OR-clause-to-ANY.patch), the
> other after (v7.2-Replace-OR-clause-to-ANY.patch). Regression tests
> are passing, I don't see any problems with selectivity, nothing has
> fallen into the coredump, but I found some incorrect transformations.
> What is the reason for these inaccuracies, I have not found, but, to
> be honest, they look unusual). Gave the error below.
>
> In the patch, I don't like that I had to drag three libraries from
> parsing until I found a way around it.The advantage of this approach
> compared to the other ([1]) is that at this stage all possible or
> transformations are performed, compared to the patch, where the
> transformation was done at the parsing stage. That is, here, for
> example, there are such optimizations in the transformation:
>
>
> I took the common element out of the bracket and the rest is converted
> to ANY, while, as noted by Peter Geoghegan, we did not have several
> bitmapscans, but only one scan through the array.
>
> postgres=# explain analyze SELECT p1.oid, p1.proname
> FROM pg_proc as p1
> WHERE prolang = 13 AND prolang=1 OR prolang = 13 AND prolang = 2 OR
> prolang = 13 AND prolang = 3;
>                                               QUERY PLAN
> -------------------------------------------------------------------------------------------------------
>  Seq Scan on pg_proc p1  (cost=0.00..151.66 rows=1 width=68) (actual
> time=1.167..1.168 rows=0 loops=1)
>    Filter: ((prolang = '13'::oid) AND (prolang = ANY (ARRAY['1'::oid,
> '2'::oid, '3'::oid])))
>    Rows Removed by Filter: 3302
>  Planning Time: 0.146 ms
>  Execution Time: 1.191 ms
> (5 rows)
>
> *While I was testing, I found some transformations that don't work,
> although in my opinion, they should:**
> **
> **1. First case:*
> explain analyze SELECT p1.oid, p1.proname
> FROM pg_proc as p1
> WHERE prolang = 13 AND prolang=1 OR prolang = 2 AND prolang = 2 OR
> prolang = 13 AND prolang = 13;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on pg_proc p1  (cost=0.00..180.55 rows=2 width=68) (actual
> time=2.959..3.335 rows=89 loops=1)
>    Filter: (((prolang = '13'::oid) AND (prolang = '1'::oid)) OR
> ((prolang = '2'::oid) AND (prolang = '2'::oid)) OR ((prolang =
> '13'::oid) AND (prolang = '13'::oid)))
>    Rows Removed by Filter: 3213
>  Planning Time: 1.278 ms
>  Execution Time: 3.486 ms
> (5 rows)
>
> Should have left only prolang = '13'::oid:
>
>                                               QUERY PLAN
> -------------------------------------------------------------------------------------------------------
>  Seq Scan on pg_proc p1  (cost=0.00..139.28 rows=1 width=68) (actual
> time=2.034..2.034 rows=0 loops=1)
>    Filter: ((prolang = '13'::oid ))
>    Rows Removed by Filter: 3302
>  Planning Time: 0.181 ms
>  Execution Time: 2.079 ms
> (5 rows)
>
> *2. Also does not work:*
> postgres=# explain analyze SELECT p1.oid, p1.proname
> FROM pg_proc as p1
> WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;
>                                                   QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Seq Scan on pg_proc p1  (cost=0.00..164.04 rows=176 width=68) (actual
> time=2.422..2.686 rows=89 loops=1)
>    Filter: ((prolang = '13'::oid) OR ((prolang = '2'::oid) AND
> (prolang = '2'::oid)) OR (prolang = '13'::oid))
>    Rows Removed by Filter: 3213
>  Planning Time: 1.370 ms
>  Execution Time: 2.799 ms
> (5 rows)
>
> Should have left:
> Filter: ((prolang = '13'::oid) OR (prolang = '2'::oid))
>
> *3. Or another:*
>
> explain analyze SELECT p1.oid, p1.proname
> FROM pg_proc as p1
> WHERE prolang = 13 OR prolang=13 OR prolang = 2 AND prolang = 2;
>                                                   QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Seq Scan on pg_proc p1  (cost=0.00..164.04 rows=176 width=68) (actual
> time=2.350..2.566 rows=89 loops=1)
>    Filter: ((prolang = '13'::oid) OR (prolang = '13'::oid) OR
> ((prolang = '2'::oid) AND (prolang = '2'::oid)))
>    Rows Removed by Filter: 3213
>  Planning Time: 0.215 ms
>  Execution Time: 2.624 ms
> (5 rows)
>
> Should have left:
> Filter: ((prolang = '13'::oid) OR (prolang = '2'::oid))
>
>
> *Falls into coredump at me:*
> explain analyze SELECT p1.oid, p1.proname
> FROM pg_proc as p1
> WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;
>
> explain analyze SELECT p1.oid, p1.proname
> FROM pg_proc as p1
> WHERE prolang = 13 OR prolang=13 OR prolang = 2 AND prolang = 2;
>                                                   QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Seq Scan on pg_proc p1  (cost=0.00..164.04 rows=176 width=68) (actual
> time=2.350..2.566 rows=89 loops=1)
>    Filter: ((prolang = '13'::oid) OR (prolang = '13'::oid) OR
> ((prolang = '2'::oid) AND (prolang = '2'::oid)))
>    Rows Removed by Filter: 3213
>  Planning Time: 0.215 ms
>  Execution Time: 2.624 ms
>
> (5 rows)
>
>
> I remind that initially the task was to find an opportunity to
> optimize the case of processing a large number of "or" expressions to
> optimize memory consumption. The FlameGraph for executing 50,000 "or"
> expressionshas grown 1.4Gb and remains in this state until exiting the
> psql session (flamegraph1.png) and it sagged a lot in execution time.
> If this case is converted to ANY, the query is executed much faster
> and memory is optimized (flamegraph2.png). It may be necessary to use
> this approach if there is no support for the framework to process ANY,
> IN expressions.
>
>
> Peter Geoghegan also noticed some development of this patch in terms
> of preparing some transformations to optimize the query at the stage
> of its execution [0].
>
> [0]
> https://www.postgresql.org/message-id/CAH2-Wz%3D9N_4%2BEyhtyFqYQRx4OgVbP%2B1aoYU2JQPVogCir61ZEQ%40mail.gmail.com
>
> [1]
> https://www.postgresql.org/message-id/attachment/149105/v7-Replace-OR-clause-to-ANY-expressions.patch
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-09-26 10:02:27 Re: Avoid a possible out-of-bounds access (src/backend/optimizer/util/relnode.c)
Previous Message a.rybakina 2023-09-26 09:21:10 Re: POC, WIP: OR-clause support for indexes