Re: POC, WIP: OR-clause support for indexes

From: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-02-13 10:03:20
Message-ID: e3338e82-a28d-4631-9eec-b9c0984b32d5@postgrespro.ru
Lists: pgsql-hackers

On 13/2/2024 07:00, jian he wrote:
> + newa = makeNode(ArrayExpr);
> + /* array_collid will be set by parse_collate.c */
> + newa->element_typeid = scalar_type;
> + newa->array_typeid = array_type;
> + newa->multidims = false;
> + newa->elements = aexprs;
> + newa->location = -1;
>
> I am confused by the comments `array_collid will be set by
> parse_collate.c`, can you further explain it?
I think the second paragraph of the comments on commit b310b6e should be
enough to get into the details.

> if OR expression right arm is not plain Const, but with collation
> specification, eg.
> `where a = 'a' collate "C" or a = 'b' collate "C";`
>
> then the rightop is not Const, it will be CollateExpr, it will not be
> used in transformation.
Yes, it is done for simplicity right now. I'm not sure about corner
cases of merging such expressions.

>
> set enable_or_transformation to on;
> explain(timing off, analyze, costs off)
> select count(*) from test where (x = 1 or x = 2 or x = 3 or x = 4 or x
> = 5 or x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
> 35.376 ms
>
> The time is the last result of the 10 iterations.
The reason here is parallel workers.
If you look at the plans, you will find parallel workers without the
optimization and none of them with the optimization:

Gather (cost=1000.00..28685.37 rows=87037 width=12)
        (actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Seq Scan on test
        Filter: ((x = 1) OR (x = 2) OR (x = 3) OR (x = 4) OR (x = 5)
                 OR (x = 6) OR (x = 7) OR (x = 8) OR (x = 9))

Seq Scan on test (cost=0.02..20440.02 rows=90600 width=12)
        (actual rows=90363 loops=1)
   Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))

With 90363 tuples actually returned, we estimate 87037 (less precisely)
without the transformation and 90600 (more precisely) with it.
But if you adjust parallel_tuple_cost and parallel_setup_cost, you
will get the parallel workers back:

Gather (cost=0.12..11691.03 rows=90600 width=12)
        (actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -> Parallel Seq Scan on test
        Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))
        Rows Removed by Filter: 303212

That gives a gain of about 25% on my laptop.
I'm not sure about the origins of this behavior, but it seems to be an
issue with parallel workers, not with this specific optimization.
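
A sketch of a session that could reproduce the comparison above. The
values chosen for parallel_setup_cost and parallel_tuple_cost are
assumptions, picked only to make a parallel plan cheap enough to win;
the exact settings are not given in this message. The table test and
the enable_or_transformation setting come from the quoted test case.

```sql
-- GUC introduced by the patch under discussion; not in stock PostgreSQL.
SET enable_or_transformation = on;

-- Assumed values: make launching workers and passing tuples to the
-- leader free, so the planner is likely to prefer Gather over Seq Scan.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;

EXPLAIN (ANALYZE, TIMING OFF)
SELECT count(*) FROM test
WHERE x = 1 OR x = 2 OR x = 3 OR x = 4 OR x = 5
   OR x = 6 OR x = 7 OR x = 8 OR x = 9;

-- Restore the defaults afterwards.
RESET parallel_setup_cost;
RESET parallel_tuple_cost;
```

Running the same EXPLAIN with and without the two SET commands should
show whether the Gather node appears above the x = ANY (...) filter.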

--
regards,
Andrei Lepikhov
Postgres Professional
