Re: missing estimation for coalesce function

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing estimation for coalesce function
Date: 2019-11-28 07:45:33
Message-ID: CAFj8pRDOMQ6zjFNqQWr42MR=qO8H=3P_ptjNX=r7t+nzFLYc=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

čt 28. 11. 2019 v 4:48 odesílatel Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
napsal:

> Hi
>
> čt 28. 11. 2019 v 3:56 odesílatel David Fetter <david(at)fetter(dot)org> napsal:
>
>> On Wed, Nov 27, 2019 at 08:47:56AM +0100, Pavel Stehule wrote:
>> > Hi
>> >
>> > I have a report from my customer about migration his application from
>> > Oracle to Postgres.
>> >
>> > The most significant issue was missing correct estimation for coalesce
>> > function. He had to rewrite coalesce(var, X) = X to "var IS NULL or var
>> =
>> > X". Then the result was very satisfactory.
>> >
>> > Example:
>> >
>> > create table xxx(a int);
>> > insert into xxx select null from generate_series(1,10000);
>> > insert into xxx select 1 from generate_series(1,1000);
>> > insert into xxx select 0 from generate_series(1,1000);
>> > analyze xxx;
>> >
>> > postgres=# explain analyze select * from xxx where coalesce(a, 0) = 0;
>> > QUERY PLAN
>> >
>> >
>> ----------------------------------------------------------------------------------------------------
>> > Seq Scan on xxx (cost=0.00..194.00 rows=60 width=4) (actual
>> > time=0.041..4.276 rows=11000 loops=1)
>> > Filter: (COALESCE(a, 0) = 0)
>> > Rows Removed by Filter: 1000
>> > Planning Time: 0.099 ms
>> > Execution Time: 5.412 ms
>> > (5 rows)
>> >
>> > postgres=# explain analyze select * from xxx where a is null or a = 0;
>> > QUERY PLAN
>> >
>> >
>> -------------------------------------------------------------------------------------------------------
>> > Seq Scan on xxx (cost=0.00..194.00 rows=10167 width=4) (actual
>> > time=0.052..5.891 rows=11000 loops=1)
>> > Filter: ((a IS NULL) OR (a = 0))
>> > Rows Removed by Filter: 1000
>> > Planning Time: 0.136 ms
>> > Execution Time: 7.522 ms
>> > (5 rows)
>> >
>> > I think so pattern coalesce(var, X) = X is very common so can be very
>> > interesting to support it better.
>>
>> Better support sounds great!
>>
>> How specifically might this be better supported? On this relatively
>> short table, I see planning times considerably longer, I assume
>> because they need to take a function call into account, and execution
>> times longer but not all that much longer. I tried with 3 million
>> rows, and got the representative samples below:
>>
>> shackle(at)[local]:5413/ctest(13devel)(149711) # EXPLAIN ANALYZE SELECT *
>> FROM xxx WHERE COALESCE(a, 0)=0;
>> QUERY PLAN
>>
>>
>> ══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
>> Gather (cost=1000.00..30391.00 rows=15000 width=4) (actual
>> time=1.315..346.406 rows=999772 loops=1)
>> Workers Planned: 2
>> Workers Launched: 2
>> -> Parallel Seq Scan on xxx (cost=0.00..27891.00 rows=6250 width=4)
>> (actual time=0.029..216.419 rows=333257 loops=3)
>> Filter: (COALESCE(a, 0) = 0)
>> Rows Removed by Filter: 666743
>> Planning Time: 0.204 ms
>> Execution Time: 389.307 ms
>> (8 rows)
>>
>> Time: 391.394 ms
>>
>> shackle(at)[local]:5413/ctest(13devel)(149711) # EXPLAIN ANALYZE SELECT *
>> FROM xxx WHERE a IS NULL OR a = 0;
>> QUERY PLAN
>>
>>
>> ═════════════════════════════════════════════════════════════════════════════════════════════════════════════
>> Seq Scan on xxx (cost=0.00..49766.00 rows=995700 width=4) (actual
>> time=0.043..524.401 rows=999772 loops=1)
>> Filter: ((a IS NULL) OR (a = 0))
>> Rows Removed by Filter: 2000228
>> Planning Time: 0.106 ms
>> Execution Time: 560.593 ms
>> (5 rows)
>>
>> Time: 561.186 ms
>>
>
> I didn't thing about rewriting. The correct solution should be via own
> selectivity function. Now for coalesce is used 5% estimation (like for
> other functions). Probably it should not be hard code because coalesce is a
> node already. But it is part of code that I never modified.
>

but support functions can be used

https://www.cybertec-postgresql.com/en/optimizer-support-functions/

postgres=# create table test(id integer);
CREATE TABLE
postgres=# insert into test select generate_series(1,100000);
INSERT 0 100000
postgres=# insert into test select null from generate_series(1,1000);
INSERT 0 1000
postgres=# analyze test;
ANALYZE
postgres=# create index on test(id);
CREATE INDEX
postgres=# explain analyze select * from test where coalesce(id, 10) = 10;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

╞═══════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Seq Scan on test (cost=0.00..1708.50 rows=505 width=4) (actual
time=0.062..18.370 rows=1001 loops=1) │
│ Filter: (COALESCE(id, 10) = 10)

│ Rows Removed by Filter: 99999

│ Planning Time: 37.212 ms

│ Execution Time: 18.479 ms

└───────────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)

postgres=# explain analyze select * from test where id is null or id = 10;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Bitmap Heap Scan on test (cost=24.30..482.35 rows=964 width=4) (actual
time=0.197..0.334 rows=1001 loops=1) │
│ Recheck Cond: ((id IS NULL) OR (id = 10))

│ Heap Blocks: exact=5

│ -> BitmapOr (cost=24.30..24.30 rows=964 width=0) (actual
time=0.189..0.189 rows=0 loops=1) │
│ -> Bitmap Index Scan on test_id_idx (cost=0.00..19.52 rows=963
width=0) (actual time=0.170..0.170 rows=1000 loops=1) │
│ Index Cond: (id IS NULL)

│ -> Bitmap Index Scan on test_id_idx (cost=0.00..4.30 rows=1
width=0) (actual time=0.019..0.019 rows=1 loops=1) │
│ Index Cond: (id = 10)

│ Planning Time: 0.090 ms

│ Execution Time: 0.413 ms

└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(10 rows)

There can be strong benefit from replacement if indexes are used.

> Pavel
>
>>
>> Best,
>> David.
>> --
>> David Fetter <david(at)fetter(dot)org> http://fetter.org/
>> Phone: +1 415 235 3778
>>
>> Remember to vote!
>> Consider donating to Postgres: http://www.postgresql.org/about/donate
>>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-11-28 07:52:08 Re: fe-utils - share query cancellation code
Previous Message Amit Langote 2019-11-28 07:43:20 Re: pgbench -i progress output on terminal