Re: missing estimation for coalesce function

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing estimation for coalesce function
Date: 2019-11-28 03:48:54
Message-ID: CAFj8pRBrYRrmy4U-TEM4Ea7GYqQ90kN-5Jyyxc-gS_JJWyimZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

čt 28. 11. 2019 v 3:56 odesílatel David Fetter <david(at)fetter(dot)org> napsal:

> On Wed, Nov 27, 2019 at 08:47:56AM +0100, Pavel Stehule wrote:
> > Hi
> >
> > I have a report from my customer about migration his application from
> > Oracle to Postgres.
> >
> > The most significant issue was missing correct estimation for coalesce
> > function. He had to rewrite coalesce(var, X) = X to "var IS NULL or var =
> > X". Then the result was very satisfactory.
> >
> > Example:
> >
> > create table xxx(a int);
> > insert into xxx select null from generate_series(1,10000);
> > insert into xxx select 1 from generate_series(1,1000);
> > insert into xxx select 0 from generate_series(1,1000);
> > analyze xxx;
> >
> > postgres=# explain analyze select * from xxx where coalesce(a, 0) = 0;
> > QUERY PLAN
> >
> >
> ----------------------------------------------------------------------------------------------------
> > Seq Scan on xxx (cost=0.00..194.00 rows=60 width=4) (actual
> > time=0.041..4.276 rows=11000 loops=1)
> > Filter: (COALESCE(a, 0) = 0)
> > Rows Removed by Filter: 1000
> > Planning Time: 0.099 ms
> > Execution Time: 5.412 ms
> > (5 rows)
> >
> > postgres=# explain analyze select * from xxx where a is null or a = 0;
> > QUERY PLAN
> >
> >
> -------------------------------------------------------------------------------------------------------
> > Seq Scan on xxx (cost=0.00..194.00 rows=10167 width=4) (actual
> > time=0.052..5.891 rows=11000 loops=1)
> > Filter: ((a IS NULL) OR (a = 0))
> > Rows Removed by Filter: 1000
> > Planning Time: 0.136 ms
> > Execution Time: 7.522 ms
> > (5 rows)
> >
> > I think so pattern coalesce(var, X) = X is very common so can be very
> > interesting to support it better.
>
> Better support sounds great!
>
> How specifically might this be better supported? On this relatively
> short table, I see planning times considerably longer, I assume
> because they need to take a function call into account, and execution
> times longer but not all that much longer. I tried with 3 million
> rows, and got the representative samples below:
>
> shackle(at)[local]:5413/ctest(13devel)(149711) # EXPLAIN ANALYZE SELECT *
> FROM xxx WHERE COALESCE(a, 0)=0;
> QUERY PLAN
>
>
> ══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
> Gather (cost=1000.00..30391.00 rows=15000 width=4) (actual
> time=1.315..346.406 rows=999772 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Parallel Seq Scan on xxx (cost=0.00..27891.00 rows=6250 width=4)
> (actual time=0.029..216.419 rows=333257 loops=3)
> Filter: (COALESCE(a, 0) = 0)
> Rows Removed by Filter: 666743
> Planning Time: 0.204 ms
> Execution Time: 389.307 ms
> (8 rows)
>
> Time: 391.394 ms
>
> shackle(at)[local]:5413/ctest(13devel)(149711) # EXPLAIN ANALYZE SELECT *
> FROM xxx WHERE a IS NULL OR a = 0;
> QUERY PLAN
>
>
> ═════════════════════════════════════════════════════════════════════════════════════════════════════════════
> Seq Scan on xxx (cost=0.00..49766.00 rows=995700 width=4) (actual
> time=0.043..524.401 rows=999772 loops=1)
> Filter: ((a IS NULL) OR (a = 0))
> Rows Removed by Filter: 2000228
> Planning Time: 0.106 ms
> Execution Time: 560.593 ms
> (5 rows)
>
> Time: 561.186 ms
>

I didn't thing about rewriting. The correct solution should be via own
selectivity function. Now for coalesce is used 5% estimation (like for
other functions). Probably it should not be hard code because coalesce is a
node already. But it is part of code that I never modified.

Pavel

>
> Best,
> David.
> --
> David Fetter <david(at)fetter(dot)org> http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-11-28 03:53:34 Re: Add a GUC variable that control logical replication
Previous Message Andrew Dunstan 2019-11-28 03:45:54 Re: jsonb_set() strictness considered harmful to data