Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Jian Guo <gjian(at)vmware(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Hans Buschmann <buschmann(at)nidsa(dot)net>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zhenghua Lyu <zlyu(at)vmware(dot)com>
Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
Date: 2023-11-16 18:16:08
Message-ID: 414317.1700158568@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Richard Guo <guofenglinux(at)gmail(dot)com> writes:
> I think the second plan (patched) makes more sense. In the first plan
> (unpatched), the HashAggregate node actually does not reduce the the
> number of rows because it groups by 'unique1', but planner does not know
> that because it lacks statistics for Vars referencing the CTE.

Yeah. It's faster in reality too:

regression=# explain analyze with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=692.29..692.30 rows=1 width=8) (actual time=15.186..15.188 rows=1 loops=1)
CTE x
-> Index Only Scan using tenk1_unique1 on tenk1 b (cost=0.29..270.29 rows=10000 width=4) (actual time=0.028..0.754 rows=10000 loops=1)
Heap Fetches: 0
-> Nested Loop (cost=225.28..409.50 rows=5000 width=0) (actual time=3.652..14.733 rows=10000 loops=1)
-> HashAggregate (cost=225.00..227.00 rows=200 width=4) (actual time=3.644..4.510 rows=10000 loops=1)
Group Key: x.unique1
Batches: 1 Memory Usage: 929kB
-> CTE Scan on x (cost=0.00..200.00 rows=10000 width=4) (actual time=0.030..1.932 rows=10000 loops=1)
-> Index Only Scan using tenk1_unique1 on tenk1 a (cost=0.29..0.90 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=10000)
Index Cond: (unique1 = x.unique1)
Heap Fetches: 0
Planning Time: 0.519 ms
Execution Time: 15.479 ms
(14 rows)

vs

regression=# explain analyze with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1028.07..1028.08 rows=1 width=8) (actual time=4.578..4.579 rows=1 loops=1)
CTE x
-> Index Only Scan using tenk1_unique1 on tenk1 b (cost=0.29..270.29 rows=10000 width=4) (actual time=0.011..0.751 rows=10000 loops=1)
Heap Fetches: 0
-> Hash Semi Join (cost=325.28..732.78 rows=10000 width=0) (actual time=2.706..4.305 rows=10000 loops=1)
Hash Cond: (a.unique1 = x.unique1)
-> Index Only Scan using tenk1_unique1 on tenk1 a (cost=0.29..270.29 rows=10000 width=4) (actual time=0.011..0.676 rows=10000 loops=1)
Heap Fetches: 0
-> Hash (cost=200.00..200.00 rows=10000 width=4) (actual time=2.655..2.655 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 480kB
-> CTE Scan on x (cost=0.00..200.00 rows=10000 width=4) (actual time=0.012..1.963 rows=10000 loops=1)
Planning Time: 0.504 ms
Execution Time: 4.821 ms
(13 rows)

Now, what you get if you remove MATERIALIZED is faster yet:

regression=# explain analyze with x as (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=715.57..715.58 rows=1 width=8) (actual time=2.681..2.682 rows=1 loops=1)
-> Merge Semi Join (cost=0.57..690.57 rows=10000 width=0) (actual time=0.016..2.408 rows=10000 loops=1)
Merge Cond: (a.unique1 = b.unique1)
-> Index Only Scan using tenk1_unique1 on tenk1 a (cost=0.29..270.29 rows=10000 width=4) (actual time=0.007..0.696 rows=10000 loops=1)
Heap Fetches: 0
-> Index Only Scan using tenk1_unique1 on tenk1 b (cost=0.29..270.29 rows=10000 width=4) (actual time=0.007..0.655 rows=10000 loops=1)
Heap Fetches: 0
Planning Time: 0.160 ms
Execution Time: 2.718 ms
(9 rows)

I poked into that and found that the reason we don't get a mergejoin
with the materialized CTE is that the upper planner invocation doesn't
know that the CTE's output is sorted, so it thinks a separate sort
step would be needed.

So you could argue that there's more to do here, but I'm hesitant
to go further. Part of the point of MATERIALIZED is to be an
optimization fence, so breaking down that fence is something to be
wary of. Maybe we shouldn't even take this patch --- but on
balance I think it's an OK compromise.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-11-16 19:36:11 Re: trying again to get incremental backup
Previous Message Robert Haas 2023-11-16 17:50:43 Re: trying again to get incremental backup