Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Hans Buschmann <buschmann(at)nidsa(dot)net>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
Date: 2023-02-08 21:27:46
Message-ID: e44da652-2ecd-af34-a087-54fe327d5308@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/8/23 14:55, Hans Buschmann wrote:
> During data refactoring of our Application I encountered $subject when
> joining 4 CTEs with left join or inner join.
>
>
> 1. Background
>
> PG 15.1 on Windows x64 (OS seems no to have no meening here)
>
>
> I try to collect data from 4 (analyzed) tables (up,li,in,ou) by grouping
> certain data (4 CTEs qup,qli,qin,qou)
>
> The grouping of the data in the CTEs gives estimated row counts of about
> 1000 (1 tenth of the real value) This is OK for estimation.
>
>
> These 4 CTEs are then used to combine the data by joining them.
>
>
> 2. Problem
>
> The 4 CTEs are joined by left joins as shown below:
>
...
>
> This case really brought me to detect the problem!
>
> The original query and data are not shown here, but the principle should
> be clear from the execution plans.
>
> I think the planner shouldn't change the row estimations on further
> steps after left joins at all, and be a bit more conservative on inner
> joins.

But the code should alredy do exactly that, see:

https://github.com/postgres/postgres/blob/dbe8a1726cfd5a09cf1ef99e76f5f89e2efada71/src/backend/optimizer/path/costsize.c#L5212

And in fact, the second part of the plains shows it's doing the trick:

-> Merge Left Join (cost=1293.25..1388.21 rows=5733 width=104)
(actual time=2.321..2.556 rows=1415 loops=1)
Merge Cond: ((qup_1.curr_season = qli_1.curr_season) AND
((qup_1.curr_code)::text = (qli_1.curr_code)::text))
-> Sort (cost=641.68..656.02 rows=5733 width=72)
-> Sort (cost=651.57..666.11 rows=5816 width=72)

But notice the first join (with rows=33) doesn't say "Left". And I see
there's Append on top, so presumably the query is much more complex, and
there's a regular join of these CTEs in some other part.

We'll need to se the whole query, not just one chunk of it.

FWIW it seems you're using materialized CTEs - that's likely pretty bad
for the estimates, because we don't propagate statistics from the CTE.
So a join on CTEs can't see statistics from the underlying tables, and
that can easily produce really bad estimates.

I'm assuming you're not using AS MATERIALIZED explicitly, so I'd bet
this happens because the "cardinality" function is marked as volatile.
Perhaps it can be redefined as stable/immutable.

> This may be related to the fact that this case has 2 join-conditions
> (xx_season an xx_code).

That shouldn't affect outer join estimates this way (but as I explained
above, the join does not seem to be "left" per the explain).
Multi-column joins can cause issues, no doubt about it - but CTEs make
it worse because we can't e.g. see foreign keys.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-02-08 21:58:48 Re: OpenSSL 3.0.0 vs old branches
Previous Message Robert Haas 2023-02-08 21:24:15 Re: Weird failure with latches in curculio on v15