Re: BUG #16053: Query planner performance regression in sql query with multiple CTE in v12

From: Federico <cfederico87(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16053: Query planner performance regression in sql query with multiple CTE in v12
Date: 2019-10-30 09:59:59
Message-ID: CAN19dyfAuC+86gKFemQmBrAP0GO9g8VSkk66VrBVnVZnuWGv+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Sorry for the late reply and for not including the query in the first
message.

Is there a way to avoid the mis-estimate other than changing the query?

Following is the query

--explain analyse
WITH cte_A AS
(SELECT inv_y.cid AS cid,
inv_y.pid AS pid,
sum(inv_y.value) AS value
FROM inv_y
WHERE inv_y.y = 1
GROUP BY inv_y.cid,
inv_y.pid),
cte_B AS
(SELECT inv_y.cid AS cid,
inv_y.pid AS pid
FROM inv_y
WHERE inv_y.y = 1),
cte_C AS
(SELECT sum(inv.value) AS value,
inv.cid AS cid,
inv.pid AS pid
FROM inv
JOIN cte_B ON cte_B.cid = inv.cid
AND cte_B.pid = inv.pid
WHERE inv.invd >= '2017-01-01'
AND inv.invd <= '2018-12-31'
GROUP BY inv.cid,
inv.pid),
cte_D AS
(SELECT cte_C.cid AS cid,
cte_C.pid AS pid,
sum(cte_C.value) AS value
FROM cte_C
GROUP BY cte_C.cid,
cte_C.pid),
cte_E AS
(SELECT sum(inv.value) AS value,
inv.cid AS cid,
inv.pid AS pid
FROM inv
JOIN cte_B ON cte_B.cid = inv.cid
AND cte_B.pid = inv.pid
WHERE inv.invd >= '2018-01-01'
AND inv.invd <= '2018-06-30'
GROUP BY inv.cid,
inv.pid),
cte_F AS
(SELECT cte_B.cid AS cid,
cte_B.pid AS pid,
coalesce(sum(cte_E.value), 0) AS value
FROM cte_B
LEFT OUTER JOIN cte_E ON cte_E.cid = cte_B.cid
AND cte_E.pid = cte_B.pid
GROUP BY cte_B.cid,
cte_B.pid),
cte_G AS
(SELECT sum(expe.value) AS value,
expe.cid AS cid,
expe.pid AS pid
FROM expe
JOIN cte_B ON cte_B.cid = expe.cid
AND cte_B.pid = expe.pid
WHERE expe.invd >= '2018-07-01'
AND expe.invd <= '2018-12-31'
GROUP BY expe.cid,
expe.pid),
cte_H AS
(SELECT cte_B.cid AS cid,
cte_B.pid AS pid,
coalesce(sum(cte_G.value), 0) AS value
FROM cte_B
LEFT OUTER JOIN cte_G ON cte_G.cid = cte_B.cid
AND cte_G.pid = cte_B.pid
GROUP BY cte_B.cid,
cte_B.pid)
SELECT cte_D.value AS ref_value,
cte_D.value - (cte_F.value + cte_H.value) AS rem_value
FROM cte_A
JOIN cte_D ON cte_A.cid = cte_D.cid
AND cte_A.pid = cte_D.pid
JOIN cte_F ON cte_A.cid = cte_F.cid
AND cte_A.pid = cte_F.pid
JOIN cte_H ON cte_A.cid = cte_H.cid
AND cte_A.pid = cte_H.pid
--join cte_B ON cte_A.cid = cte_B.cid
--AND cte_A.pid = cte_B.pid

On Sat, 12 Oct 2019 at 19:17, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
wrote:

> >>>>> "PG" == PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
>
> PG> After upgrading to v12 from v11.4 I've noticed one of the queries
> PG> run by an application has a significant performance degradation.
>
> PG> I've managed to create a toy db that illustrates the issue.
> PG> Following is a link of the db and the query in question (there
> PG> seems to be no way to attach a file from the web form. If I should
> PG> upload them somewhere else let me know)
> PG> https://www.dropbox.com/s/6n3ctdn2zohoann/toy-db_and_query.zip?dl=0
>
> You could at least have included the query in the message.
>
> What seems to be happening here looks to me more down to chance than to
> any bug. In v11, the join order is constrained by the CTEs, whereas in
> v12 it is not; the misplan is the result of a mis-estimated 1-row result
> from a subpath that may not have been considered under v11.
>
> --
> Andrew (irc:RhodiumToad)
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Jeremy Finzel 2019-10-30 11:36:00 insert into inet from text automatically adding subnet
Previous Message Pavel Popov 2019-10-30 08:15:18 Re: Backup/restore problem