Re: Our trial to TPC-DS but optimizer made unreasonable plan

From: Qingqing Zhou <zhouqq(dot)postgres(at)gmail(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Our trial to TPC-DS but optimizer made unreasonable plan
Date: 2015-08-26 21:38:34
Message-ID: CAJjS0u0XWkBE5MfLqwWkKZZ20AZP+5jjsi+EuUOCmJGLDqX0oA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 19, 2015 at 10:32 AM, Qingqing Zhou
<zhouqq(dot)postgres(at)gmail(dot)com> wrote:
> On Tue, Aug 18, 2015 at 5:59 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>> BTW, did you register the patch on the upcoming commit-fest?
>>
> Not yet, it is in WIP status.
>

While I am working on the patch, I found some issues and resort help
here. Patch attached.

Here is an example:

postgres=# explain WITH q AS (
WITH p AS (SELECT * from a) SELECT p.* FROM p JOIN p p1 on p.i>=p1.i)
SELECT * FROM q WHERE i <= 5;
QUERY PLAN
----------------------------------------------------------------------------------
Nested Loop (cost=0.58..5980.16 rows=133333 width=8)
-> Index Scan using ai on a (cost=0.29..8.36 rows=4 width=8)
Index Cond: (i <= 5)
-> Index Only Scan using ai on a a_1 (cost=0.29..1159.62
rows=33333 width=4)
Index Cond: (i <= a.i)
(5 rows)

So far so good. But if we add other references of the CTE q (m1->m,
m->q), we still have some extra CTE scans:

postgres=# explain WITH q AS (
WITH p AS (SELECT * from a) SELECT p.* FROM p JOIN p p1 on
p.i>=p1.i), m as (select * from q), m1 as (select * from m)
SELECT * FROM m1 WHERE i <= 5;
QUERY PLAN
-----------------------------------------------------------------------------------------
CTE Scan on m (cost=158365985.66..233365985.65 rows=1111111111 width=8)
Filter: (i <= 5)
CTE q
-> Nested Loop (cost=0.29..91699319.00 rows=3333333333 width=8)
-> Seq Scan on a (cost=0.00..1443.00 rows=100000 width=8)
-> Index Only Scan using ai on a a_1 (cost=0.29..583.65
rows=33333 width=4)
Index Cond: (i <= a.i)
CTE m
-> CTE Scan on q (cost=0.00..66666666.66 rows=3333333333 width=8)
(9 rows)

Above two queries essentially the same, but the second one is a
non-optimal plan. The reason is that how my patch works: it put a
substitution in front of SS_process_ctes():

/*
* If there is a WITH list, process each WITH query and build an initplan
! * SubPlan structure for it. Before we process ctes, try to subsitute with
! * subqueries to benefits from global optimization.
*/
if (parse->cteList)
+ {
+ substitute_ctes_with_subqueries(root);
SS_process_ctes(root);
+ }

AFAICS, the substitution only handles cteList within a query block, so
it does not go across the subquery boundary. I can see this is an
issue but can't see a nice way to fix it. Anybody has some recipe?

Regards,
Qingqing

Attachment Content-Type Size
ctes.patch application/octet-stream 8.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-08-26 22:23:42 Re: Custom Scans and private data
Previous Message Fabrízio de Royes Mello 2015-08-26 19:44:57 Re: Is this a bug?