Re: Early WIP/PoC for inlining CTEs

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: andres(at)anarazel(dot)de
Cc: thomas(dot)munro(at)enterprisedb(dot)com, david(at)fetter(dot)org, andrew(at)tao11(dot)riddles(dot)org(dot)uk, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Early WIP/PoC for inlining CTEs
Date: 2019-03-13 00:45:27
Message-ID: 20190313.094527.1905766529049763845.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 2018-08-08 16:55:22 +1200, Thomas Munro wrote:
>> On Fri, Jul 27, 2018 at 8:10 PM, David Fetter <david(at)fetter(dot)org> wrote:
>> > On Fri, Jul 27, 2018 at 02:55:26PM +1200, Thomas Munro wrote:
>> >> On Thu, Jul 26, 2018 at 7:14 AM, David Fetter <david(at)fetter(dot)org> wrote:
>> >> > Please find attached the next version, which passes 'make check'.
>> >>
>> >> ... but not 'make check-world' (contrib/postgres_fdw's EXPLAIN is different).
>> >
>> > Please find attached a patch that does.
>> >
>> > It doesn't always pass make installcheck-world, but I need to sleep
>> > rather than investigate that at the moment.
>>
>> One observation I wanted to share: CTE scans inhibit parallelism today
>> (something we might eventually want to fix with shared tuplestores).
>> This patch therefore allows parallelism in some WITH queries, which
>> seems like a very valuable thing.
>
> Might be interesting to see how big a difference it makes for
> TPC-DS. Currently the results are bad (as in many queries don't finish
> in a relevant time) because it uses CTEs so widely, and there's often
> predicates outside the CTE that could be pushed down.

Now that the patch was committed, I played with TPCS-DS and found at
least one of their queries gets speedup. Query 2 runs 2 times faster
than 11. In 12, it seems that CTE wscs is pushed down.

with wscs as
(select sold_date_sk
,sales_price
from (select ws_sold_date_sk sold_date_sk
,ws_ext_sales_price sales_price
from web_sales
union all
select cs_sold_date_sk sold_date_sk
,cs_ext_sales_price sales_price
from catalog_sales) as s1),
wswscs as
(select d_week_seq,
sum(case when (d_day_name='Sunday') then sales_price else null end) sun_sales,
sum(case when (d_day_name='Monday') then sales_price else null end) mon_sales,
sum(case when (d_day_name='Tuesday') then sales_price else null end) tue_sales,
sum(case when (d_day_name='Wednesday') then sales_price else null end) wed_sales,
sum(case when (d_day_name='Thursday') then sales_price else null end) thu_sales,
sum(case when (d_day_name='Friday') then sales_price else null end) fri_sales,
sum(case when (d_day_name='Saturday') then sales_price else null end) sat_sales
from wscs
,date_dim
where d_date_sk = sold_date_sk
group by d_week_seq)
select d_week_seq1
,round(sun_sales1/sun_sales2,2)
,round(mon_sales1/mon_sales2,2)
,round(tue_sales1/tue_sales2,2)
,round(wed_sales1/wed_sales2,2)
,round(thu_sales1/thu_sales2,2)
,round(fri_sales1/fri_sales2,2)
,round(sat_sales1/sat_sales2,2)
from
(select wswscs.d_week_seq d_week_seq1
,sun_sales sun_sales1
,mon_sales mon_sales1
,tue_sales tue_sales1
,wed_sales wed_sales1
,thu_sales thu_sales1
,fri_sales fri_sales1
,sat_sales sat_sales1
from wswscs,date_dim
where date_dim.d_week_seq = wswscs.d_week_seq and
d_year = 1998) y,
(select wswscs.d_week_seq d_week_seq2
,sun_sales sun_sales2
,mon_sales mon_sales2
,tue_sales tue_sales2
,wed_sales wed_sales2
,thu_sales thu_sales2
,fri_sales fri_sales2
,sat_sales sat_sales2
from wswscs
,date_dim
where date_dim.d_week_seq = wswscs.d_week_seq and
d_year = 1998+1) z
where d_week_seq1=d_week_seq2-53
order by d_week_seq1;

Here's the 12's plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=118929.39..118929.43 rows=13 width=228) (actual time=792.588..792.710 rows=2513 loops=1)
Sort Key: wswscs.d_week_seq
Sort Method: quicksort Memory: 323kB
CTE wswscs
-> Finalize GroupAggregate (cost=110164.09..113672.71 rows=10447 width=228) (actual time=766.232..768.415 rows=263 loops=1)
Group Key: date_dim_2.d_week_seq
-> Gather Merge (cost=110164.09..112601.89 rows=20894 width=228) (actual time=766.209..767.158 rows=789 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=109164.06..109190.18 rows=10447 width=228) (actual time=763.059..763.078 rows=263 loops=3)
Sort Key: date_dim_2.d_week_seq
Sort Method: quicksort Memory: 160kB
Worker 0: Sort Method: quicksort Memory: 160kB
Worker 1: Sort Method: quicksort Memory: 160kB
-> Partial HashAggregate (cost=108179.39..108466.69 rows=10447 width=228) (actual time=762.202..762.889 rows=263 loops=3)
Group Key: date_dim_2.d_week_seq
-> Parallel Hash Join (cost=2371.82..74413.79 rows=900416 width=20) (actual time=17.166..424.834 rows=717854 loops=3)
Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_2.d_date_sk)
-> Parallel Append (cost=0.00..69678.24 rows=900416 width=10) (actual time=0.029..248.992 rows=720311 loops=3)
-> Parallel Seq Scan on catalog_sales (cost=0.00..43411.73 rows=600673 width=10) (actual time=0.018..130.163 rows=480516 loops=3)
-> Parallel Seq Scan on web_sales (cost=0.00..21764.43 rows=299743 width=10) (actual time=0.026..95.629 rows=359692 loops=2)
-> Parallel Hash (cost=1834.70..1834.70 rows=42970 width=18) (actual time=16.610..16.610 rows=24350 loops=3)
Buckets: 131072 Batches: 1 Memory Usage: 5056kB
-> Parallel Seq Scan on date_dim date_dim_2 (cost=0.00..1834.70 rows=42970 width=18) (actual time=0.020..7.617 rows=24350 loops=3)
-> Hash Join (cost=5007.74..5256.44 rows=13 width=228) (actual time=785.300..792.123 rows=2513 loops=1)
Hash Cond: (wswscs.d_week_seq = date_dim.d_week_seq)
-> CTE Scan on wswscs (cost=0.00..208.94 rows=10447 width=228) (actual time=766.236..766.263 rows=263 loops=1)
-> Hash (cost=5007.58..5007.58 rows=13 width=232) (actual time=19.033..19.033 rows=2513 loops=1)
Buckets: 4096 (originally 1024) Batches: 1 (originally 1) Memory Usage: 288kB
-> Merge Join (cost=5001.97..5007.58 rows=13 width=232) (actual time=17.739..18.210 rows=2513 loops=1)
Merge Cond: (((wswscs_1.d_week_seq - 53)) = date_dim.d_week_seq)
-> Sort (cost=2668.33..2669.24 rows=365 width=228) (actual time=9.906..9.924 rows=365 loops=1)
Sort Key: ((wswscs_1.d_week_seq - 53))
Sort Method: quicksort Memory: 76kB
-> Hash Join (cost=2322.68..2652.79 rows=365 width=228) (actual time=7.864..9.764 rows=365 loops=1)
Hash Cond: (wswscs_1.d_week_seq = date_dim_1.d_week_seq)
-> CTE Scan on wswscs wswscs_1 (cost=0.00..208.94 rows=10447 width=228) (actual time=0.001..2.287 rows=263 loops=1)
-> Hash (cost=2318.11..2318.11 rows=365 width=4) (actual time=7.389..7.389 rows=365 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 21kB
-> Seq Scan on date_dim date_dim_1 (cost=0.00..2318.11 rows=365 width=4) (actual time=3.876..7.348 rows=365 loops=1)
Filter: (d_year = 1999)
Rows Removed by Filter: 72684
-> Sort (cost=2333.65..2334.56 rows=365 width=4) (actual time=7.824..7.930 rows=2514 loops=1)
Sort Key: date_dim.d_week_seq
Sort Method: quicksort Memory: 42kB
-> Seq Scan on date_dim (cost=0.00..2318.11 rows=365 width=4) (actual time=3.950..7.765 rows=365 loops=1)
Filter: (d_year = 1998)
Rows Removed by Filter: 72684
Planning Time: 0.956 ms
Execution Time: 794.072 ms
(50 rows)

Here's 11's plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=227105.32..227105.35 rows=13 width=228) (actual time=2019.071..2019.163 rows=2513 loops=1)
Sort Key: wswscs.d_week_seq
Sort Method: quicksort Memory: 323kB
CTE wscs
-> Append (cost=0.00..88586.64 rows=2160976 width=10) (actual time=0.015..521.937 rows=2160932 loops=1)
-> Seq Scan on web_sales (cost=0.00..25960.84 rows=719384 width=10) (actual time=0.014..137.557 rows=719384 loops=1)
-> Seq Scan on catalog_sales (cost=0.00..51820.92 rows=1441592 width=10) (actual time=0.011..269.559 rows=1441548 loops=1)
CTE wswscs
-> HashAggregate (cost=132977.62..133264.03 rows=10415 width=228) (actual time=1996.856..1997.387 rows=263 loops=1)
Group Key: date_dim_2.d_week_seq
-> Hash Join (cost=3048.60..51941.02 rows=2160976 width=28) (actual time=36.414..1323.387 rows=2153563 loops=1)
Hash Cond: (wscs.sold_date_sk = date_dim_2.d_date_sk)
-> CTE Scan on wscs (cost=0.00..43219.52 rows=2160976 width=18) (actual time=0.017..942.090 rows=2160932 loops=1)
-> Hash (cost=2135.49..2135.49 rows=73049 width=18) (actual time=35.870..35.871 rows=73049 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 4734kB
-> Seq Scan on date_dim date_dim_2 (cost=0.00..2135.49 rows=73049 width=18) (actual time=0.011..17.749 rows=73049 loops=1)
-> Hash Join (cost=5006.47..5254.41 rows=13 width=228) (actual time=2012.260..2018.602 rows=2513 loops=1)
Hash Cond: (wswscs.d_week_seq = date_dim.d_week_seq)
-> CTE Scan on wswscs (cost=0.00..208.30 rows=10415 width=228) (actual time=1996.858..1996.876 rows=263 loops=1)
-> Hash (cost=5006.31..5006.31 rows=13 width=232) (actual time=15.380..15.380 rows=2513 loops=1)
Buckets: 4096 (originally 1024) Batches: 1 (originally 1) Memory Usage: 288kB
-> Merge Join (cost=5000.73..5006.31 rows=13 width=232) (actual time=14.272..14.683 rows=2513 loops=1)
Merge Cond: (((wswscs_1.d_week_seq - 53)) = date_dim.d_week_seq)
-> Sort (cost=2667.18..2668.09 rows=363 width=228) (actual time=7.374..7.393 rows=365 loops=1)
Sort Key: ((wswscs_1.d_week_seq - 53))
Sort Method: quicksort Memory: 76kB
-> Hash Join (cost=2322.65..2651.75 rows=363 width=228) (actual time=6.386..7.154 rows=365 loops=1)
Hash Cond: (wswscs_1.d_week_seq = date_dim_1.d_week_seq)
-> CTE Scan on wswscs wswscs_1 (cost=0.00..208.30 rows=10415 width=228) (actual time=0.000..0.707 rows=263 loops=1)
-> Hash (cost=2318.11..2318.11 rows=363 width=4) (actual time=6.367..6.367 rows=365 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 21kB
-> Seq Scan on date_dim date_dim_1 (cost=0.00..2318.11 rows=363 width=4) (actual time=3.000..6.330 rows=365 loops=1)
Filter: (d_year = 1999)
Rows Removed by Filter: 72684
-> Sort (cost=2333.55..2334.45 rows=363 width=4) (actual time=6.890..6.975 rows=2514 loops=1)
Sort Key: date_dim.d_week_seq
Sort Method: quicksort Memory: 42kB
-> Seq Scan on date_dim (cost=0.00..2318.11 rows=363 width=4) (actual time=3.832..6.841 rows=365 loops=1)
Filter: (d_year = 1998)
Rows Removed by Filter: 72684
Planning Time: 0.962 ms
Execution Time: 2027.758 ms
(42 rows)

BTW, in my small TPC-DS environment (2GB), only two queries were not
finished within 30 minutes (query 4 and 11). My guess is these seem to
be suffered from statistic errors (I got this hint from Kaigai-san's
slide: https://www.slideshare.net/kaigai/tpcdspostgresql, especially
page 20. Unfortunately the slide is in Japanese). And it leads to
choosing a bad plan: nested loop join. If I disable nested loop join,
these two queries finished quickly.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shawn Debnath 2019-03-13 00:53:43 Re: Introduce timeout capability for ConditionVariableSleep
Previous Message Amit Langote 2019-03-13 00:39:54 Re: Should we add GUCs to allow partition pruning to be disabled?