Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

From: tender wang <tndrwang(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()
Date: 2023-09-05 10:10:32
Message-ID: CAHewXNnVo-7D8E1=9PPR_yirhT1hPnvP=yC9bHEiQTUC0YSZWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

After using patch, the result as below :
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1078.00..26630101.20 rows=1 width=27) (actual
time=160571.005..160571.105 rows=0 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=78.00..78.00 rows=1 width=8) (actual
time=1.065..1.066 rows=1 loops=1)
-> Sort (cost=78.00..83.00 rows=2000 width=8) (actual
time=1.064..1.065 rows=1 loops=1)
Sort Key: part.p_partkey
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on part (cost=0.00..68.00 rows=2000 width=8)
(actual time=0.046..0.830 rows=2000 loops=1)
-> Gather (cost=1000.00..26630023.20 rows=1 width=27) (actual
time=160571.003..160571.102 rows=0 loops=1)
Workers Planned: 1
Params Evaluated: $0
Workers Launched: 1
-> Nested Loop Left Join (cost=0.00..26629023.10 rows=1
width=27) (actual time=160549.257..160549.258 rows=0 loops=2)
Join Filter: ($0 IS NOT NULL)
Filter: ((sample_0.l_orderkey IS NULL) AND
(sample_0.l_shipmode IS NULL))
Rows Removed by Filter: 1810515312
-> Parallel Seq Scan on lineitem ref_0 (cost=0.00..1721.97
rows=35397 width=11) (actual time=0.010..3.393 rows=30088 loops=2)
-> Materialize (cost=0.00..2270.62 rows=60175 width=27)
(actual time=0.000..2.839 rows=60175 loops=60175)
-> Seq Scan on lineitem sample_0 (cost=0.00..1969.75
rows=60175 width=27) (actual time=0.008..11.381 rows=60175 loops=2)
Planning Time: 0.174 ms
Execution Time: 160571.476 ms
(20 rows)

tender wang <tndrwang(at)gmail(dot)com> 于2023年9月5日周二 16:52写道:

> Hi all,
>
> I recently run benchmark[1] on master, but I found performance problem
> as below:
>
> explain analyze select
> subq_0.c0 as c0,
> subq_0.c1 as c1,
> subq_0.c2 as c2
> from
> (select
> ref_0.l_shipmode as c0,
> sample_0.l_orderkey as c1,
> sample_0.l_quantity as c2,
> ref_0.l_orderkey as c3,
> sample_0.l_shipmode as c5,
> ref_0.l_shipinstruct as c6
> from
> public.lineitem as ref_0
> left join public.lineitem as sample_0
> on ((select p_partkey from public.part order by p_partkey limit
> 1)
> is not NULL)
> where sample_0.l_orderkey is NULL) as subq_0
> where subq_0.c5 is NULL
> limit 1;
> QUERY PLAN
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=78.00..45267050.75 rows=1 width=27) (actual
> time=299695.097..299695.099 rows=0 loops=1)
> InitPlan 1 (returns $0)
> -> Limit (cost=78.00..78.00 rows=1 width=8) (actual
> time=0.651..0.652 rows=1 loops=1)
> -> Sort (cost=78.00..83.00 rows=2000 width=8) (actual
> time=0.650..0.651 rows=1 loops=1)
> Sort Key: part.p_partkey
> Sort Method: top-N heapsort Memory: 25kB
> -> Seq Scan on part (cost=0.00..68.00 rows=2000
> width=8) (actual time=0.013..0.428 rows=2000 loops=1)
> -> Nested Loop Left Join (cost=0.00..45266972.75 rows=1 width=27)
> (actual time=299695.096..299695.096 rows=0 loops=1)
> Join Filter: ($0 IS NOT NULL)
> Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode
> IS NULL))
> Rows Removed by Filter: 3621030625
> -> Seq Scan on lineitem ref_0 (cost=0.00..1969.75 rows=60175
> width=11) (actual time=0.026..6.225 rows=60175 loops=1)
> -> Materialize (cost=0.00..2270.62 rows=60175 width=27) (actual
> time=0.000..2.554 rows=60175 loops=60175)
> -> Seq Scan on lineitem sample_0 (cost=0.00..1969.75
> rows=60175 width=27) (actual time=0.004..8.169 rows=60175 loops=1)
> Planning Time: 0.172 ms
> Execution Time: 299695.501 ms
> (16 rows)
>
> After I set enable_material to off, the same query run faster, as below:
> set enable_material = off;
> explain analyze select
> subq_0.c0 as c0,
> subq_0.c1 as c1,
> subq_0.c2 as c2
> from
> (select
> ref_0.l_shipmode as c0,
> sample_0.l_orderkey as c1,
> sample_0.l_quantity as c2,
> ref_0.l_orderkey as c3,
> sample_0.l_shipmode as c5,
> ref_0.l_shipinstruct as c6
> from
> public.lineitem as ref_0
> left join public.lineitem as sample_0
> on ((select p_partkey from public.part order by p_partkey limit
> 1)
> is not NULL)
> where sample_0.l_orderkey is NULL) as subq_0
> where subq_0.c5 is NULL
> limit 1;
> QUERY
> PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=1078.00..91026185.57 rows=1 width=27) (actual
> time=192669.605..192670.425 rows=0 loops=1)
> InitPlan 1 (returns $0)
> -> Limit (cost=78.00..78.00 rows=1 width=8) (actual
> time=0.662..0.663 rows=1 loops=1)
> -> Sort (cost=78.00..83.00 rows=2000 width=8) (actual
> time=0.661..0.662 rows=1 loops=1)
> Sort Key: part.p_partkey
> Sort Method: top-N heapsort Memory: 25kB
> -> Seq Scan on part (cost=0.00..68.00 rows=2000
> width=8) (actual time=0.017..0.430 rows=2000 loops=1)
> -> Gather (cost=1000.00..91026107.57 rows=1 width=27) (actual
> time=192669.604..192670.422 rows=0 loops=1)
> Workers Planned: 1
> Params Evaluated: $0
> Workers Launched: 1
> -> Nested Loop Left Join (cost=0.00..91025107.47 rows=1
> width=27) (actual time=192588.143..192588.144 rows=0 loops=2)
> Join Filter: ($0 IS NOT NULL)
> Filter: ((sample_0.l_orderkey IS NULL) AND
> (sample_0.l_shipmode IS NULL))
> Rows Removed by Filter: 1810515312
> -> Parallel Seq Scan on lineitem ref_0
> (cost=0.00..1721.97 rows=35397 width=11) (actual time=0.007..3.797
> rows=30088 loops=2)
> -> Seq Scan on lineitem sample_0 (cost=0.00..1969.75
> rows=60175 width=27) (actual time=0.000..2.637 rows=60175 loops=60175)
> Planning Time: 0.174 ms
> Execution Time: 192670.458 ms
> (19 rows)
>
> I debug the code and find consider_parallel_nestloop() doesn't consider
> materialized form of the cheapest inner path.
> When enable_material = true, we can see Material path won in first plan,
> but Parallel Seq Scan node doesn't add as outer path, which because
> in try_partial_nestloop_path() , the cost of nestloop wat computed using
> seq scan path not material path.
>
> [1] include test table schema and data, you can repeat above problem.
>
> I try fix this problem in attached patch, and I found pg12.12 also had
> this issue. Please review my patch, thanks!
>
> [1] https://github.com/tenderwg/tpch_test
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema 2023-09-05 10:21:46 Re: pg_basebackup: Always return valid temporary slot names
Previous Message Alvaro Herrera 2023-09-05 09:59:40 Re: Optionally using a better backtrace library?