Re: apply_scanjoin_target_to_paths and partitionwise join

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Subject: Re: apply_scanjoin_target_to_paths and partitionwise join
Date: 2024-05-06 12:58:14
Message-ID: CAExHW5utxnWq3xuVjEeejkP_=TqMEoXnVktDxu6myJ7Tb+kK-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 6, 2024 at 4:26 PM Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
wrote:

> Hi Ashutosh & hackers,
>
> On Mon, Apr 15, 2024 at 9:00 AM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> >
> > Here's patch with
> >
> [..]
> > Adding to the next commitfest but better to consider this for the next
> set of minor releases.
>
> 1. The patch does not pass cfbot -
> https://cirrus-ci.com/task/5486258451906560 on master due to test
> failure "not ok 206 + partition_join"
>

So I need to create a patch for master first. I thought CFBot somehow knew
that the patch was created for PG 15. :)

>
> 2. Without the patch applied, the result of the meson test on master
> was clean (no failures , so master is fine). After applying patch
> there were expected some hunk failures (as the patch was created for
> 15_STABLE):
>
> patching file src/backend/optimizer/plan/planner.c
> Hunk #1 succeeded at 7567 (offset 468 lines).
> Hunk #2 succeeded at 7593 (offset 468 lines).
> patching file src/test/regress/expected/partition_join.out
> Hunk #1 succeeded at 4777 (offset 56 lines).
> Hunk #2 succeeded at 4867 (offset 56 lines).
> patching file src/test/regress/sql/partition_join.sql
> Hunk #1 succeeded at 1136 (offset 1 line).
>
> 3. Without patch there is performance regression/bug on master (cost
> is higher with enable_partitionwise_join=on that without it):
>
> data preparation:
> -- Test the process_outer_partition() code path
> CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0000',
> '0001', '0002');
> CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0003',
> '0004');
> INSERT INTO plt1_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
> generate_series(0, 24) i;
> ANALYZE plt1_adv;
>
> CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002');
> CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0003',
> '0004');
> INSERT INTO plt2_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
> generate_series(0, 24) i WHERE i % 5 IN (2, 3, 4);
> ANALYZE plt2_adv;
>
> CREATE TABLE plt3_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt3_adv_p1 PARTITION OF plt3_adv FOR VALUES IN ('0001');
> CREATE TABLE plt3_adv_p2 PARTITION OF plt3_adv FOR VALUES IN ('0003',
> '0004');
> INSERT INTO plt3_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
> generate_series(0, 24) i WHERE i % 5 IN (1, 3, 4);
> ANALYZE plt3_adv;
>
> off:
> EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1
> LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c
> = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 !=
> 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------
> Sort (cost=22.02..22.58 rows=223 width=27)
> Sort Key: t1.c, t1.a, t2.a, t3.a
> -> Hash Full Join (cost=4.83..13.33 rows=223 width=27)
> [..]
>
>
> with enable_partitionwise_join=ON (see the jump from cost 22.02 -> 27.65):
> EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1
> LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c
> = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 !=
> 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------
> Sort (cost=27.65..28.37 rows=289 width=27)
> Sort Key: t1.c, t1.a, t2.a, t3.a
> -> Append (cost=2.23..15.83 rows=289 width=27)
> -> Hash Full Join (cost=2.23..4.81 rows=41 width=27)
> [..]
> -> Hash Full Join (cost=2.45..9.57 rows=248 width=27)
> [..]
>
>
> However with the patch applied the plan with minimal cost is always
> chosen ("22"):
>
> explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT
> JOIN
> plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
> coalesce(t1.a, 0 ) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
> t1.c, t1.a, t2.a, t3.a;
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------
> Sort (cost=22.02..22.58 rows=223 width=27)
> Sort Key: t1.c, t1.a, t2.a, t3.a
> -> Hash Full Join (cost=4.83..13.33 rows=223 width=27)
> [..]
>
>
> set enable_partitionwise_join to on;
> explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT
> JOIN
> plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
> coalesce(t1.a, 0 ) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
> t1.c, t1.a, t2.a, t3.a;
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------
> Sort (cost=22.02..22.58 rows=223 width=27)
> Sort Key: t1.c, t1.a, t2.a, t3.a
> -> Hash Full Join (cost=4.83..13.33 rows=223 width=27)
> [..]
>
> with the patch applied, the minimal cost (with toggle on or off) the
> cost always stays the minimal from the available ones. We cannot
> provide a reproducer for real performance regression, but for the
> affected customer it took 530+s (with enable_partitionwise_join=on)

and without that GUC it it was ~23s.
>

Thanks for providing actual timing. That's a huge difference.

> 4. meson test ends up with failures like below:
>
> 4/290 postgresql:regress / regress/regress
> ERROR 32.67s
> 6/290 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
> ERROR 56.96s
> 35/290 postgresql:recovery / recovery/027_stream_regress
> ERROR 40.20s
>
> (all due to "regression tests pass" failures)
>
> the partition_join.sql is failing for test 206, so for this:
>
> -- partitionwise join with fractional paths
> CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
> CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO
> ('1000');
> CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO
> ('2000');
>
> -- insert data
> INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
> ANALYZE fract_t;
>
> -- verify plan; nested index only scans
> SET max_parallel_workers_per_gather = 0;
> SET enable_partitionwise_join = on;
>
> the testsuite was expecting the below with enable_partitionwise_join = on;
>
> EXPLAIN (COSTS OFF)
> SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER
> BY x.id ASC LIMIT 10;
> QUERY PLAN
> -----------------------------------------------------------------------
> Limit
> -> Merge Append
> Sort Key: x.id
> -> Merge Left Join
> Merge Cond: (x_1.id = y_1.id)
> -> Index Only Scan using fract_t0_pkey on fract_t0 x_1
> -> Index Only Scan using fract_t0_pkey on fract_t0 y_1
> -> Merge Left Join
> Merge Cond: (x_2.id = y_2.id)
> -> Index Only Scan using fract_t1_pkey on fract_t1 x_2
> -> Index Only Scan using fract_t1_pkey on fract_t1 y_2
>
> but actually with patch it gets this (here with costs):
>
> EXPLAIN (COSTS) SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y
> USING (id) ORDER BY x.id ASC LIMIT 10;
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------
> Limit (cost=1.10..2.21 rows=10 width=16)
> -> Merge Left Join (cost=1.10..223.10 rows=2000 width=16)
> Merge Cond: (x.id = y.id)
> -> Append (cost=0.55..96.55 rows=2000 width=8)
> [..]
> -> Append (cost=0.55..96.55 rows=2000 width=8)
> [..]
>
>
> if you run it without patch and again with enable_partitionwise_join=on:
>
> EXPLAIN SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING
> (id) ORDER BY x.id ASC LIMIT 10;
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------
> Limit (cost=1.11..2.22 rows=10 width=16)
> -> Merge Append (cost=1.11..223.11 rows=2000 width=16)
> Sort Key: x.id
> -> Merge Left Join (cost=0.55..101.55 rows=1000 width=16)
> [..]
> -> Merge Left Join (cost=0.55..101.55 rows=1000 width=16)
> [..]
>
> So with the patch that SQL does not use partitionwise join as it finds
> it more optimal to stick to a plan with cost of "1.10..2.21" instead
> of "1.11..2.22" (w/ partition_join), nitpicking but still a failure
> technically. Perhaps it could be even removed? (it's pretty close to
> noise?).
>

I think we need to replace the failing query with something which uses
partitionwise join even with the patch.

I will take a look at this after returning from a two week long vacation,
unless someone else is interested in fixing this before that.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2024-05-06 13:22:41 Re: partitioning and identity column
Previous Message David G. Johnston 2024-05-06 12:55:45 Re: Increase the length of identifers from 63 characters to 128 characters or more