Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: amul sul <sulamul(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2020-02-05 17:03:43
Message-ID: 48CB1153-65B0-4BC9-843D-C8935A2CA44C@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Feb 5, 2020, at 4:51 AM, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
>
> On Wed, Jan 29, 2020 at 9:15 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
>> On Tue, Jan 28, 2020 at 1:39 PM Mark Dilger
>> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>>> I have added tests checking correctness and showing some partition pruning limitations. Find three patches, attached.
>>>
>>> The v31-0001-… patch merely applies your patches as a starting point for the next two patches. It is your work, not mine.
>>>
>>> The v31-0002-… patch adds more regression tests for range partitioning. The commit message contains my comments about that.
>>>
>>> The v31-0003-… patch adds more regression tests for list partitioning, and again, the commit message contains my comments about that.
>>
>> I'll dig into it more closely.
>
> I tested the new test patches. The patches are applied to the
> partition_join.sql regression test cleanly, but the new tests failed
> in my environment (even with make check LANG=C). I think I should set
> the right locale for the testing. Which one did you use?

I did not set a locale in the tests. My environment has:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

> Maybe I'm
> missing something, but let me comment on the new tests. This is the
> one proposed in the v31-0002 patch:
>
> +EXPLAIN (COSTS OFF)
> +SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a)
> WHERE t1.a IN ('äbç', 'ὀδυσσεύς');
> + QUERY PLAN
> +------------------------------------------------------------------
> + Hash Join
> + Hash Cond: (t2.a = t1.a)
> + -> Append
> + -> Seq Scan on beta_a t2_1
> + -> Seq Scan on beta_b t2_2
> + -> Seq Scan on beta_c t2_3
> + -> Seq Scan on beta_d t2_4
> + -> Seq Scan on beta_e t2_5
> + -> Seq Scan on beta_f t2_6
> + -> Seq Scan on beta_g t2_7
> + -> Seq Scan on beta_h t2_8
> + -> Seq Scan on beta_default t2_9
> + -> Hash
> + -> Append
> + -> Seq Scan on alpha_e t1_1
> + Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[]))
> + -> Seq Scan on alpha_default t1_2
> + Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[]))
> +(18 rows)
>
> The commit message says:
>
> When joining with
>
> alpha.a = beta.a and alpha.a IN ('äbç', 'ὀδυσσεύς')
>
> the planner does the right thing for one side of the query, but
> hits all partitions for the other side, which it doesn't need to
> do.
>
> Yeah, I agree that the resulting plan is not efficient. The reason
> for that would be that the planner doesn't generate a qual on the
> outer side matching the ScalarArrayOpExpr qual "a = ANY
> ('{äbç,ὀδυσσεύς}'::text[])" on the inner side, which I think would be
> a restriction caused by the equivalence machinery not by the
> partitionwise join logic IIUC.

It’s fine if this is beyond the scope of the patch.

> I think the critique would be useful,
> so I don't object to adding this test case, but the critique would be
> more about query planning that is actually not related to the
> partitionwise join logic, so I'm not sure that the partition_join.sql
> regression test is the best place to add it. I guess that there would
> be much better places than partition_join.sql.

You don’t need to add the test anywhere. It’s enough for me that you looked at it and considered whether the partition-wise join patch should do anything differently in this case. Again, it sounds like this is beyond the scope of the patch.

> (This is nitpicking;
> but another thing I noticed about this test case is that the join
> query contains only a single join condition "t1.a = t2.a", but the
> queried tables alpha and beta are range-partitioned by multiple
> columns a and b, so the query should have a join condition for each of
> the columns like "t1.a = t2.a AND t1.b = t2.b" if adding this as a
> test case for partitionwise join.)

Well, it is important that partition-wise join does not break such queries. I added the column ‘b’ to the partitioning logic to verify that did not confuse the code in your patch.

> I feel almost the same to other
> test cases in the patch (and the v31-0003 patch), except this one
> proposed in the v31-0003 patch:
>
> +CREATE TABLE alpha (a TEXT) PARTITION BY LIST(a);
> +CREATE TABLE alpha_a PARTITION OF alpha FOR VALUES IN ('Turkiye', 'TURKIYE');
> +CREATE TABLE alpha_b PARTITION OF alpha FOR VALUES IN ('b?t', 'BIT');
> +CREATE TABLE alpha_c PARTITION OF alpha FOR VALUES IN ('abc', 'ABC');
> +CREATE TABLE alpha_d PARTITION OF alpha FOR VALUES IN ('aaa', 'cote', 'Gotz');
> +CREATE TABLE alpha_e PARTITION OF alpha FOR VALUES IN ('?δυσσε??', '?ΔΥΣΣΕ?Σ');
> +CREATE TABLE alpha_f PARTITION OF alpha FOR VALUES IN ('を読み取り用',
> 'にオープンできませんでした', NULL);
> +CREATE TABLE alpha_default PARTITION OF alpha DEFAULT;
> +CREATE TABLE beta (a TEXT) PARTITION BY LIST(a);
> +CREATE TABLE beta_a PARTITION OF beta FOR VALUES IN ('Turkiye',
> 'cote', '?ΔΥΣΣΕ?Σ');
> +CREATE TABLE beta_b PARTITION OF beta FOR VALUES IN ('b?t', 'TURKIYE');
> +CREATE TABLE beta_c PARTITION OF beta FOR VALUES IN ('abc', 'を読み取り用',
> 'にオープンできませんでした');
> +CREATE TABLE beta_d PARTITION OF beta FOR VALUES IN ('aaa', 'Gotz',
> 'BIT', '?δυσσε??', 'ABC', NULL);
> +CREATE TABLE beta_default PARTITION OF beta DEFAULT;
>
> +EXPLAIN (COSTS OFF)
> +SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a)
> WHERE t1.a IS NULL;
> + QUERY PLAN
> +-------------------------------------------
> + Hash Join
> + Hash Cond: (t2.a = t1.a)
> + -> Append
> + -> Seq Scan on beta_d t2_1
> + -> Seq Scan on beta_b t2_2
> + -> Seq Scan on beta_a t2_3
> + -> Seq Scan on beta_c t2_4
> + -> Seq Scan on beta_default t2_5
> + -> Hash
> + -> Seq Scan on alpha_f t1
> + Filter: (a IS NULL)
> +(11 rows)
>
> Which made me notice an issue in the partitionwise join logic:
> partition_bounds_merge() assumes that all partitions of given
> relations are always present; in other words, it doesn't consider
> cases where some of the partitions have been pruned entirely. :-( If
> that function considered the cases, the above query would use
> partitionwise join, because alpha_f only matches beta_c. I don't
> think this is a bug, but it causes the planner not only to fail to
> chose partitionwise join but to waste CPU cycles to process pruned
> partitions, so I'll propose to address it. Attached is a patch for
> that (the v31-0004 patch) created on top of the main patch (the
> v31-0001 patch), which is also attached. With the attached, the plan
> for the above query would be changed to something like this:
>
> EXPLAIN (COSTS OFF)
> SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a =
> t2.a) WHERE t1.a IS NULL;
> QUERY PLAN
> ------------------------------
> Nested Loop
> Join Filter: (t1.a = t2.a)
> -> Seq Scan on alpha_f t1
> Filter: (a IS NULL)
> -> Seq Scan on beta_c t2
> (5 rows)
>
> Thanks again, Mark!

Thank you for working on this important patch!


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-02-05 17:15:43 Re: Is custom MemoryContext prohibited?
Previous Message Robert Haas 2020-02-05 16:48:05 Re: Is custom MemoryContext prohibited?