Re: [HACKERS] advanced partition matching algorithm for partition-wise join

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: amul sul <sulamul(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] advanced partition matching algorithm for partition-wise join
Date: 2020-02-05 12:51:13
Message-ID: CAPmGK14gTKUL4Enq-sYX1dVkySOK7JVPuEfUDcrr5P+3NHVQGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 29, 2020 at 9:15 PM Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> wrote:
> On Tue, Jan 28, 2020 at 1:39 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> > I have added tests checking correctness and showing some partition pruning limitations. Find three patches, attached.
> >
> > The v31-0001-… patch merely applies your patches as a starting point for the next two patches. It is your work, not mine.
> >
> > The v31-0002-… patch adds more regression tests for range partitioning. The commit message contains my comments about that.
> >
> > The v31-0003-… patch adds more regression tests for list partitioning, and again, the commit message contains my comments about that.
>
> I'll dig into it more closely.

I tested the new test patches. The patches are applied to the
partition_join.sql regression test cleanly, but the new tests failed
in my environment (even with make check LANG=C). I think I should set
the right locale for the testing. Which one did you use? Maybe I'm
missing something, but let me comment on the new tests. This is the
one proposed in the v31-0002 patch:

+EXPLAIN (COSTS OFF)
+SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a)
WHERE t1.a IN ('äbç', 'ὀδυσσεύς');
+ QUERY PLAN
+------------------------------------------------------------------
+ Hash Join
+ Hash Cond: (t2.a = t1.a)
+ -> Append
+ -> Seq Scan on beta_a t2_1
+ -> Seq Scan on beta_b t2_2
+ -> Seq Scan on beta_c t2_3
+ -> Seq Scan on beta_d t2_4
+ -> Seq Scan on beta_e t2_5
+ -> Seq Scan on beta_f t2_6
+ -> Seq Scan on beta_g t2_7
+ -> Seq Scan on beta_h t2_8
+ -> Seq Scan on beta_default t2_9
+ -> Hash
+ -> Append
+ -> Seq Scan on alpha_e t1_1
+ Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[]))
+ -> Seq Scan on alpha_default t1_2
+ Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[]))
+(18 rows)

The commit message says:

When joining with

alpha.a = beta.a and alpha.a IN ('äbç', 'ὀδυσσεύς')

the planner does the right thing for one side of the query, but
hits all partitions for the other side, which it doesn't need to
do.

Yeah, I agree that the resulting plan is not efficient. The reason
for that would be that the planner doesn't generate a qual on the
outer side matching the ScalarArrayOpExpr qual "a = ANY
('{äbç,ὀδυσσεύς}'::text[])" on the inner side, which I think would be
a restriction caused by the equivalence machinery not by the
partitionwise join logic IIUC. I think the critique would be useful,
so I don't object to adding this test case, but the critique would be
more about query planning that is actually not related to the
partitionwise join logic, so I'm not sure that the partition_join.sql
regression test is the best place to add it. I guess that there would
be much better places than partition_join.sql. (This is nitpicking;
but another thing I noticed about this test case is that the join
query contains only a single join condition "t1.a = t2.a", but the
queried tables alpha and beta are range-partitioned by multiple
columns a and b, so the query should have a join condition for each of
the columns like "t1.a = t2.a AND t1.b = t2.b" if adding this as a
test case for partitionwise join.) I feel almost the same to other
test cases in the patch (and the v31-0003 patch), except this one
proposed in the v31-0003 patch:

+CREATE TABLE alpha (a TEXT) PARTITION BY LIST(a);
+CREATE TABLE alpha_a PARTITION OF alpha FOR VALUES IN ('Turkiye', 'TURKIYE');
+CREATE TABLE alpha_b PARTITION OF alpha FOR VALUES IN ('b?t', 'BIT');
+CREATE TABLE alpha_c PARTITION OF alpha FOR VALUES IN ('abc', 'ABC');
+CREATE TABLE alpha_d PARTITION OF alpha FOR VALUES IN ('aaa', 'cote', 'Gotz');
+CREATE TABLE alpha_e PARTITION OF alpha FOR VALUES IN ('?δυσσε??', '?ΔΥΣΣΕ?Σ');
+CREATE TABLE alpha_f PARTITION OF alpha FOR VALUES IN ('を読み取り用',
'にオープンできませんでした', NULL);
+CREATE TABLE alpha_default PARTITION OF alpha DEFAULT;
+CREATE TABLE beta (a TEXT) PARTITION BY LIST(a);
+CREATE TABLE beta_a PARTITION OF beta FOR VALUES IN ('Turkiye',
'cote', '?ΔΥΣΣΕ?Σ');
+CREATE TABLE beta_b PARTITION OF beta FOR VALUES IN ('b?t', 'TURKIYE');
+CREATE TABLE beta_c PARTITION OF beta FOR VALUES IN ('abc', 'を読み取り用',
'にオープンできませんでした');
+CREATE TABLE beta_d PARTITION OF beta FOR VALUES IN ('aaa', 'Gotz',
'BIT', '?δυσσε??', 'ABC', NULL);
+CREATE TABLE beta_default PARTITION OF beta DEFAULT;

+EXPLAIN (COSTS OFF)
+SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a)
WHERE t1.a IS NULL;
+ QUERY PLAN
+-------------------------------------------
+ Hash Join
+ Hash Cond: (t2.a = t1.a)
+ -> Append
+ -> Seq Scan on beta_d t2_1
+ -> Seq Scan on beta_b t2_2
+ -> Seq Scan on beta_a t2_3
+ -> Seq Scan on beta_c t2_4
+ -> Seq Scan on beta_default t2_5
+ -> Hash
+ -> Seq Scan on alpha_f t1
+ Filter: (a IS NULL)
+(11 rows)

Which made me notice an issue in the partitionwise join logic:
partition_bounds_merge() assumes that all partitions of given
relations are always present; in other words, it doesn't consider
cases where some of the partitions have been pruned entirely. :-( If
that function considered the cases, the above query would use
partitionwise join, because alpha_f only matches beta_c. I don't
think this is a bug, but it causes the planner not only to fail to
chose partitionwise join but to waste CPU cycles to process pruned
partitions, so I'll propose to address it. Attached is a patch for
that (the v31-0004 patch) created on top of the main patch (the
v31-0001 patch), which is also attached. With the attached, the plan
for the above query would be changed to something like this:

EXPLAIN (COSTS OFF)
SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a =
t2.a) WHERE t1.a IS NULL;
QUERY PLAN
------------------------------
Nested Loop
Join Filter: (t1.a = t2.a)
-> Seq Scan on alpha_f t1
Filter: (a IS NULL)
-> Seq Scan on beta_c t2
(5 rows)

Thanks again, Mark!

Best regards,
Etsuro Fujita

Attachment Content-Type Size
v31-0001-Applying-Etsuro-Fujita-s-patches.patch application/octet-stream 182.3 KB
v31-0004-Consider-pruned-partitions.patch application/octet-stream 25.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rémi Lapeyre 2020-02-05 13:18:59 Re: [PATCH v1] Allow COPY "text" format to output a header
Previous Message Emil Iggland 2020-02-05 11:46:33 Re: BUG #15858: could not stat file - over 4GB