RE: Problem with default partition pruning

From: "Yuzuko Hosoya" <hosoya(dot)yuzuko(at)lab(dot)ntt(dot)co(dot)jp>
To: "'Thibaut'" <thibaut(dot)madelaine(at)dalibo(dot)com>, "'Imai, Yoshikazu'" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>
Cc: "'Amit Langote'" <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, "'PostgreSQL Hackers'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Problem with default partition pruning
Date: 2019-03-15 06:05:41
Message-ID: 001901d4daf5$1ef4f640$5cdee2c0$@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Thibaut,

Thanks a lot for your test and comments.

>
> Le 28/02/2019 à 09:26, Imai, Yoshikazu a écrit :
> > Hosoya-san
> >
> > On Wed, Feb 27, 2019 at 6:51 AM, Yuzuko Hosoya wrote:
> >>> From: Amit Langote [mailto:Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp]
> >>> Sent: Wednesday, February 27, 2019 11:22 AM
> >>>
> >>> Hosoya-san,
> >>>
> >>> On 2019/02/22 17:14, Yuzuko Hosoya wrote:
> >>>> Hi,
> >>>>
> >>>> I found the bug of default partition pruning when executing a range
> >> query.
> >>>> -----
> >>>> postgres=# create table test1(id int, val text) partition by range
> >>>> (id); postgres=# create table test1_1 partition of test1 for values
> >>>> from (0) to (100); postgres=# create table test1_2 partition of
> >>>> test1 for values from (150) to (200); postgres=# create table
> >>>> test1_def partition of test1 default;
> >>>>
> >>>> postgres=# explain select * from test1 where id > 0 and id < 30;
> >>>> QUERY PLAN
> >>>> ----------------------------------------------------------------
> >>>> Append (cost=0.00..11.83 rows=59 width=11)
> >>>> -> Seq Scan on test1_1 (cost=0.00..5.00 rows=58 width=11)
> >>>> Filter: ((id > 0) AND (id < 30))
> >>>> -> Seq Scan on test1_def (cost=0.00..6.53 rows=1 width=12)
> >>>> Filter: ((id > 0) AND (id < 30))
> >>>> (5 rows)
> >>>>
> >>>> There is no need to scan the default partition, but it's scanned.
> >>>> -----
> >>>>
> >>>> In the current implement, whether the default partition is scanned
> >>>> or not is determined according to each condition of given WHERE
> >>>> clause at get_matching_range_bounds(). In this example,
> >>>> scan_default is set true according to id > 0 because id >= 200
> >>>> matches the default partition. Similarly, according to id < 30,
> >> scan_default is set true.
> >>>> Then, these results are combined according to AND/OR at
> >> perform_pruning_combine_step().
> >>>> In this case, final result's scan_default is set true.
> >>>>
> >>>> The modifications I made are as follows:
> >>>> - get_matching_range_bounds() determines only offsets of range bounds
> >>>> according to each condition
> >>>> - These results are combined at perform_pruning_combine_step()
> >>>> - Whether the default partition is scanned or not is determined at
> >>>> get_matching_partitions()
> >>>>
> >>>> Attached the patch. Any feedback is greatly appreciated.
> >>> Thank you for reporting. Can you please add this to March CF in
> >>> Bugs category so as not to lose
> >> track
> >>> of this?
> >>>
> >>> I will try to send review comments soon.
> >>>
> >> Thank you for your reply. I added this to March CF.
> > I tested with simple use case and I confirmed it works correctly like below.
> >
> > In case using between clause:
> > postgres=# create table test1(id int, val text) partition by range
> > (id); postgres=# create table test1_1 partition of test1 for values
> > from (0) to (100); postgres=# create table test1_2 partition of test1
> > for values from (150) to (200); postgres=# create table test1_def
> > partition of test1 default;
> >
> > [HEAD]
> > postgres=# explain analyze select * from test1 where id between 0 and 50;
> > QUERY PLAN
> > ----------------------------------------------------------------------
> > -------------------------------------
> > Append (cost=0.00..58.16 rows=12 width=36) (actual time=0.008..0.008 rows=0 loops=1)
> > -> Seq Scan on test1_1 (cost=0.00..29.05 rows=6 width=36) (actual time=0.005..0.005
> rows=0 loops=1)
> > Filter: ((id >= 0) AND (id <= 50))
> > -> Seq Scan on test1_def (cost=0.00..29.05 rows=6 width=36) (actual
> time=0.002..0.002 rows=0 loops=1)
> > Filter: ((id >= 0) AND (id <= 50))
> >
> >
> > [patched]
> > postgres=# explain analyze select * from test1 where id between 0 and 50;
> > QUERY PLAN
> > ----------------------------------------------------------------------
> > -----------------------------------
> > Append (cost=0.00..29.08 rows=6 width=36) (actual time=0.006..0.006 rows=0 loops=1)
> > -> Seq Scan on test1_1 (cost=0.00..29.05 rows=6 width=36) (actual time=0.004..0.005
> rows=0 loops=1)
> > Filter: ((id >= 0) AND (id <= 50))
> >
> >
> >
> > I considered about another use case. If default partition contains rows whose id = 300
> and then we add another partition which have constraints like id >= 300 and id < 400, I thought
> we won't scan the rows anymore. But I noticed we simply can't add such a partition.
> >
> > postgres=# insert into test1 values (300); INSERT 0 1 postgres=#
> > create table test1_3 partition of test1 for values from (300) to
> > (400);
> > ERROR: updated partition constraint for default partition "test1_def"
> > would be violated by some row
> >
> >
> > So I haven't come up with bad cases so far :)
> >
> > --
> > Yoshikazu Imai
>
> Hello Yoshikazu-San,
>
> I tested your patch using some sub-partitions and found a possible problem.
>
> I create a new partitioned partition test1_3 with 2 sub-partitions :
>
> -------------------------
>
> create table test1_3 partition of test1 for values from (200) to (400) partition by range
> (id); create table test1_3_1 partition of test1_3 for values from (200) to (250); create
> table test1_3_2 partition of test1_3 for values from (250) to (350);
>
> # explain select * from test1 where (id > 0 and id < 30);
> QUERY PLAN
> ---------------------------------------------------------------
> Append (cost=0.00..29.08 rows=6 width=36)
> -> Seq Scan on test1_1 (cost=0.00..29.05 rows=6 width=36)
> Filter: ((id > 0) AND (id < 30))
> (3 rows)
>
> # explain select * from test1 where (id > 220 and id < 230);
> QUERY PLAN
> -----------------------------------------------------------------
> Append (cost=0.00..29.08 rows=6 width=36)
> -> Seq Scan on test1_3_1 (cost=0.00..29.05 rows=6 width=36)
> Filter: ((id > 220) AND (id < 230))
> (3 rows)
>
> # explain select * from test1
> where (id > 0 and id < 30) or (id > 220 and id < 230);
> QUERY PLAN
> ---------------------------------------------------------------------------
> Append (cost=0.00..106.40 rows=39 width=36)
> -> Seq Scan on test1_1 (cost=0.00..35.40 rows=13 width=36)
> Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id < 230)))
> -> Seq Scan on test1_3_1 (cost=0.00..35.40 rows=13 width=36)
> Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id < 230)))
> -> Seq Scan on test1_3_2 (cost=0.00..35.40 rows=13 width=36)
> Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id < 230)))
> (7 rows)
>
> -----------------
>
> Partition pruning is functioning when only the sub-partition is required. When both the
> partition and the sub-partition is required, there is no pruning on the sub-partition.
>
Indeed, it's problematic. I also did test and I found that
this problem was occurred when any partition didn't match
WHERE clauses. So following query didn't work correctly.

# explain select * from test1_3 where (id > 0 and id < 30);
QUERY PLAN
-----------------------------------------------------------------
Append (cost=0.00..58.16 rows=12 width=36)
-> Seq Scan on test1_3_1 (cost=0.00..29.05 rows=6 width=36)
Filter: ((id > 0) AND (id < 30))
-> Seq Scan on test1_3_2 (cost=0.00..29.05 rows=6 width=36)
Filter: ((id > 0) AND (id < 30))
(5 rows)

I created a new patch to handle this problem, and confirmed
the query you mentioned works as expected

# explain select * from test1 where (id > 0 and id < 30) or (id > 220 and id < 230);
QUERY PLAN
---------------------------------------------------------------------------
Append (cost=0.00..70.93 rows=26 width=36)
-> Seq Scan on test1_1_1 (cost=0.00..35.40 rows=13 width=36)
Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id < 230)))
-> Seq Scan on test1_3_1 (cost=0.00..35.40 rows=13 width=36)
Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id < 230)))
(5 rows)

v2 patch attached.
Could you please check it again?

--
Best regards,
Yuzuko Hosoya

Attachment Content-Type Size
v2_default_partition_pruning.patch application/octet-stream 12.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Imai, Yoshikazu 2019-03-15 06:20:10 RE: proposal: pg_restore --convert-to-text
Previous Message Michael Paquier 2019-03-15 05:50:04 Re: Fix typo in test code comments