Re: Problem with default partition pruning

From: Thibaut <thibaut(dot)madelaine(at)dalibo(dot)com>
To: "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, 'Yuzuko Hosoya' <hosoya(dot)yuzuko(at)lab(dot)ntt(dot)co(dot)jp>
Cc: 'Amit Langote' <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, 'PostgreSQL Hackers' <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Problem with default partition pruning
Date: 2019-03-14 14:10:53
Message-ID: a4968068-6401-7a9c-8bd4-6a3bc9164a86@dalibo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Le 28/02/2019 à 09:26, Imai, Yoshikazu a écrit :
> Hosoya-san
>
> On Wed, Feb 27, 2019 at 6:51 AM, Yuzuko Hosoya wrote:
>>> From: Amit Langote [mailto:Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp]
>>> Sent: Wednesday, February 27, 2019 11:22 AM
>>>
>>> Hosoya-san,
>>>
>>> On 2019/02/22 17:14, Yuzuko Hosoya wrote:
>>>> Hi,
>>>>
>>>> I found the bug of default partition pruning when executing a range
>> query.
>>>> -----
>>>> postgres=# create table test1(id int, val text) partition by range
>>>> (id); postgres=# create table test1_1 partition of test1 for values
>>>> from (0) to (100); postgres=# create table test1_2 partition of
>>>> test1 for values from (150) to (200); postgres=# create table
>>>> test1_def partition of test1 default;
>>>>
>>>> postgres=# explain select * from test1 where id > 0 and id < 30;
>>>> QUERY PLAN
>>>> ----------------------------------------------------------------
>>>> Append (cost=0.00..11.83 rows=59 width=11)
>>>> -> Seq Scan on test1_1 (cost=0.00..5.00 rows=58 width=11)
>>>> Filter: ((id > 0) AND (id < 30))
>>>> -> Seq Scan on test1_def (cost=0.00..6.53 rows=1 width=12)
>>>> Filter: ((id > 0) AND (id < 30))
>>>> (5 rows)
>>>>
>>>> There is no need to scan the default partition, but it's scanned.
>>>> -----
>>>>
>>>> In the current implement, whether the default partition is scanned
>>>> or not is determined according to each condition of given WHERE
>>>> clause at get_matching_range_bounds(). In this example,
>>>> scan_default is set true according to id > 0 because id >= 200
>>>> matches the default partition. Similarly, according to id < 30,
>> scan_default is set true.
>>>> Then, these results are combined according to AND/OR at
>> perform_pruning_combine_step().
>>>> In this case, final result's scan_default is set true.
>>>>
>>>> The modifications I made are as follows:
>>>> - get_matching_range_bounds() determines only offsets of range bounds
>>>> according to each condition
>>>> - These results are combined at perform_pruning_combine_step()
>>>> - Whether the default partition is scanned or not is determined at
>>>> get_matching_partitions()
>>>>
>>>> Attached the patch. Any feedback is greatly appreciated.
>>> Thank you for reporting. Can you please add this to March CF in Bugs
>>> category so as not to lose
>> track
>>> of this?
>>>
>>> I will try to send review comments soon.
>>>
>> Thank you for your reply. I added this to March CF.
> I tested with simple use case and I confirmed it works correctly like below.
>
> In case using between clause:
> postgres=# create table test1(id int, val text) partition by range (id);
> postgres=# create table test1_1 partition of test1 for values from (0) to (100);
> postgres=# create table test1_2 partition of test1 for values from (150) to (200);
> postgres=# create table test1_def partition of test1 default;
>
> [HEAD]
> postgres=# explain analyze select * from test1 where id between 0 and 50;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------
> Append (cost=0.00..58.16 rows=12 width=36) (actual time=0.008..0.008 rows=0 loops=1)
> -> Seq Scan on test1_1 (cost=0.00..29.05 rows=6 width=36) (actual time=0.005..0.005 rows=0 loops=1)
> Filter: ((id >= 0) AND (id <= 50))
> -> Seq Scan on test1_def (cost=0.00..29.05 rows=6 width=36) (actual time=0.002..0.002 rows=0 loops=1)
> Filter: ((id >= 0) AND (id <= 50))
>
>
> [patched]
> postgres=# explain analyze select * from test1 where id between 0 and 50;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
> Append (cost=0.00..29.08 rows=6 width=36) (actual time=0.006..0.006 rows=0 loops=1)
> -> Seq Scan on test1_1 (cost=0.00..29.05 rows=6 width=36) (actual time=0.004..0.005 rows=0 loops=1)
> Filter: ((id >= 0) AND (id <= 50))
>
>
>
> I considered about another use case. If default partition contains rows whose id = 300 and then we add another partition which have constraints like id >= 300 and id < 400, I thought we won't scan the rows anymore. But I noticed we simply can't add such a partition.
>
> postgres=# insert into test1 values (300);
> INSERT 0 1
> postgres=# create table test1_3 partition of test1 for values from (300) to (400);
> ERROR: updated partition constraint for default partition "test1_def" would be violated by some row
>
>
> So I haven't come up with bad cases so far :)
>
> --
> Yoshikazu Imai

Hello Yoshikazu-San,

I tested your patch using some sub-partitions and found a possible problem.

I create a new partitioned partition test1_3 with 2 sub-partitions :

-------------------------

create table test1_3 partition of test1 for values from (200) to (400)
partition by range (id);
create table test1_3_1 partition of test1_3 for values from (200) to (250);
create table test1_3_2 partition of test1_3 for values from (250) to (350);

# explain select * from test1 where (id > 0 and id < 30);
                          QUERY PLAN                          
---------------------------------------------------------------
 Append  (cost=0.00..29.08 rows=6 width=36)
   ->  Seq Scan on test1_1  (cost=0.00..29.05 rows=6 width=36)
         Filter: ((id > 0) AND (id < 30))
(3 rows)

# explain select * from test1 where (id > 220 and id < 230);
                           QUERY PLAN                           
-----------------------------------------------------------------
 Append  (cost=0.00..29.08 rows=6 width=36)
   ->  Seq Scan on test1_3_1  (cost=0.00..29.05 rows=6 width=36)
         Filter: ((id > 220) AND (id < 230))
(3 rows)

# explain select * from test1
where (id > 0 and id < 30) or (id > 220 and id < 230);
                                QUERY PLAN                                
---------------------------------------------------------------------------
 Append  (cost=0.00..106.40 rows=39 width=36)
   ->  Seq Scan on test1_1  (cost=0.00..35.40 rows=13 width=36)
         Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id < 230)))
   ->  Seq Scan on test1_3_1  (cost=0.00..35.40 rows=13 width=36)
         Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id < 230)))
   ->  Seq Scan on test1_3_2  (cost=0.00..35.40 rows=13 width=36)
         Filter: (((id > 0) AND (id < 30)) OR ((id > 220) AND (id < 230)))
(7 rows)

-----------------

Partition pruning is functioning when only the sub-partition is
required. When both the partition and the sub-partition is required,
there is no pruning on the sub-partition.

Cordialement,

--
Thibaut Madelaine
Dalibo

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-03-14 14:16:41 Re: WIP: Avoid creation of the free space map for small tables
Previous Message Robert Haas 2019-03-14 14:03:34 Re: Timeout parameters