Re: BUG #15779: Partition elimination doesn't work as expected when using PARTITION BY RANGE

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: damir(dot)ciganovic(dot)jankovic(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re: BUG #15779: Partition elimination doesn't work as expected when using PARTITION BY RANGE
Date: 2019-04-26 06:59:27
Message-ID: 358cd54d-c018-60f8-7d76-55780eef6678@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On 2019/04/25 20:33, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 15779
> Logged by: Damir Ciganović-Janković
> Email address: damir(dot)ciganovic(dot)jankovic(at)gmail(dot)com
> PostgreSQL version: 11.2
> Operating system: Windows, Centos 6/7
> Description:
>
> We have an issue that our queries are slow due to bad partition elimination
> when executing queries for our partitioned tables, here is an example, I
> will put all explain query outputs later for each explain separately:
>
> ------------
>
> DROP TABLE IF EXISTS test;
> DROP TYPE IF EXISTS test_enum;
>
> CREATE TYPE test_enum AS ENUM ('FIRST', 'SECOND');
> CREATE TABLE test (enum_col test_enum, timestamp_col timestamp, counter int)
> PARTITION BY RANGE (enum_col, timestamp_col);
>
> CREATE TABLE test_FIRST_1 PARTITION OF test FOR VALUES FROM ('FIRST',
> '2019-01-01') TO ('FIRST', '2019-01-02');
> CREATE TABLE test_FIRST_2 PARTITION OF test FOR VALUES FROM ('FIRST',
> '2019-01-02') TO ('FIRST', '2019-01-03');
>
> CREATE TABLE test_SECOND_1 PARTITION OF test FOR VALUES FROM ('SECOND',
> '2019-01-01') TO ('SECOND', '2019-01-02');
> CREATE TABLE test_SECOND_2 PARTITION OF test FOR VALUES FROM ('SECOND',
> '2019-01-02') TO ('SECOND', '2019-01-03');
>
> EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01';
> EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
> enum_col = 'FIRST';
> EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
> enum_col IN ('FIRST', 'SECOND');
>
> ------------
>
> This is what we get for first explain:
> EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01';
>
> Append (cost=0.00..132.68 rows=36 width=16) (actual time=0.044..0.044
> rows=0 loops=1)
> -> Seq Scan on test_first_1 (cost=0.00..33.13 rows=9 width=16) (actual
> time=0.017..0.017 rows=0 loops=1)
> Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
> time zone)
> -> Seq Scan on test_first_2 (cost=0.00..33.13 rows=9 width=16) (actual
> time=0.010..0.010 rows=0 loops=1)
> Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
> time zone)
> -> Seq Scan on test_second_1 (cost=0.00..33.13 rows=9 width=16) (actual
> time=0.008..0.008 rows=0 loops=1)
> Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
> time zone)
> -> Seq Scan on test_second_2 (cost=0.00..33.13 rows=9 width=16) (actual
> time=0.007..0.007 rows=0 loops=1)
> Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without
> time zone)
>
> As we can see, we got all partitions despite setting timestamp. When we add
> an enum value, partition pruning is fine:
> EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
> enum_col = 'FIRST';
>
> Append (cost=0.00..37.76 rows=1 width=16) (actual time=0.015..0.015 rows=0
> loops=1)
> -> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual
> time=0.014..0.014 rows=0 loops=1)
> Filter: ((timestamp_col = '2019-01-01 00:00:00'::timestamp without
> time zone) AND (enum_col = 'FIRST'::test_enum))

Maybe we forgot to add a note to the documentation on partition pruning,
but in the case of multi-column partition key as in this case, for pruning
to work using a given column from the key, you must have specified all
earlier columns. In this case, since enum_col appears earlier in the
partition key of your example table, it has to be mentioned for pruning
with timestamp_col to work. So, it's working as designed, although it may
be possible to improve it.

> Another bug here is when we put both enum values in IN clause, we still go
> through all partitions despite timestamp targeting only _1 partitions:
> EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and
> enum_col IN ('FIRST', 'SECOND');
>
> Append (cost=0.00..151.02 rows=4 width=16) (actual time=0.083..0.083
> rows=0 loops=1)
> -> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual
> time=0.045..0.045 rows=0 loops=1)
> Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
> (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
> -> Seq Scan on test_first_2 (cost=0.00..37.75 rows=1 width=16) (actual
> time=0.012..0.012 rows=0 loops=1)
> Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
> (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
> -> Seq Scan on test_second_1 (cost=0.00..37.75 rows=1 width=16) (actual
> time=0.012..0.012 rows=0 loops=1)
> Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
> (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
> -> Seq Scan on test_second_2 (cost=0.00..37.75 rows=1 width=16) (actual
> time=0.011..0.011 rows=0 loops=1)
> Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND
> (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))

Here you're running into another limitation of the current design of
pruning logic. Let me try to explain the logic and the limitation that
causes pruning to not work as desired in this case. The pruning logic
looks at a given WHERE clause and forms a tuple of values extracted from
the various sub-clauses combined with AND. So, if you have WHERE enum_col
= 'FIRST' AND timestamp_col = '2019-01-01', you'll end up with a tuple
('FIRST', '2019-01-01'). The tuple is then compared against partition
bounds and those whose bounds allow that tuple are selected. What the
logic is unable to handle currently is a sub-clause that matches multiple
values of a column, that is a clause such as enum_col IN ('FIRST',
'SECOND') in which enum_col can be either 'FIRST' or 'SECOND'.

For that case, it would be possible to form two tuples to be looked up:
('FIRST', '20190-01-01') and ('SECOND', '2019-01-01'). They would match
exactly two desired partitions and that would the desired result.

Instead what happens now is it prunes using each sub-clause in isolation.
So, for enum_col IN ('FIRST', 'SECOND'), it will form two tuples
('FIRST'), ('SECOND'), match the first 2 partitions with tuple ('FIRST'),
and finally the other 2 with tuple ('SECOND'); result -- all partitions
are selected. The other sub-clause timestamp_col = '2019-01-01' will fail
to prune in isolation, as described above for your first example query.

> We could fix this by adding check constraints. But I think that it should
> work without adding constraints, that's why it seems like a bug to me.
>
> BONUS NOTE: In Postgres version 10.2 (we migrated from 10.2 to 11.2),
> partition pruning were better in cases where we used no enum_col or multiple
> enum_col values in where clause:

Pruning in 10.2 works using internally generated partition constraints
(which for this purpose are same as CHECK constraints). With the new
pruning logic introduced in 11, planner no longer considers partition
constraint because it's redundant to check them in most cases, because
pruning would've selected the right set of partitions. Given that the new
pruning logic is still unable to handle the cases like above, maybe we
could change the planner to consider them, at least until we fix the
pruning logic to handle such cases.

Thanks,
Amit

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Damir Ciganović-Janković 2019-04-26 08:09:37 Re: BUG #15779: Partition elimination doesn't work as expected when using PARTITION BY RANGE
Previous Message Amit Langote 2019-04-26 03:52:40 Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table