BUG #15933: Partition by multiple columns bug

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: damir(dot)ciganovic(dot)jankovic(at)gmail(dot)com
Subject: BUG #15933: Partition by multiple columns bug
Date: 2019-07-30 07:53:20
Message-ID: 15933-e30a9940d35e6fcf@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 15933
Logged by: Damir Ciganović-Janković
Email address: damir(dot)ciganovic(dot)jankovic(at)gmail(dot)com
PostgreSQL version: 11.2
Operating system: Windows, Centos
Description:

Hello, I have a problem that my query is not hitting any records when
querying multicolumn partitioned table, but it seems to me that it should.
Here is the code which will be surrounded by multiple # signs:

######################################################

DROP TABLE IF EXISTS my_table;
DROP TYPE IF EXISTS enum1;

-- enum with two values
CREATE TYPE enum1 AS ENUM ( 'FIRST', 'SECOND' );

-- table with enum1, timestamp and intger (will be referenced later)
CREATE TABLE my_table (
my_enum enum1 NOT NULL,
my_time timestamp NOT NULL,
my_integer integer NOT NULL
) PARTITION BY RANGE (my_enum, my_time);

-- two partitions, large timestamp range so that this code doesn't became
obsolete soon
CREATE TABLE my_table_first PARTITION OF my_table FOR VALUES FROM ('FIRST',
'2015-01-07 00:00:00') TO ('FIRST', '2031-01-08 00:00:00');
CREATE TABLE my_table_second PARTITION OF my_table FOR VALUES FROM
('SECOND', '2015-01-07 00:00:00') TO ('SECOND', '2031-01-08 00:00:00');

-- every explain described bellow
EXPLAIN ANALYZE SELECT * from my_table WHERE my_time > '2019-07-30
09:00:00';
EXPLAIN ANALYZE SELECT * from my_table WHERE my_time > now();
EXPLAIN ANALYZE SELECT * from my_table WHERE my_enum = 'FIRST' AND my_time >
'2019-07-30 09:00:00';
EXPLAIN ANALYZE SELECT * from my_table WHERE my_enum = 'FIRST' AND my_time >
now();

######################################################

-- first query that correctly selects both partitions using exact value for
timestamp
EXPLAIN ANALYZE SELECT * from my_table WHERE my_time > '2019-07-30
09:00:00';
Append (cost=0.00..72.42 rows=1234 width=16) (actual time=0.018..0.018
rows=0 loops=1)
-> Seq Scan on my_table_first (cost=0.00..33.13 rows=617 width=16)
(actual time=0.010..0.010 rows=0 loops=1)
Filter: (my_time > '2019-07-30 09:00:00'::timestamp without time
zone)
-> Seq Scan on my_table_second (cost=0.00..33.13 rows=617 width=16)
(actual time=0.006..0.006 rows=0 loops=1)
Filter: (my_time > '2019-07-30 09:00:00'::timestamp without time
zone)

-- second query that also correctly selects both partitions, but now using
an expression instead of exact value
EXPLAIN ANALYZE SELECT * from my_table WHERE my_time > now();
Append (cost=0.00..81.67 rows=1234 width=16) (actual time=0.018..0.018
rows=0 loops=1)
-> Seq Scan on my_table_first (cost=0.00..37.75 rows=617 width=16)
(actual time=0.010..0.010 rows=0 loops=1)
Filter: (my_time > now())
-> Seq Scan on my_table_second (cost=0.00..37.75 rows=617 width=16)
(actual time=0.006..0.007 rows=0 loops=1)
Filter: (my_time > now())

-- third query that correctly selects only my_table_first partition, using
exact values for enum and timestamp
EXPLAIN ANALYZE SELECT * from my_table WHERE my_enum = 'FIRST' AND my_time >
'2019-07-30 09:00:00';
Append (cost=0.00..37.77 rows=3 width=16) (actual time=0.015..0.015 rows=0
loops=1)
-> Seq Scan on my_table_first (cost=0.00..37.75 rows=3 width=16)
(actual time=0.013..0.013 rows=0 loops=1)
Filter: ((my_time > '2019-07-30 09:00:00'::timestamp without time
zone) AND (my_enum = 'FIRST'::enum1))

-- fourth query that completely fails to find partition
EXPLAIN ANALYZE SELECT * from my_table WHERE my_enum = 'FIRST' AND my_time >
now();
Result (cost=0.00..0.00 rows=0 width=16) (actual time=0.001..0.001 rows=0
loops=1)
One-Time Filter: false

This seems like a bug to me. In the example above I used enums because that
is our use case, but we use PARTITION BY RANGE (my_integer, my_time) instead
of (my_enum, my_time), and use values 1,2 instead of 'FIRST','SECOND', this
still doesn't work:
######################################################

DROP TABLE IF EXISTS my_table;
DROP TYPE IF EXISTS enum1;

CREATE TYPE enum1 AS ENUM ( 'FIRST', 'SECOND' );

CREATE TABLE my_table (
my_enum enum1 NOT NULL,
my_time timestamp NOT NULL,
my_integer integer NOT NULL
) PARTITION BY RANGE (my_integer, my_time);

CREATE TABLE my_table_first PARTITION OF my_table FOR VALUES FROM (1,
'2015-01-07 00:00:00') TO (1, '2031-01-08 00:00:00');
CREATE TABLE my_table_second PARTITION OF my_table FOR VALUES FROM (2,
'2015-01-07 00:00:00') TO (2, '2031-01-08 00:00:00');

EXPLAIN ANALYZE SELECT * from my_table WHERE my_integer = 1 AND my_time >
now();

######################################################
Result (cost=0.00..0.00 rows=0 width=16) (actual time=0.001..0.001 rows=0
loops=1)
One-Time Filter: false

We are currently using a workaround to put exact value, but it's a bit hard
to track and keep in mind all the workarounds we must do for this specific
case, maybe in the documentation there should be all the cases listed where
multicolumn partitioning does not work. Other examples where we must use a
workaround are already reported:
https://www.postgresql.org/message-id/358cd54d-c018-60f8-7d76-55780eef6678%40lab.ntt.co.jp

I hope the fixes/improvements in the mentioned link are at least being
considered.

Kind regards,
Damir

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-07-30 14:34:21 Re: BUG #15933: Partition by multiple columns bug
Previous Message Kyotaro Horiguchi 2019-07-30 03:59:11 Re: BUG #15923: Prepared statements take way too much memory.