Re: BUG #15334: Partition elimination not working as expected when using enum as partition key

From: Damir Ciganović-Janković <damir(dot)ciganovic(dot)jankovic(at)gmail(dot)com>
To: Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, noreply(at)postgresql(dot)org
Subject: Re: BUG #15334: Partition elimination not working as expected when using enum as partition key
Date: 2018-08-17 07:17:16
Message-ID: CA+-2sJ6dZaZRnnHfKPRAqnyYF+9Pz4EtDnwNsSJ45jc10hmsgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> Hi.
>
> On 2018/08/16 23:05, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 15334
> > Logged by: Damir Ciganović-Janković
> > Email address: damir(dot)ciganovic(dot)jankovic(at)gmail(dot)com
> > PostgreSQL version: 10.2
> > Operating system: Linux (Centos), Windows
> > Description:
> >
> > Our tables are partitioned by partition key RANGE (resolution,
timestamp).
> >
> > Timestamp is of type 'timestamp without time zone' and resolution is
hour
> > own enum type which contains values 'HOUR' , 'DAY' and 'MONTH':
> > CREATE TYPE resolution AS ENUM (
> > 'HOUR',
> > 'DAY',
> > 'MONTH'
> > );
> >
> > Our table "my_report" has timestamp, resolution and count columns.
> >
> > This is the query I am executing:
> > select "my_report"."count" from my_report where
("my_report"."resolution" =
> > 'HOUR'::resolution and "my_report"."timestamp" >= timestamp '2018-08-16
> > 07:00:00' and "my_report"."timestamp" < timestamp '2018-08-16
10:00:00.0')
> > limit 10;
> > (NOTE: I simplified the query so real query and execution plan are
> > different, but I think you will understand me)
> >
> > I noticed that when doing this query via psql cmd I get this execution
plan
> >
> > +++++
> > ...
> > -> Append (cost=0.00..159.76 rows=1 width=24) (actual
time=0.585..0.585
> > rows=0 loops=1)
> > -> Seq Scan on my_report_hour_20180816 (cost=0.00..159.76 rows=1
> > width=24) (actual time=0.584..0.584 rows=0 loops=1)
> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp
without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without
time
> > zone) AND (resolution = 'HOUR'::resolution))
> > Rows Removed by Filter: 4672
> > ...
> > ++++++
> >
> > This is the correct behaviour, we picked the partition where our
timestamps
> > are in range, and also resolution is HOUR.
> > Now, when executing this same query via our client (Java), but passing
the
> > resolution as a parameter (and not hardcoding HOUR value in my query
like
> > first time). I would expect that execution plan will be the same, but
that
> > is not the case:
> >
> > ++++++
> > ...
> > -> Append (cost=0.00..372.24 rows=3 width=24) (actual
time=1.117..1.117
> > rows=0 loops=1)
> > -> Seq Scan on my_report_hour_20180816 (cost=0.00..183.12 rows=1
> > width=24) (actual time=0.589..0.589 rows=0 loops=1)
> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp
without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without
time
> > zone) AND (resolution = ('HOUR'::cstring)::resolution))
> > Rows Removed by Filter: 4672
> > -> Seq Scan on my_report_day_201808 (cost=0.00..94.56 rows=1
width=24)
> > (actual time=0.265..0.265 rows=0 loops=1)
> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp
without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without
time
> > zone) AND (resolution = ('HOUR'::cstring)::resolution))
> > Rows Removed by Filter: 2336
> > -> Seq Scan on my_report_month_201808 (cost=0.00..94.56 rows=1
> > width=24) (actual time=0.261..0.261 rows=0 loops=1)
> > Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp
without
> > time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without
time
> > zone) AND (resolution = ('HOUR'::cstring)::resolution))
> > Rows Removed by Filter: 2336
> > ...
> > ++++++
> >
> > As we can see here, our execution will do seq scan on all my_report
> > partitions for that period (resolutions: day, month and hour) even
though we
> > put HOUR as our query parameter. One thing that I observed is that
insead of
> > (resolution = 'HOUR'::resolution)) like we got in the first example,
filters
> > look like this (resolution = ('HOUR'::cstring)::resolution)).
>
> It's strange that the expression "('HOUR'::cstring)::resolution" got
> through without getting the following error
>
> ERROR: cannot cast type cstring to resolution
> LINE 1: select ('HOUR'::cstring)::resolution;
>
> For example,
>
> select ('HOUR'::cstring)::resolution;
> ERROR: cannot cast type cstring to resolution
> LINE 1: select ('HOUR'::cstring)::resolution;
>
> > It seems to me that PostgreSQL is not removing the (::cstring) part
before
> > partition elimination so that it is forced to go through all of them
> > bassically ignoring the resolution value.
> >
> > I have a workaround now by putting exact value in the query (I wrote
exact
> > string `"my_report"."resolution" = 'DAY'::resolution`) into my query,
and
> > not passing the value as parameter, but this looks to me as a bug.
>
> You listed 10.2 as the PG version being used, which indeed had a bug that
> caused partition pruning to not work correctly with enum type partition
> key, which got fixed in 10.5 [1].
>
> I'm mystified as to why it worked at all if you're are using 10.2 (you
> said it worked correctly when running the command using psql with the
> hard-coded value of resolution.) So maybe, it's not the bug of 10.2 that
> got fixed in 10.5 is what's causing you the trouble, but then it's also
> not clear what the problem really is.
>
> Thanks,
> Amit
>
> [1] "Fix handling of partition keys whose data type uses a polymorphic
> btree operator class, such as arrays"
>
> https://www.postgresql.org/docs/devel/static/release-10-5.html

I know about the issue and the fix, but this is a different bug, it seems
to me. I will try to explain the issue to be more clear.
This is the enum, table and its partitions:
+++++++++++++++++++++++++++++
create type ab as enum ('A', 'B');
create table test (key ab not null) partition by list (key);
create table test_a partition of test for values in ('A');
create table test_b partition of test for values in ('B');
+++++++++++++++++++++++++++++

We are using Java, so this is the code:
+++++++++++++++++++++++++++++
try (Connection connection = DriverManager.getConnection(...);
PreparedStatement stmt = connection.prepareStatement("EXPLAIN SELECT *
FROM test WHERE key = ?::ab")) {

stmt.setString(1, "A");
try (ResultSet rs = stmt.executeQuery()) {
while (rs.next()) {
System.out.println(rs.getString(1));
}
}
}
+++++++++++++++++++++++++++++
Simple select with enum value as a parameter. Note that we didn't put
"::cstring" in query. This is the result of the EXPLAIN SELECT from the
code:
+++++++++++++++++++++++++++++
Append (cost=0.00..109.25 rows=26 width=4)
-> Seq Scan on test_a (cost=0.00..54.63 rows=13 width=4)
Filter: (key = ('A'::cstring)::ab)
-> Seq Scan on test_b (cost=0.00..54.63 rows=13 width=4)
Filter: (key = ('A'::cstring)::ab)
+++++++++++++++++++++++++++++
I have put wireshark snoop in attachment where we can see that our java
client is not the one adding the "::cstring" part into the code.

I hope this helps

Attachment Content-Type Size
dump.pcap application/octet-stream 631 bytes

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message jimmy 2018-08-17 08:20:46 Bug: ERROR: invalid cache ID: 42 CONTEXT: parallel worker
Previous Message Amit Langote 2018-08-17 02:06:47 Re: BUG #15334: Partition elimination not working as expected when using enum as partition key