Re: BUG #15334: Partition elimination not working as expected when using enum as partition key

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: damir(dot)ciganovic(dot)jankovic(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re: BUG #15334: Partition elimination not working as expected when using enum as partition key
Date: 2018-08-17 02:06:47
Message-ID: 1fff2a8b-92ef-c9e0-6a57-0e887d95c62b@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi.

On 2018/08/16 23:05, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 15334
> Logged by: Damir Ciganović-Janković
> Email address: damir(dot)ciganovic(dot)jankovic(at)gmail(dot)com
> PostgreSQL version: 10.2
> Operating system: Linux (Centos), Windows
> Description:
>
> Our tables are partitioned by partition key RANGE (resolution, timestamp).
>
> Timestamp is of type 'timestamp without time zone' and resolution is hour
> own enum type which contains values 'HOUR' , 'DAY' and 'MONTH':
> CREATE TYPE resolution AS ENUM (
> 'HOUR',
> 'DAY',
> 'MONTH'
> );
>
> Our table "my_report" has timestamp, resolution and count columns.
>
> This is the query I am executing:
> select "my_report"."count" from my_report where ("my_report"."resolution" =
> 'HOUR'::resolution and "my_report"."timestamp" >= timestamp '2018-08-16
> 07:00:00' and "my_report"."timestamp" < timestamp '2018-08-16 10:00:00.0')
> limit 10;
> (NOTE: I simplified the query so real query and execution plan are
> different, but I think you will understand me)
>
> I noticed that when doing this query via psql cmd I get this execution plan
>
> +++++
> ...
> -> Append (cost=0.00..159.76 rows=1 width=24) (actual time=0.585..0.585
> rows=0 loops=1)
> -> Seq Scan on my_report_hour_20180816 (cost=0.00..159.76 rows=1
> width=24) (actual time=0.584..0.584 rows=0 loops=1)
> Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> zone) AND (resolution = 'HOUR'::resolution))
> Rows Removed by Filter: 4672
> ...
> ++++++
>
> This is the correct behaviour, we picked the partition where our timestamps
> are in range, and also resolution is HOUR.
> Now, when executing this same query via our client (Java), but passing the
> resolution as a parameter (and not hardcoding HOUR value in my query like
> first time). I would expect that execution plan will be the same, but that
> is not the case:
>
> ++++++
> ...
> -> Append (cost=0.00..372.24 rows=3 width=24) (actual time=1.117..1.117
> rows=0 loops=1)
> -> Seq Scan on my_report_hour_20180816 (cost=0.00..183.12 rows=1
> width=24) (actual time=0.589..0.589 rows=0 loops=1)
> Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> zone) AND (resolution = ('HOUR'::cstring)::resolution))
> Rows Removed by Filter: 4672
> -> Seq Scan on my_report_day_201808 (cost=0.00..94.56 rows=1 width=24)
> (actual time=0.265..0.265 rows=0 loops=1)
> Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> zone) AND (resolution = ('HOUR'::cstring)::resolution))
> Rows Removed by Filter: 2336
> -> Seq Scan on my_report_month_201808 (cost=0.00..94.56 rows=1
> width=24) (actual time=0.261..0.261 rows=0 loops=1)
> Filter: (("timestamp" >= '2018-08-16 07:00:00'::timestamp without
> time zone) AND ("timestamp" < '2018-08-16 10:00:00'::timestamp without time
> zone) AND (resolution = ('HOUR'::cstring)::resolution))
> Rows Removed by Filter: 2336
> ...
> ++++++
>
> As we can see here, our execution will do seq scan on all my_report
> partitions for that period (resolutions: day, month and hour) even though we
> put HOUR as our query parameter. One thing that I observed is that insead of
> (resolution = 'HOUR'::resolution)) like we got in the first example, filters
> look like this (resolution = ('HOUR'::cstring)::resolution)).

It's strange that the expression "('HOUR'::cstring)::resolution" got
through without getting the following error

ERROR: cannot cast type cstring to resolution
LINE 1: select ('HOUR'::cstring)::resolution;

For example,

select ('HOUR'::cstring)::resolution;
ERROR: cannot cast type cstring to resolution
LINE 1: select ('HOUR'::cstring)::resolution;

> It seems to me that PostgreSQL is not removing the (::cstring) part before
> partition elimination so that it is forced to go through all of them
> bassically ignoring the resolution value.
>
> I have a workaround now by putting exact value in the query (I wrote exact
> string `"my_report"."resolution" = 'DAY'::resolution`) into my query, and
> not passing the value as parameter, but this looks to me as a bug.

You listed 10.2 as the PG version being used, which indeed had a bug that
caused partition pruning to not work correctly with enum type partition
key, which got fixed in 10.5 [1].

I'm mystified as to why it worked at all if you're are using 10.2 (you
said it worked correctly when running the command using psql with the
hard-coded value of resolution.) So maybe, it's not the bug of 10.2 that
got fixed in 10.5 is what's causing you the trouble, but then it's also
not clear what the problem really is.

Thanks,
Amit

[1] "Fix handling of partition keys whose data type uses a polymorphic
btree operator class, such as arrays"

https://www.postgresql.org/docs/devel/static/release-10-5.html

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Damir Ciganović-Janković 2018-08-17 07:17:16 Re: BUG #15334: Partition elimination not working as expected when using enum as partition key
Previous Message Jeff Janes 2018-08-16 22:04:59 Re: BUG #15328: postgres segfaults on ALTER OPERATOR ... SET SCHEMA ...