Re: inconsistent results querying table partitioned by date

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alan Jackson <ajax(at)tvsquared(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: inconsistent results querying table partitioned by date
Date: 2019-05-09 23:22:39
Message-ID: 24525.1557444159@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Alan Jackson <ajax(at)tvsquared(dot)com> writes:
> Im having a problem with querying a table partitioned by date.
> Depending on the sequence of operations on a date parameter used in a where clause, I either get the expected results, or no results.

Yeah, this is pretty clearly broken. It looks to me like the partition
pruning code is making insupportable assumptions about a comparison to
a stable expression. Using your example table:

regression=# explain select * from dataid where id=1 and datadatetime < localtimestamp;
QUERY PLAN
----------------------------------------------------------------------------------
Bitmap Heap Scan on dataid_default (cost=4.19..11.31 rows=3 width=12)
Recheck Cond: ((id = 1) AND (datadatetime < LOCALTIMESTAMP))
-> Bitmap Index Scan on dataid_default_pkey (cost=0.00..4.19 rows=3 width=0)
Index Cond: ((id = 1) AND (datadatetime < LOCALTIMESTAMP))
(4 rows)

It should absolutely not have pruned away the dataid_201902 partition,
but it did. It's okay with an immutable expression:

regression=# explain select * from dataid where id=1 and datadatetime < '2019-05-09'::timestamp;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Append (cost=4.18..22.63 rows=6 width=12)
-> Bitmap Heap Scan on dataid_201902 (cost=4.18..11.30 rows=3 width=12)
Recheck Cond: ((id = 1) AND (datadatetime < '2019-05-09 00:00:00'::timestamp without time zone))
-> Bitmap Index Scan on dataid_201902_pkey (cost=0.00..4.18 rows=3 width=0)
Index Cond: ((id = 1) AND (datadatetime < '2019-05-09 00:00:00'::timestamp without time zone))
-> Bitmap Heap Scan on dataid_default (cost=4.18..11.30 rows=3 width=12)
Recheck Cond: ((id = 1) AND (datadatetime < '2019-05-09 00:00:00'::timestamp without time zone))
-> Bitmap Index Scan on dataid_default_pkey (cost=0.00..4.18 rows=3 width=0)
Index Cond: ((id = 1) AND (datadatetime < '2019-05-09 00:00:00'::timestamp without time zone))
(9 rows)

or a volatile one:

regression=# explain select * from dataid where id=1 and datadatetime < '2019-05-09'::timestamp + random()*'1 day'::interval;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Append (cost=4.23..29.80 rows=6 width=12)
-> Bitmap Heap Scan on dataid_201902 (cost=4.23..14.88 rows=3 width=12)
Recheck Cond: (id = 1)
Filter: (datadatetime < ('2019-05-09 00:00:00'::timestamp without time zone + (random() * '1 day'::interval)))
-> Bitmap Index Scan on dataid_201902_pkey (cost=0.00..4.23 rows=10 width=0)
Index Cond: (id = 1)
-> Bitmap Heap Scan on dataid_default (cost=4.23..14.88 rows=3 width=12)
Recheck Cond: (id = 1)
Filter: (datadatetime < ('2019-05-09 00:00:00'::timestamp without time zone + (random() * '1 day'::interval)))
-> Bitmap Index Scan on dataid_default_pkey (cost=0.00..4.23 rows=10 width=0)
Index Cond: (id = 1)
(11 rows)

but somebody's confused about what can be done with stable expressions.

While I'm on about it, this behavior is also insupportable:

regression=# explain select * from dataid where id=1 and datadatetime < '2018-05-09'::timestamptz ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on dataid_default (cost=4.18..11.30 rows=3 width=12)
Recheck Cond: ((id = 1) AND (datadatetime < '2018-05-09 00:00:00-04'::timestamp with time zone))
-> Bitmap Index Scan on dataid_default_pkey (cost=0.00..4.18 rows=3 width=0)
Index Cond: ((id = 1) AND (datadatetime < '2018-05-09 00:00:00-04'::timestamp with time zone))
(4 rows)

because timestamp-against-timestamptz comparison is inherently only
stable; the pruning code is way exceeding its authority by supposing
that a comparison that holds at plan time will hold at runtime,
even with a constant comparison value.

The reason for the difference in your results is that one expression
is immutable and the other is only stable:

regression=# explain verbose select
(('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York' ) at time zone 'UTC' + '2 days'::interval) as workingdate,
(('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York' + '2 days'::interval) at time zone 'UTC') as notworkingdate;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=16)
Output: '2019-02-28 05:00:00'::timestamp without time zone, timezone('UTC'::text, ('2019-02-26 00:00:00-05'::timestamp with time zone + '2 days'::interval))
(2 rows)

the reason being that timestamptz + interval depends on the timezone
setting (for some intervals) but timestamp + interval never does.

Seems to be equally broken in v11 and HEAD. I didn't try v10.

> I hope there is something simple I can change in the partition definitions to work around this.

Until we fix the bug, I think the best you can do is not use stable
expressions in this context.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2019-05-10 01:34:03 BUG #15797: Wrong Execution Plan
Previous Message Tom Lane 2019-05-09 19:24:20 Re: BUG #15795: ERROR: could not find pathkey item to sort