Re: inconsistent results querying table partitioned by date

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, ajax(at)tvsquared(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: inconsistent results querying table partitioned by date
Date: 2019-05-10 06:33:37
Message-ID: 20190510.153337.236910404.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

At Fri, 10 May 2019 14:37:34 +0900, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote in <49cd5235-cbe6-e686-8014-85c1e45efe56(at)lab(dot)ntt(dot)co(dot)jp>
> > because timestamp-against-timestamptz comparison is inherently only
> > stable; the pruning code is way exceeding its authority by supposing
> > that a comparison that holds at plan time will hold at runtime,
> > even with a constant comparison value.
>
> I looked into it and the problem is not really that plan-time pruning is
> comparing stable expressions against partition bounds. If it had, it
> wouldn't have pruned dataid_201902 anyway, because its bounding range for
> datadatetime is '2019-02-01' to '2019-03-01', which is clearly less than
> '2019-05-09' (Tom's localtimestamp).
>
> The real problem seems to be with the way range partition pruning assumes
> an operator strategy to perform pruning with. Quoted query's WHERE clause
> looks something like this: (firstkey = CONSTANT AND secondkey <
> STABLE_EXPRESSION). From this set of clauses, a (CONSTANT,
> STABLE_EXPRESSION) tuple is formed to be compared against partition bounds
> using row-comparison-like semantics. As things stand today, the actual
> comparison function (partprune.c: get_matching_range_bounds()) receives
> only the strategy of the last expression, which in this case is that of a
> LESS operator. When the tuple (CONSTANT, STABLE_EXPRESSION) passes
> through the last step to extract Datum values to be passed to
> get_matching_range_bounds, it's correctly determined that
> STABLE_EXPRESSION cannot be computed during planning and so the tuple is
> truncated to just (CONSTANT-Datum), but the strategy to assume during
> pruning is still that of the LESS operator, whereas now it should really
> be EQUAL. With LESS semantics, get_matching_range_bounds() concludes that
> no partition bounds are smaller than 1 (extracted from id=1 in the above
> query), except the default partition, so it prunes 'dataid_201902'.

I concluded the same.

> I've attached a patch to fix that. Actually, I've attached two patches --
> the 1st one adds a test for the misbehaving case with *wrong* output
> wherein a partition is incorrectly pruned, and the 2nd actually fixes the
> bug and updates the output of the test added by the 1st patch. Divided
> the patch this way just to show the bug clearly.

But this seems a bit wrong.

If the two partition keys were in reverse order, pruning still
fails.

CREATE TABLE dataid2 (
datadatetime timestamp without time zone NOT NULL,
id integer not null,
CONSTRAINT dataid2_pkey PRIMARY KEY (datadatetime, id)
) PARTITION BY RANGE (datadatetime, id);

CREATE TABLE dataid2_201902 PARTITION OF dataid2 FOR VALUES FROM ('2019-02-01 00:00:00', 1) TO ('2019-03-01 00:00:00', 1);

CREATE TABLE dataid2_default PARTITION OF dataid2 DEFAULT;

insert into dataid2 values ('2019-02-24T00:00:00', 1);

select * from dataid2 where id = 1 and datadatetime < (('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York' + '2 days'::interval) at time zone 'UTC');
datadatetime | id
--------------+----
(0 rows)

This is wrong.

The condition is divided into two part (id = 1) and (datadatetime
< ..) and the latter reduces to nothing and the former remains
unchanged. Pruning continues using id = 1 and (I suppose) but
that is not partition_range_datum_bsearch()'s assumption. As the
result all partitions (other than default) are gone.

In passing I found a typo while looking this issue.

| case BTLessStrategyNumber:
|
| /*
| * Look for the greatest bound that is < or <= lookup value and
| * set minoff to its offset.

I think the "minoff" is typo of "maxoff".

> > Seems to be equally broken in v11 and HEAD. I didn't try v10.
>
> v10 is fine, as it uses constraint exclusion.
>
> Attached patches apply to both v11 and HEAD.

Mmm. This doesn't apply on head on my environment.

> patching file src/test/regress/expected/partition_prune.out
> Hunk #1 FAILED at 951.

git rev-parse --short HEAD
d0bbf871ca

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Amit Langote 2019-05-10 09:18:32 Re: inconsistent results querying table partitioned by date
Previous Message Amit Langote 2019-05-10 05:37:34 Re: inconsistent results querying table partitioned by date