Re: inconsistent results querying table partitioned by date

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alan Jackson <ajax(at)tvsquared(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: inconsistent results querying table partitioned by date
Date: 2019-05-16 19:25:35
Message-ID: 12600.1558034735@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I wrote:
> Anyway, I've pushed David's patch now, on to look at Amit's.

So, the changes in gen_prune_steps_from_opexps seem okay, but
the change in perform_pruning_base_step is just wrong, as can
be seen from this extended form of the test case:

create table mc3p (a int, b int, c int) partition by range (a, abs(b), c);
create table mc3p1 partition of mc3p
for values from (1, 1, 1) to (1, 1, 10);
create table mc3p2 partition of mc3p
for values from (1, 1, 10) to (2, minvalue, minvalue);
create table mc3p3 partition of mc3p
for values from (2, minvalue, minvalue) to (3, maxvalue, maxvalue);
insert into mc3p values (1, 1, 1), (1,1,10), (2, 1, 1);

set plan_cache_mode = force_generic_plan;
prepare init_exec_prune_q1 as
select * from mc3p where a = $1 and abs(b) <= $2 and c < (select 13);
explain (analyze, costs off, summary off, timing off)
execute init_exec_prune_q1 (1,2);
deallocate init_exec_prune_q1;
reset plan_cache_mode;

drop table mc3p;

This should of course find the (1,1,1) row, but it does not because
it converts the initial pruning condition to be "a = 1 and abs(b) = 2",
whereas the correct thing would be "a = 1 and abs(b) <= 2".

It seems to me that the correct fix, rather than forcing equality
strategy when you've missed out some of the quals, is to back up
and use the strategy of the last clause you did use. But unless
I'm missing something, that information is just not available from
the PartitionPruneStepOp data structure.

We could extend the PartitionPruneStepOp struct to have an array
or list of StrategyNumbers (but I'm unsure about how safe that'd
be to backpatch into v11).

Another idea is that we ought to generate separate partitioning
steps lists for the initial-pruning and runtime-pruning cases,
ie invoke gen_partprune_steps yet a third time. Ugh (and that
still means a data structure change, just in a different place).

The bottom line here is that we've got an inflexible data structure
that was designed on the assumption that gen_partprune_steps has all
the intelligence, and trying to adjust things later is just trouble.

I think this is really quite closely tied to my previous complaints
about the design for plan-time versus run-time pruning, just mutatis
mutandis. The lack of any holistic, principled approach to
which-quals-apply-when is what's biting us here.

Moreover, now that I understand what's going on here a bit better,
I am deathly afraid that there are yet other bugs hiding in this
same area. What we've basically got is that analyze_partkey_exprs
and perform_pruning_base_step are trying to reverse-engineer what
gen_partprune_steps would have done differently if it used only the
quals not containing exec params. They don't really have enough
information available to do that correctly, and even if they did,
the fact that the logic looks nothing like the same is going to
be a fertile source of bugs and maintenance gotchas.

In other words, as I compose this I'm talking myself into the
idea that invoking gen_partprune_steps three times (for planner,
executor startup, and executor runtime, with corresponding
filters on which clauses can be used) is the only safe near-term
fix. In the future we can look at cutting down the repetitive
costs that entails.

Thoughts?

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-05-16 20:37:50 Re: inconsistent results querying table partitioned by date
Previous Message Andres Freund 2019-05-16 17:04:34 Re: BUG #15808: ERROR: subtransaction logged without previous top-level txn record (SQLSTATE XX000)