Re: why partition pruning doesn't work?

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: why partition pruning doesn't work?
Date: 2018-06-11 01:55:33
Message-ID: CAKJS1f-6GODRNgEtdPxCnAPme2h2hTztB6LmtfdmcYAAOE0kQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for working on and pushing those fixes.

On 11 June 2018 at 10:49, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> It's very unclear for example what the subplan_map and subpart_map
> arrays really are, eg what are they indexed by? I get the impression
> that only one of them can have a non-minus-1 value for a given index,
> but that's nowhere explained. Also, we have

They're indexed by the partition indexes as returned by the partition
pruning code. I've included a patch to fix this.

> * partprunedata Array of PartitionPruningData for the plan's target
> * partitioned relation. First element contains the
> * details for the target partitioned table.
>
> And? What are the other elements, what's the index rule, is there a
> specific ordering for the other elements? For that matter, "target
> partitioned table" is content-free. Do you mean topmost partitioned
> table? I suspect we expect the hierarchy to be flattened such that
> ancestors appear before children, but that's not stated --- and if it
> were, this bit about the first element would be a consequence of that.

I've hopefully improved this comment in the attached patch. Although
we may need to delay this until [1] has been addressed

Only the index of the first element is important. Looking at
add_paths_to_append_rel() the parents will always be listed before
their children. That's not very well documented either, but having
the top-level parent as the first element is relied upon for more than
runtime partition pruning.

> Code-wise, there are some loose ends to be looked at.
>
> * Related to the above, doesn't the loop at execPartition.c:1658 need
> to work back-to-front? Seems like it's trying to propagate info upwards
> in the hierarchy; looking at a subpartition's present_parts value when
> you still might change it later doesn't look right at all.

It works just fine front-to-back. That's why I added the 2nd loop. It
propagates the work done in the first loop, so the code, as it is now,
works if the array is in any order.

It may be possible to run the first loop back-to-front and get rid of
the 2nd one. I've done this in the attached patch. I felt less
confident doing this earlier as the order of that array was not
defined well.

> * partkey_datum_from_expr and its caller seem pretty brain-dead with
> respect to nulls. It's not even considering the possibility that a
> Const has constisnull = true. Now perhaps such a case can't reach
> here because of plan-time constant-folding, but I don't think this code
> has any business assuming that. It's also being very stupid about
> null values from expressions, just throwing up its hands and supposing
> that nothing can be proven. In reality, doesn't a null guarantee we
> can eliminate all partitions, since the comparison functions are
> presumed strict?

You are right. I admit to not having thought more about that and just
taking the safe option. It makes sense to fix it. I imagine it could
be a bit annoying that pruning just gives up in such a case. I've
added code for this in the attached patch.

> * I'm fairly suspicious of the fmgr_info and fmgr_info_copy calls in
> perform_pruning_base_step. Those seem likely to leak memory, and
> for sure they destroy any opportunity for the called comparison
> function to cache info in fn_extra --- something that's critical
> for performance for some comparison functions such as array_eq.
> Why is it necessary to suppose that those functions could change
> from one execution to the next?

IIRC the problem there was that we didn't think of a good place to
cache the FmgrInfo. But rethinking of that now we could probably use
the same method as I used in run-time pruning to cache the exprstate.
Amit, what do you think?

> * The business with ExecFindInitialMatchingSubPlans remapping the
> subplan indexes seems very dubious to me. Surely, that is adding
> way more complexity and fragility than it's worth.

It's all contained in a single .c file. It does not seem that hard to
get it right. Nothing outside of exexPartition.c has any business
changing anything in these arrays. Nothing else even has to look at
them.

I think having the ability to prune during executor initialisation is
enormously important. I think without it, this patch is less than half
as useful. However, if you didn't mean removing the executor
initialise pruning, and just not remapping the arrays, then I'm not
sure how we'd do that. I'd previously tried having NULL subnodes in
the Append and I thought it required too much hacking work in
explain.c, so I decided to just collapse the array instead and do what
was required to make it work after having removed the unneeded
subplans. Did you have another idea on how to do this?

[1] https://www.postgresql.org/message-id/CAKJS1f9KG5nnOFhigWm4ND5Ut-yU075iJyA%2BACVyQnZ-ZW1Qtw%40mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
various_partition_prune_fixes.patch application/octet-stream 14.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-06-11 02:50:55 Re: commitfest 2018-07
Previous Message Michael Paquier 2018-06-11 01:38:14 Re: SHOW ALL does not honor pg_read_all_settings membership