Re: [HACKERS] path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2018-03-23 12:19:27
Message-ID: e4987e24-8baf-c248-9d6b-e1f80531930d@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Thanks for the review.

On 2018/03/21 6:29, Robert Haas wrote:
> On Tue, Mar 20, 2018 at 7:07 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> On 2018/03/16 21:55, Amit Langote wrote:
>>> Attached updated patches.
>>
>> Attached is further revised version.
>>
>> Of note is getting rid of PartitionPruneContext usage in the static
>> functions of partprune.c. Most of the code there ought to only run during
>> planning, so it can access the necessary information from RelOptInfo
>> directly instead of copying it to PartitionPruneContext and then passing
>> it around.
>
> + if (rte->relkind == RELKIND_PARTITIONED_TABLE)
> + {
> + if (rel->baserestrictinfo != NIL)
> + {
> + live_children = prune_append_rel_partitions(root, rel);
> + did_pruning = true;
> + }
> + }
>
> Use &&

Fixed in the latest version.

>
> + case COMBINE_OR:
> + {
>
> Won't survive pgindent, which currently produces a *massive* diff for
> these patches.

That's gone in the latest patch.

Things for the overall patch should have improved in the latest version.

> + /*
> + * XXX- The following ad-hoc method of pruning only works for list
> + * partitioning. It checks for each partition if all of its
> + * accepted values appear in ne_datums[].
> + */
>
> So why are we doing it this way? How about doing something not
> ad-hoc? I tried to propose that before.

Hmm, perhaps I should have written a better comment.

What I really meant to say is that pruning using <> operators can be
implemented sanely only for list partitions. We can prune a given
partition with <> clauses, only if we can find a <> clause for *all*
values that the partition accepts. Doing so seems doable only for list
partitions where we require enumerating all values that the partition may
contain.

> + * Set *value to the constant value obtained by evaluating 'expr'
> + *
> + * Note that we may not be able to evaluate the input expression, in which
> + * case, the function returns false to indicate that *value has not been
> + * set. True is returned otherwise.
>
> These comments need updating, since this function (laudibly) no longer
> does any evaluating. I wonder how this will work for run-time
> pruning, though.

Fixed the comment.

Run-time pruning patch adds some code to this function to be able to
return values for Params for which, afaik, it also adds some state
information to the PartitionPruneContext argument.

> + if (context->partopcintype[partkeyidx] != exprTyp)
> + {
> + Oid new_supfuncid;
> + int16 procnum;
> +
> +
> + procnum = (context->strategy == PARTITION_STRATEGY_HASH)
> + ? HASHEXTENDED_PROC
> + : BTORDER_PROC;
> + new_supfuncid = get_opfamily_proc(context->partopfamily[partkeyidx],
> + context->partopcintype[partkeyidx],
> + exprTyp, procnum);
> + fmgr_info(new_supfuncid, &context->partsupfunc[partkeyidx]);
> + }
>
> What's the point of this, exactly? Leftover dead code, maybe?

Actually, this *was* an effort to teach the patch to use the correct
comparison function for comparison against partition bounds in case of
clause value being of different type.

After reading David's comment about this, I concluded that it's placed at
a wrong place, which is fixed in the latest patch. The comparison
functions are changed (if needed) in the function that would call
partkey_datum_from_expr, not in partkey_datum_from_expr itself.

> + * Input:
> + * See the comments above the definition of PartScanKeyInfo to see what
> + * kind of information is contained in 'keys'.
>
> There's no such thing as PartScanKeyInfo any more and the function has
> no argument called 'keys'. None of the functions actual arguments are
> explained.

Sorry, this should be gone in the latest patches.

> + /*
> + * If there are multiple pruning steps, we perform them one after another,
> + * passing the result of one step as input to another. Based on the type
> + * of pruning step, perform_pruning_step may add or remove partitions from
> + * the set of partitions it receives as the input.
> + */
>
> The comment sounds great, but the code doesn't work that way; it
> always calls bms_int_members to intersect the new result with any
> previous result. I'm baffled as to how this manages to DTRT if
> COMBINE_OR is used. In general I had hoped that the list of pruning
> steps was something over which we were only going to iterate, not
> recurse. This definitely recurses for the combine steps, but it's
> still (sorta) got the idea of a list of iterable steps. That's a
> weird mix.

At the top-level (in get_matching_partitions), it is assumed that the
steps in the input list come from implicitly AND'd clauses, so the
intersection between partition sets that we get for each.

Anyway, after David's rewrite of this portion of the patch incorporated in
the latest patch, things look a bit different here, although there is
still recursion for combine steps. I'm still considering how to make the
recursion go away.

> + if (nvalues == context->partnatts)
> + {
> + greatest_modulus = get_greatest_modulus(boundinfo);
> + rowHash = compute_hash_value(partnatts, partsupfunc, values,
> + isnull);
> + result_index = partindices[rowHash % greatest_modulus];
> + if (result_index >= 0)
> + return bms_make_singleton(result_index);
> + }
> + else
> + /* Can't do pruning otherwise, so return all partitions. */
> + return bms_add_range(NULL, 0, context->nparts - 1);
>
> Wouldn't we want to (1) arrange things so that this function is never
> called if nvalues < context->partnatts && context->strategy ==
> PARTITION_STRATEGY_HASH or at least (2) avoid constructing isnull from
> nullkeys if we're not going to use it?

We call this function even if nvalues < context->partnatts but we found IS
NULL clauses for *all* the remaining columns.

Although, given the checks that planner (partprune.c) performs, we should
get here only if pruning is possible, so the code to handle cases where
pruning couldn't occur is redundant.

> Also, shouldn't we be sanity-checking the strategy number here?

That's right, fixed.

>
> I'm out of time for right now but it looks to me like this patch still
> needs quite a bit of fine-tuning.

I have posted an updated patch in reply to David's review.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2018-03-23 12:22:40 Re: [HACKERS] Add support for tuple routing to foreign partitions
Previous Message Amit Langote 2018-03-23 12:15:47 Re: [HACKERS] path toward faster partition pruning