Re: Multi-Column List Partitioning

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-08-25 21:10:33
Message-ID: CALNJ-vRP6n8tB5a=cxZ_oa7VrywgPcSeM-_zQct7LG+4nNUVAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 25, 2021 at 5:41 AM Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
wrote:

> > The new list bound binary search and related comparison support
> > function look a bit too verbose to me. I was expecting
> > partition_list_bsearch() to look very much like
> > partition_range_datum_bsearch(), but that is not the case. The
> > special case code that you wrote in partition_list_bsearch() seems
> > unnecessary, at least in that function. I'm talking about the code
> > fragment starting with this comment:
> >
> > I will look at other parts of the patch next week hopefully. For
> > now, attached is a delta patch that applies on top of your v1, which
> > does:
> >
> > * Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
> > * Make qsort_partition_list_value_cmp simply call
> > partition_lbound_datum_cmp() instead of having its own logic to
> > compare input bounds
> > * Move partition_lbound_datum_cmp() into partbounds.c as a static
> > function (export seems unnecessary)
> > * Add a comment for PartitionBoundInfo.isnulls and remove that for
> null_index
>
> Yes. You are right. The extra code added in partition_list_bsearch()
> is not required and thanks for sharing the delta patch. It looks good
> to me and I have incorporated the changes in the attached patch.
>
> > I guess you're perhaps trying to address the case where the caller
> > does not specify the values for all of the partition key columns,
> > which can happen when the partition pruning code needs to handle a set
> > of clauses matching only some of the partition key columns. But
> > that's a concern of the partition pruning code and so the special case
> > should be handled there (if at all), not in the binary search function
> > that is shared with other callers. Regarding that, I'm wondering if
> > we should require clauses matching all of the partition key columns to
> > be found for the pruning code to call the binary search, so do
> > something like get_matching_hash_bounds() does:
> >
> > Do you think that trying to match list partitions even with fewer keys
> > is worth the complexity of the implementation? That is, is the use
> > case to search for only a subset of partition key columns common
> > enough with list partitioning?
> >
> > If we do decide to implement the special case, remember that to do
> > that efficiently, we'd need to require that the subset of matched key
> > columns constitutes a prefix, because of the way the datums are
> > sorted. That is, match all partitions when the query only contains a
> > clause for b when the partition key is (a, b, c), but engage the
> > special case of pruning if the query contains clauses for a, or for a
> > and b.
>
> Thanks for the suggestion. Below is the implementation details for the
> partition pruning for multi column list partitioning.
>
> In the existing code (For single column list partitioning)
> 1. In gen_partprune_steps_internal(), we try to match the where
> clauses provided by the user with the partition key data using
> match_clause_to_partition_key(). Based on the match, this function can
> return many values like PARTCLAUSE_MATCH_CLAUSE,
> PARTCLAUSE_MATCH_NULLNESS, PARTCLAUSE_NOMATCH, etc.
> 2. In case of PARTCLAUSE_MATCH_CLAUSE, we generate steps using
> gen_prune_steps_from_opexps() (strategy-2) which generate and return a
> list of PartitionPruneStepOp that are based on OpExpr and BooleanTest
> clauses that have been matched to the partition key and it also takes
> care handling prefix of the partition keys.
> 3. In case of PARTCLAUSE_MATCH_NULLNESS, we generate steps using
> gen_prune_step_op() (strategy-1) which generates single
> PartitionPruneStepOp since the earlier list partitioning supports
> single column and there can be only one NULL value. In
> get_matching_list_bounds(), if the nullkeys is not empty, we fetch the
> partition index which accepts null and we used to return from here.
>
> In case of multi column list partitioning, we have columns more than
> one and hence there is a possibility of more than one NULL values in
> the where clauses. The above mentioned steps are modified like below.
>
> 1. Modified the match_clause_to_partition_key() to generate an object
> of PartClauseInfo structure and return PARTCLAUSE_MATCH_CLAUSE even in
> case of clauses related to NULL. The information required to generate
> PartClauseInfo is populated here like the constant expression
> consisting of (Datum) 0, op_strategy, op_is_ne, etc.
> 2. Since I am returning PARTCLAUSE_MATCH_CLAUSE, now we use strategy-2
> (gen_prune_steps_from_opexps) to generate partition pruning steps.
> This function takes care of generating a list of pruning steps if
> there are multiple clauses and also takes care of handling prefixes.
> 3. Modified perform_pruning_base_step() to generate the datum values
> and isnulls data of the where clauses. In case if any of the key
> contains NULL value then the corresponding datum value is 0.
> 4. Modified get_matching_list_bounds() to generate the minimum offset
> and/or maximum offset of the matched values based on the difference
> operation strategies. Now since the NULL containing bound values are
> part of 'boundinfo', changed the code accordingly to include the NULL
> containing partitions or not in different scenarios like
> InvalidStrategy, etc.
>
> I have done some cosmetic changes to
> v1_multi_column_list_partitioning.patch. So all the above code changes
> related to partition pruning are merged with the previous patch and
> also included the delta patch shared by you. Hence sharing a single
> patch.
>
> Kindly have a look and share your thoughts.
>
>
> Hi,

bq. Supported new syantx to allow mentioning multiple key information.

syantx -> syntax

+ isDuplicate = checkForDuplicates(result, values);
+ if (isDuplicate)
+ continue;

It seems the variable isDuplicate is not needed. The if statement can
directly check the return value from checkForDuplicates().

+ //TODO: Handle for multi-column cases
+ for (j = 0; j < 1; j++)

Is this part going to be updated in the next patch?

Cheers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2021-08-25 22:25:03 Re: [PoC] Federated Authn/z with OAUTHBEARER
Previous Message Stephen Frost 2021-08-25 21:07:22 Re: log_autovacuum in Postgres 14 -- ordering issue