Re: Multi-Column List Partitioning

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-11-02 13:33:37
Message-ID: CAMm1aWaA1dQpzwhf9AhpAqEtaGGnpM2hUwJ14ojQkk4J09zM_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for reviewing.

> +isListBoundDuplicated(List *list_bounds, List *new_bound)
>
> + Const *value1 = castNode(Const, list_nth(elem, i));
> + Const *value2 = castNode(Const, list_nth(new_bound, i));
>
> Should the upper bound for index i take into account the length of new_bound ?
> If the length of new_bound is always the same as that for elem, please add an assertion.

The length of 'elem' should be same as length of 'new_bound'. Added
assert statement for the same.

> For transformPartitionListBounds():
> + deparse_expression((Node *) list_nth(partexprs, j),
> + deparse_context_for(RelationGetRelationName(parent),
> + RelationGetRelid(parent)),
>
> Please consider calling RelationGetRelationName(parent) and RelationGetRelid(parent) (and assigning to local variables) outside the loop.

I don't think this is an issue as 'RelationGetRelationName' and
'RelationGetRelid' are macros. Please let me know if your opinion is
different.

> +get_list_datum_count(PartitionBoundSpec **boundspecs, int nparts)
>
> get_list_datum_count -> get_list_datums_count

There was a function earlier with the name
'get_non_null_list_datum_count()'. So now this has changed to
'get_list_datum_count()' as we are not separating the non null datums
from the list. The new name is inline with the old function name which
was already accepted by the community. So I feel it is better to not
change.

> For partition_bounds_equal():
>
> + if (b1->isnulls)
> + b1_isnull = b1->isnulls[i][j];
> + if (b2->isnulls)
> + b2_isnull = b2->isnulls[i][j];
>
> Should the initialization of b1_isnull and b2_isnull be done inside the loop (so that they don't inherit value from previous iteration) ?

Nice catch. Fixed.

> In get_qual_for_list(), it would be better if repetitive code can be extracted into a helper method:

I have removed the repetitive code and made a common function named
'get_qual_for_list_datums()'.

> For match_clause_to_partition_key():
>
> + if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
> + {
> + *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
> + return PARTCLAUSE_MATCH_NULLNESS;
> + }
> + else
>
> Since the if block ends with return, the 'else' is not needed - else block can be indented to the left.

Fixed.

> get_min_and_max_off(): I think get_min_and_max_offset as method name would be more informative.

Fixed.

> + Assert(0 == partition_lbound_datum_cmp(partsupfunc, partcollation,
> + boundinfo->datums[off],
> + boundinfo->isnulls[off],
> + values, isnulls, nvalues));
>
> If the 'while (off >= 1)' loop exits without modifying off, is the above assertion always true (can boundinfo->datums[off] be accessed without checking bound) ?

Yes. The assertion holds good even though the control doesn't enter
the loop. In that case the 'off' can be directly considered as minoff
or maxoff. Since we are considering it as valid, the assertion is
needed.

Thanks & Regards,
Nitin Jadhav

On Fri, Oct 22, 2021 at 9:30 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>
>
>
> On Fri, Oct 22, 2021 at 3:50 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>>
>>
>>
>> On Fri, Oct 22, 2021 at 2:48 AM Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com> wrote:
>>>
>>> > While testing further I got a crash with partition wise join enabled for multi-col list partitions. please find test case & stack-trace below.
>>>
>>> Thanks for sharing. I have fixed the issue in the attached patch.
>>>
>>> Thanks & Regards,
>>> Nitin Jadhav
>>>
>>>>>>>>
>>>>>>>>
>> Hi,
>>
>> +isListBoundDuplicated(List *list_bounds, List *new_bound)
>>
>> + Const *value1 = castNode(Const, list_nth(elem, i));
>> + Const *value2 = castNode(Const, list_nth(new_bound, i));
>>
>> Should the upper bound for index i take into account the length of new_bound ?
>> If the length of new_bound is always the same as that for elem, please add an assertion.
>>
>> For transformPartitionListBounds():
>> + deparse_expression((Node *) list_nth(partexprs, j),
>> + deparse_context_for(RelationGetRelationName(parent),
>> + RelationGetRelid(parent)),
>>
>> Please consider calling RelationGetRelationName(parent) and RelationGetRelid(parent) (and assigning to local variables) outside the loop.
>>
>> +get_list_datum_count(PartitionBoundSpec **boundspecs, int nparts)
>>
>> get_list_datum_count -> get_list_datums_count
>>
>> For partition_bounds_equal():
>>
>> + if (b1->isnulls)
>> + b1_isnull = b1->isnulls[i][j];
>> + if (b2->isnulls)
>> + b2_isnull = b2->isnulls[i][j];
>>
>> Should the initialization of b1_isnull and b2_isnull be done inside the loop (so that they don't inherit value from previous iteration) ?
>>
>> Cheers
>
>
> Hi,
> Continuing review.
>
> + * For the multi-column case, we must make an BoolExpr that
>
> an BoolExpr -> a BoolExpr
>
> In get_qual_for_list(), it would be better if repetitive code can be extracted into a helper method:
>
> + if (val->constisnull)
> + {
> + NullTest *nulltest = makeNode(NullTest);
> +
> + key_is_null[j] = true;
> +
> + nulltest->arg = keyCol[j];
> + nulltest->nulltesttype = IS_NULL;
> + nulltest->argisrow = false;
> + nulltest->location = -1;
> +
> + if (key->partnatts > 1)
> + and_args = lappend(and_args, nulltest);
> + else
> + is_null_test = (Expr *) nulltest;
> + }
> + else
> + {
> + if (key->partnatts > 1)
> + {
> + Expr *opexpr =
> + make_partition_op_expr(key, j,
> + BTEqualStrategyNumber,
> + keyCol[j],
> + (Expr *) val);
> + and_args = lappend(and_args, opexpr);
> + }
> + else
> + datum_elem = (Expr *) val;
> + }
>
> For match_clause_to_partition_key():
>
> + if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
> + {
> + *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
> + return PARTCLAUSE_MATCH_NULLNESS;
> + }
> + else
>
> Since the if block ends with return, the 'else' is not needed - else block can be indented to the left.
>
> get_min_and_max_off(): I think get_min_and_max_offset as method name would be more informative.
>
> + Assert(0 == partition_lbound_datum_cmp(partsupfunc, partcollation,
> + boundinfo->datums[off],
> + boundinfo->isnulls[off],
> + values, isnulls, nvalues));
>
> If the 'while (off >= 1)' loop exits without modifying off, is the above assertion always true (can boundinfo->datums[off] be accessed without checking bound) ?
>
> Cheers

Attachment Content-Type Size
v7-0001-multi-column-list-partitioning.patch application/octet-stream 196.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nitin Jadhav 2021-11-02 13:35:48 Re: Multi-Column List Partitioning
Previous Message Robert Haas 2021-11-02 12:45:57 Re: removing global variable ThisTimeLineID