Re: Multi-Column List Partitioning

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Amul Sul <sulamul(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-12-13 14:36:52
Message-ID: CAE9k0P=P2rXdoEMV3-sX5KUDi=Z5CWWV=GpLsmh4n-Fue_AhvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Is this okay?

postgres=# CREATE TABLE t1 (a int, b int) PARTITION BY LIST ( a, a, a );
CREATE TABLE

postgres=# CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN ((1, 2, 3), (4,
5, 6));
CREATE TABLE

postgres=# \d t1
Partitioned table "public.t1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | |
b | integer | | |
Partition key: LIST (a, a, a)
Number of partitions: 1 (Use \d+ to list them.)

--

Also, getting some compiler warnings when building the source. please check.

--
With Regards,
Ashutosh Sharma.

On Mon, Dec 6, 2021 at 7:27 PM Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
wrote:

> Thank you for reviewing the patch.
>
> > partbounds.c: In function ‘get_qual_for_list.isra.18’:
> > partbounds.c:4284:29: warning: ‘boundinfo’ may be used uninitialized
> > in this function [-Wmaybe-uninitialized]
> > datumCopy(bound_info->datums[i][j],
> > ~~~~~~~~~~^~~~~~~~
> > partbounds.c:4335:21: note: ‘boundinfo’ was declared here
> > PartitionBoundInfo boundinfo;
> > ^~~~~~~~~
> > partbounds.c: In function ‘partition_bounds_merge’:
> > partbounds.c:1305:12: warning: ‘inner_isnull’ may be used
> > uninitialized in this function [-Wmaybe-uninitialized]
> > bool *inner_isnull;
> > ^~~~~~~~~~~~
> > partbounds.c:1304:12: warning: ‘outer_isnull’ may be used
> > uninitialized in this function [-Wmaybe-uninitialized]
> > bool *outer_isnull;
> > ^~~~~~~~~~~~
>
> Fixed.
>
> > This function is unnecessarily complicated, I think you can avoid
> > inner for loops; simply replace for-loop-block with "if
> > (equal(lfirst(cell), new_bound)) return true".
>
> Thank you for the suggestion. Fixed.
>
> > + char **colname = (char **) palloc0(partnatts * sizeof(char *));
> > + Oid *coltype = palloc0(partnatts * sizeof(Oid));
> > + int32 *coltypmod = palloc0(partnatts * sizeof(int));
> > + Oid *partcollation = palloc0(partnatts * sizeof(Oid));
> > +
> > This allocation seems to be worthless, read ahead.
> >
> > I think there is no need for this separate loop inside
> > transformPartitionListBounds, you can do that same in the next loop as
> > well. And instead of get_partition_col_* calling and storing, simply
> > use that directly as an argument to transformPartitionBoundValue().
>
> Yes. The loop can be avoided and content of the above loop can be
> included in the next loop but the next loop iterates over a list of
> multi column datums. For each iteration, we need the information of
> all the columns. The above data (colname, coltype, coltypmod and
> partcollation) remains same for each iteration of the loop, If we
> modify as suggested, then the function to fetch these information has
> to be called every-time. To avoid this situation I have made a
> separate loop outside which only runs as many number of columns and
> stores in a variable which can be reused later. Please let me correct
> if I am wrong.
>
> > I think this should be inside the "else" block after "!IsA(rowexpr,
> > RowExpr)" error and you can avoid IsA() check too.
>
> This is required to handle the situation when one partition key is
> mentioned and multiple values are provided in the partition bound
> specification.
>
> > Looks difficult to understand at first glance, how about the following:
> >
> > if (b1->isnulls != b2->isnulls)
> > return false;
> >
> > if (b1->isnulls)
> > {
> > if (b1->isnulls[i][j] != b2->isnulls[i][j])
> > return false;
> > if (b1->isnulls[i][j])
> > continue;
> > }
> >
> > See how range partitioning infinite values are handled. Also, place
> > this before the comment block that was added for the "!datumIsEqual()"
> > case.
>
> Fixed. I feel the 'continue' block is not required and hence removed it.
>
> > Nothing wrong with this but if we could have checked "dest->isnulls"
> > instead of "src->isnulls" would be much better.
>
> Here we are copying the data from 'src' to 'dest'. If there is no data
> in 'src', it is unnecessary to copy. Hence checking 'src'.
>
> > Condition "key->strategy != PARTITION_STRATEGY_LIST" seems to be
> unnecessary.
>
> Fixed.
>
> > Can't be a single loop?
>
> Yes. Fixed.
>
>
>
> On Fri, Dec 3, 2021 at 7:26 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> >
> > Hi,
> >
> > Few comments for v7 patch, note that I haven't been through the
> > previous discussion, if any of the review comments that has been
> > already discussed & overridden, then please ignore here too:
> >
> >
> > partbounds.c: In function ‘get_qual_for_list.isra.18’:
> > partbounds.c:4284:29: warning: ‘boundinfo’ may be used uninitialized
> > in this function [-Wmaybe-uninitialized]
> > datumCopy(bound_info->datums[i][j],
> > ~~~~~~~~~~^~~~~~~~
> > partbounds.c:4335:21: note: ‘boundinfo’ was declared here
> > PartitionBoundInfo boundinfo;
> > ^~~~~~~~~
> > partbounds.c: In function ‘partition_bounds_merge’:
> > partbounds.c:1305:12: warning: ‘inner_isnull’ may be used
> > uninitialized in this function [-Wmaybe-uninitialized]
> > bool *inner_isnull;
> > ^~~~~~~~~~~~
> > partbounds.c:1304:12: warning: ‘outer_isnull’ may be used
> > uninitialized in this function [-Wmaybe-uninitialized]
> > bool *outer_isnull;
> > ^~~~~~~~~~~~
> >
> > Got these warnings with gcc -O2 compilation.
> > ----
> >
> > /*
> > + * isListBoundDuplicated
> > + *
> > + * Returns TRUE if the list bound element 'new_bound' is already present
> > + * in the target list 'list_bounds', FALSE otherwise.
> > + */
> > +static bool
> > +isListBoundDuplicated(List *list_bounds, List *new_bound)
> > +{
> > + ListCell *cell = NULL;
> > +
> > + foreach(cell, list_bounds)
> > + {
> > + int i;
> > + List *elem = lfirst(cell);
> > + bool isDuplicate = true;
> > +
> > + Assert(list_length(elem) == list_length(new_bound));
> > +
> > + for (i = 0; i < list_length(elem); i++)
> > + {
> > + Const *value1 = castNode(Const, list_nth(elem, i));
> > + Const *value2 = castNode(Const, list_nth(new_bound, i));
> > +
> > + if (!equal(value1, value2))
> > + {
> > + isDuplicate = false;
> > + break;
> > + }
> > + }
> > +
> > + if (isDuplicate)
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> >
> > This function is unnecessarily complicated, I think you can avoid
> > inner for loops; simply replace for-loop-block with "if
> > (equal(lfirst(cell), new_bound)) return true".
> > ----
> >
> > + char **colname = (char **) palloc0(partnatts * sizeof(char *));
> > + Oid *coltype = palloc0(partnatts * sizeof(Oid));
> > + int32 *coltypmod = palloc0(partnatts * sizeof(int));
> > + Oid *partcollation = palloc0(partnatts * sizeof(Oid));
> > +
> > This allocation seems to be worthless, read ahead.
> > ----
> >
> > + for (i = 0; i < partnatts; i++)
> > + {
> > + if (key->partattrs[i] != 0)
> > + colname[i] = get_attname(RelationGetRelid(parent),
> > + key->partattrs[i], false);
> > + else
> > + {
> > + colname[i] =
> > + deparse_expression((Node *) list_nth(partexprs, j),
> > + deparse_context_for(RelationGetRelationName(parent),
> > + RelationGetRelid(parent)),
> > + false, false);
> > + ++j;
> > + }
> > +
> > + coltype[i] = get_partition_col_typid(key, i);
> > + coltypmod[i] = get_partition_col_typmod(key, i);
> > + partcollation[i] = get_partition_col_collation(key, i);
> > + }
> >
> > I think there is no need for this separate loop inside
> > transformPartitionListBounds, you can do that same in the next loop as
> > well. And instead of get_partition_col_* calling and storing, simply
> > use that directly as an argument to transformPartitionBoundValue().
> > ----
> >
> > +
> > + if (IsA(expr, RowExpr) &&
> > + partnatts != list_length(((RowExpr *) expr)->args))
> > + ereport(ERROR,
> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> > + errmsg("Must specify exactly one value per partitioning column"),
> > + parser_errposition(pstate, exprLocation((Node *) spec))));
> > +
> >
> > I think this should be inside the "else" block after "!IsA(rowexpr,
> > RowExpr)" error and you can avoid IsA() check too.
> > ----
> >
> > - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
> > + if (b1->isnulls)
> > + b1_isnull = b1->isnulls[i][j];
> > + if (b2->isnulls)
> > + b2_isnull = b2->isnulls[i][j];
> > +
> > + /*
> > + * If any of the partition bound has NULL value, then
> check
> > + * equality for the NULL value instead of comparing the
> datums
> > + * as it does not contain valid value in case of NULL.
> > + */
> > + if (b1_isnull || b2_isnull)
> > + {
> > + if (b1_isnull != b2_isnull)
> > + return false;
> > + }
> > + else if (!datumIsEqual(b1->datums[i][j],
> b2->datums[i][j],
> >
> > Looks difficult to understand at first glance, how about the following:
> >
> > if (b1->isnulls != b2->isnulls)
> > return false;
> >
> > if (b1->isnulls)
> > {
> > if (b1->isnulls[i][j] != b2->isnulls[i][j])
> > return false;
> > if (b1->isnulls[i][j])
> > continue;
> > }
> >
> > See how range partitioning infinite values are handled. Also, place
> > this before the comment block that was added for the "!datumIsEqual()"
> > case.
> > ----
> >
> > + if (src->isnulls)
> > + dest->isnulls[i] = (bool *) palloc(sizeof(bool) * natts);
> > ...
> > + if (src->isnulls)
> > + dest->isnulls[i][j] = src->isnulls[i][j];
> > +
> > Nothing wrong with this but if we could have checked "dest->isnulls"
> > instead of "src->isnulls" would be much better.
> > ----
> >
> > - if (dest->kind == NULL ||
> > - dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE)
> > + if ((dest->kind == NULL ||
> > + dest->kind[i][j] == PARTITION_RANGE_DATUM_VALUE) &&
> > + (key->strategy != PARTITION_STRATEGY_LIST ||
> > + (src->isnulls == NULL || !src->isnulls[i][j])))
> > dest->datums[i][j] = datumCopy(src->datums[i][j],
> > byval, typlen);
> > Condition "key->strategy != PARTITION_STRATEGY_LIST" seems to be
> unnecessary.
> > ----
> >
> > + for (i = 0; i < partnatts; i++)
> > + {
> > + if (outer_isnull[i])
> > + {
> > + outer_has_null = true;
> > + if (outer_map.merged_indexes[outer_index] == -1)
> > + consider_outer_null = true;
> > + }
> > + }
> > +
> > + for (i = 0; i < partnatts; i++)
> > + {
> > + if (inner_isnull[i])
> > + {
> > + inner_has_null = true;
> > + if (inner_map.merged_indexes[inner_index] == -1)
> > + consider_inner_null = true;
> > + }
> > + }
> >
> > Can't be a single loop?
> > ----
> >
> > It would be helpful if you could run pgindent on your patch if not done
> already.
> > ----
> >
> > That's all for now, I am yet to finish the complete patch reading and
> > understand the code flow, but I am out of time now.
> >
> > Regards,
> > Amul
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-12-13 14:37:27 Re: Documenting when to retry on serialization failure
Previous Message Dilip Kumar 2021-12-13 14:29:39 Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints