Quick Links

Re: Multi-Column List Partitioning

From:	Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To:	Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
Cc:	Amit Langote <amitlangote09(at)gmail(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Multi-Column List Partitioning
Date:	2021-10-02 20:23:26
Message-ID:	CAMm1aWbadRrY91SDkuOM03pcLNrhLoRVZysnjTUfasBmD-7+6Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

> > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> Thanks Rajkumar for testing.
>
> Here's a v2 of the delta patch that should fix both of these test
> failures. As I mentioned in my last reply, my delta patch fixed what
> I think were problems in Nitin's v3 patch but were not complete by
> themselves. Especially, I hadn't bothered to investigate various /*
> TODO: handle multi-column list partitioning */ sites to deal with my
> own changes.

Thanks Rajkumar for testing and Thank you Amit for working on v2 of
the delta patch. Actually I had done the code changes related to
partition-wise join and I was in the middle of fixing the review
comments, So I could not share the patch. Anyways thanks for your
efforts.

> I noticed that multi-column list partitions containing NULLs don't
> work correctly with partition pruning yet.
>
> create table p0 (a int, b text, c bool) partition by list (a, b, c);
> create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
> create table p02 partition of p0 for values in ((1, NULL, false));
> explain select * from p0 where a is null;
> QUERY PLAN
> --------------------------------------------------------
> Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
> Filter: (a IS NULL)
> (2 rows)
>
> In the attached updated version, I've dealt with some of those such
> that at least the existing cases exercising partition pruning and
> partition wise joins now pass.

wrt partition pruning, I have checked the output of the above case
with the v2 version of the delta patch and without that. The output
remains same. Kindly let me know if I am missing something. But I feel
the above output is correct as the partition p01 is the only partition
which contains NULL value for column a, hence it is showing "Seq scan
on p01" in the output. Kindly correct me if I am wrong. I feel the
code changes related to 'null_keys' is not required, hence not
incorporated that in the attached patch.

wrt partition-wise join, I had run the regression test (with new cases
related to partition-wise join) on v2 of the delta patch and observed
the crash. Hence I have not incorporated the partition-wise join
related code from v2 of delta patch to main v4 patch. Instead I have
added the partition-wise join related code done by me in the attached
patch. Please share your thoughts and if possible we can improvise the
code. Rest of the changes looks good to me and I have incorporated
that in the attached patch.

> I guess that may be due to the following newly added code being incomplete:
> Maybe this function needs to return a "bitmapset" of indexes, because
> multiple partitions can now contain NULL values.

I feel this function is not required at all as we are not separating
the non null and null partitions now. Removed in the attached patch.
Also removed the "scan_null' variable from the structure
"PruneStepResult" and cleaned up the corresponding code blocks.

> This function name may be too generic. Given that it is specific to
> implementing list bound de-duplication, maybe the following signature
> is more appropriate:
>
> static bool
> checkListBoundDuplicated(List *list_bounds, List *new_bound)

Yes. The function name looks more generic. How about using
"isListBoundDuplicated()"? I have used this name in the patch. Please
let me know if that does not look correct.

> Also, better if the function comment mentions those parameter names, like:
>
> "Returns TRUE if the list bound element 'new_bound' is already present
> in the target list 'list_bounds', FALSE otherwise."

Fixed.

> +/*
> + * transformPartitionListBounds
> + *
> + * Converts the expressions of list partition bounds from the raw grammar
> + * representation.
>
> A sentence about the result format would be helpful, like:
>
> The result is a List of Lists of Const nodes to account for the
> partition key possibly containing more than one column.

Fixed.

> + int i = 0;
> + int j = 0;
>
> Better to initialize such loop counters closer to the loop.

Fixed in all the places.

> + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
> + colname[i] = get_attname(RelationGetRelid(parent),
> + key->partattrs[i], false);
>
> The palloc in the 1st statement is wasteful, because the 2nd statement
> overwrites its pointer by the pointer to the string palloc'd by
> get_attname().

Removed the 1st statement as it is not required.

> + ListCell *cell2 = NULL;
>
> No need to explicitly initialize the loop variable.

Fixed in all the places.

> + RowExpr *rowexpr = NULL;
> +
> + if (!IsA(expr, RowExpr))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> + errmsg("Invalid list bound specification"),
> + parser_errposition(pstate, exprLocation((Node
> *) spec))));
> +
> + rowexpr = (RowExpr *) expr;
>
> It's okay to assign rowexpr at the top here instead of the dummy
> NULL-initialization and write the condition as:
>
> if (!IsA(rowexpr, RowExpr))

Fixed.

> + if (isDuplicate)
> + continue;
> +
> + result = lappend(result, values);
>
> I can see you copied this style from the existing code, but how about
> writing this simply as:
>
> if (!isDuplicate)
> result = lappend(result, values);

This looks good. I have changed in the patch.

> -/* One value coming from some (index'th) list partition */
> +/* One bound of a list partition */
> typedef struct PartitionListValue
> {
> int index;
> - Datum value;
> + Datum *values;
> + bool *isnulls;
> } PartitionListValue;
>
> Given that this is a locally-defined struct, I wonder if it makes
> sense to rename the struct while we're at it. Call it, say,
> PartitionListBound?

Yes. PartitionListBound looks more appropriate and it also matches the
similar structures of the other partition strategies.

> Also, please keep part of the existing comment that says that the
> bound belongs to index'th partition.

Retained the old comment.

> + * partition_bound_accepts_nulls
> + *
> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
> */
>
> I suggest slight rewording, as follows:
>
> "Returns TRUE if any of the partition bounds contains a NULL value,
> FALSE otherwise."

Fixed.

> - PartitionListValue *all_values;
> + PartitionListValue **all_values;
> ...
> - all_values = (PartitionListValue *)
> - palloc(ndatums * sizeof(PartitionListValue));
> + ndatums = get_list_datum_count(boundspecs, nparts);
> + all_values = (PartitionListValue **)
> + palloc(ndatums * sizeof(PartitionListValue *));
>
> I don't see the need to redefine all_values's pointer type. No need
> to palloc PartitionListValue repeatedly for every datum as done
> further down as follows:
>
> + all_values[j] = (PartitionListValue *)
> palloc(sizeof(PartitionListValue));
>
> You do need the following two though:
>
> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
> sizeof(Datum));
> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
> sizeof(bool));
>
> If you change the above the way I suggest, you'd also need to revert
> the following change:
>
> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
> qsort_partition_list_value_cmp, (void *) key);
>
> + int orig_index = all_values[i]->index;
> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>
> Missing a newline between these two statements.

Fixed. Made necessary changes to keep the intent of existing code.

> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> if (b1->nindexes != b2->nindexes)
> return false;
>
> - if (b1->null_index != b2->null_index)
> + if (get_partition_bound_null_index(b1) !=
> get_partition_bound_null_index(b2))
>
> As mentioned in the last message, this bit in partition_bounds_equal()
> needs to be comparing "bitmapsets" of null bound indexes, that is
> after fixing get_partition_bound_null_index() as previously mentioned.

As mentioned earlier, removed the functionality of
get_partition_bound_null_index(), hence the above condition is not
required and removed.

> But...
>
> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> * context. datumIsEqual() should be simple enough to be
> * safe.
> */
> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
> + if (b1->isnulls)
> + b1_isnull = b1->isnulls[i][j];
> + if (b2->isnulls)
> + b2_isnull = b2->isnulls[i][j];
> +
> + /*
> + * If any of the partition bound has NULL value, then check
> + * equality for the NULL value instead of comparing the datums
> + * as it does not contain valid value in case of NULL.
> + */
> + if (b1_isnull || b2_isnull)
> + {
> + if (b1_isnull != b2_isnull)
> + return false;
> + }
>
> ...if you have this in the main loop, I don't think we need the above
> code stanza which appears to implement a short-cut for this long-form
> logic.

Yes. May be we could have ignored the above code stanza if we would
have comparing the null indexes using get_partition_bound_null_index()
in the beginning of the function. But hence we are not separating the
non null partitions and null partitions, I would like to keep the
logic in the inner loop as we are doing it for non null bound values
in the above code stanza, just to give a feel that null bound values
are also handled the same way as non null values. Please correct me if
I am wrong.

> + (key->strategy != PARTITION_STRATEGY_LIST ||
> + !src->isnulls[i][j]))
>
> I think it's better to write this condition as follows just like the
> accompanying condition involving src->kind:
>
> (src->nulls == NULL || !src->isnulls[i][j])

Fixed.

> In check_new_partition_bound():
>
> + Datum *values = (Datum *)
> palloc0(key->partnatts * sizeof(Datum));
> + bool *isnulls = (bool *)
> palloc0(key->partnatts * sizeof(bool));
>
> Doesn't seem like a bad idea to declare these as:
>
> Datum values[PARTITION_MAX_KEYS];
> bool isnulls[PARTITION_MAX_KEYS];

Thanks for the suggestion. I have changed as above.

> I looked at get_qual_for_list_multi_column() and immediately thought
> that it may be a bad idea. I think it's better to integrate the logic
> for multi-column case into the existing function even if that makes
> the function appear more complex. Having two functions with the same
> goal and mostly the same code is not a good idea mainly because it
> becomes a maintenance burden.

Actually I had written a separate function because of the complexity.
Now I have understood that since the objective is same, it should be
done in a single function irrespective of complexity.

> I have attempted a rewrite such that get_qual_for_list() now handles
> both the single-column and multi-column cases. Changes included in
> the delta patch. The patch updates some outputs of the newly added
> tests for multi-column list partitions, because the new code emits the
> IS NOT NULL tests a bit differently than
> get_qual_for_list_mutli_column() would. Notably, the old approach
> would emit IS NOT NULL for every non-NULL datum matched to a given
> column, not just once for the column. However, the patch makes a few
> other tests fail, mainly because I had to fix
> partition_bound_accepts_nulls() to handle the multi-column case,
> though didn't bother to update all callers of it to also handle the
> multi-column case correctly. I guess that's a TODO you're going to
> deal with at some point anyway. :)

Thank you very much for your efforts. The changes looks good to me and
I have incorporated these changes in the attached patch.

I have completed the coding for all the TODOs and hence removed in the
patch. The naming conventions used for function/variable names varies
across the files. Some places it is like 'namesLikeThis' and in some
place it is like 'names_like_this'. I have used the naming conventions
based on the surrounding styles used. I am happy to change those if
required.

I have verified 'make check' with the attached patch and it is working fine.

Thanks & Regards,
Nitin Jadhav

On Mon, Sep 13, 2021 at 3:47 PM Rajkumar Raghuwanshi
<rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:
>
> On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing with below errors.
>
> --inherit.sql is failing with error :"ERROR: negative bitmapset member not allowed"
> update mlparted_tab mlp set c = 'xxx'
> from
> (select a from some_tab union all select a+1 from some_tab) ss (a)
> where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
> ERROR: negative bitmapset member not allowed
>
> --partition_join.sql is crashing with enable_partitionwise_join set to true.
> CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001', '0003');
> CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008', '0009');
> INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
> ANALYZE plt1_adv;
> CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002', '0003');
> CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004', '0006');
> CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007', '0009');
> INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
> ANALYZE plt2_adv;
> -- inner join
> EXPLAIN (COSTS OFF)
> SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
>
>
> --stack-trace
> Core was generated by `postgres: edb regression [local] EXPLAIN '.
> Program terminated with signal 6, Aborted.
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
> #1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6
> #2 0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10 "part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe "partbounds.c", lineNumber=1957)
> at assert.c:69
> #3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0, part_index=-1) at partbounds.c:1957
> #4 0x00000000008919bd in merge_list_bounds (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1529
> #5 0x00000000008910de in partition_bounds_merge (partnatts=1, partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0, inner_rel=0x1922938, jointype=JOIN_INNER,
> outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at partbounds.c:1223
> #6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
> parts2=0x7fffd67751a8) at joinrels.c:1644
> #7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
> at joinrels.c:1402
> #8 0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0, restrictlist=0x1ab3318)
> at joinrels.c:926
> #9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760
> #10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0, old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at joinrels.c:312
> #11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0, level=2) at joinrels.c:123
> #12 0x000000000080c566 in standard_join_search (root=0x1a19ed0, levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
> #13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:2951
> #14 0x000000000080816b in make_one_rel (root=0x1a19ed0, joinlist=0x199d538) at allpaths.c:228
> #15 0x000000000084491d in query_planner (root=0x1a19ed0, qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at planmain.c:276
> #16 0x0000000000847040 in grouping_planner (root=0x1a19ed0, tuple_fraction=0) at planner.c:1447
> #17 0x0000000000846709 in subquery_planner (glob=0x19b39d8, parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:1025
> #18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:406
> #19 0x0000000000844ce9 in planner (parse=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at planner.c:277
> #20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
> query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0) at postgres.c:847
> #21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290, cursorOptions=2048, into=0x0, es=0x19b36f0,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> params=0x0, queryEnv=0x0) at explain.c:397
> #22 0x0000000000693351 in ExplainQuery (pstate=0x197c410, stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at explain.c:281
> #23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845
> #24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
> queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527
> #25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40, pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378, qc=0x7fffd6775f90) at pquery.c:1147
> #26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40, isTopLevel=true) at pquery.c:1026
> #27 0x000000000097ed11 in PortalRun (portal=0x1893b40, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8, altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
> #28 0x0000000000978aa5 in exec_simple_query (
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>>
>> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
>> > <nitinjadhavpostgres(at)gmail(dot)com> wrote:
>> > > The attached patch also fixes the above comments.
>> >
>> > I noticed that multi-column list partitions containing NULLs don't
>> > work correctly with partition pruning yet.
>> >
>> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>> > create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
>> > create table p02 partition of p0 for values in ((1, NULL, false));
>> > explain select * from p0 where a is null;
>> > QUERY PLAN
>> > --------------------------------------------------------
>> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
>> > Filter: (a IS NULL)
>> > (2 rows)
>> >
>> > I guess that may be due to the following newly added code being incomplete:
>> >
>> > +/*
>> > + * get_partition_bound_null_index
>> > + *
>> > + * Returns the partition index of the partition bound which accepts NULL.
>> > + */
>> > +int
>> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
>> > +{
>> > + int i = 0;
>> > + int j = 0;
>> > +
>> > + if (!boundinfo->isnulls)
>> > + return -1;
>> >
>> > - if (!val->constisnull)
>> > - count++;
>> > + for (i = 0; i < boundinfo->ndatums; i++)
>> > + {
>> > + //TODO: Handle for multi-column cases
>> > + for (j = 0; j < 1; j++)
>> > + {
>> > + if (boundinfo->isnulls[i][j])
>> > + return boundinfo->indexes[i];
>> > }
>> > }
>> >
>> > + return -1;
>> > +}
>> >
>> > Maybe this function needs to return a "bitmapset" of indexes, because
>> > multiple partitions can now contain NULL values.
>> >
>> > Some other issues I noticed and suggestions for improvement:
>> >
>> > +/*
>> > + * checkForDuplicates
>> > + *
>> > + * Returns TRUE if the list bound element is already present in the list of
>> > + * list bounds, FALSE otherwise.
>> > + */
>> > +static bool
>> > +checkForDuplicates(List *source, List *searchElem)
>> >
>> > This function name may be too generic. Given that it is specific to
>> > implementing list bound de-duplication, maybe the following signature
>> > is more appropriate:
>> >
>> > static bool
>> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>> >
>> > Also, better if the function comment mentions those parameter names, like:
>> >
>> > "Returns TRUE if the list bound element 'new_bound' is already present
>> > in the target list 'list_bounds', FALSE otherwise."
>> >
>> > +/*
>> > + * transformPartitionListBounds
>> > + *
>> > + * Converts the expressions of list partition bounds from the raw grammar
>> > + * representation.
>> >
>> > A sentence about the result format would be helpful, like:
>> >
>> > The result is a List of Lists of Const nodes to account for the
>> > partition key possibly containing more than one column.
>> >
>> > + int i = 0;
>> > + int j = 0;
>> >
>> > Better to initialize such loop counters closer to the loop.
>> >
>> > + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
>> > + colname[i] = get_attname(RelationGetRelid(parent),
>> > + key->partattrs[i], false);
>> >
>> > The palloc in the 1st statement is wasteful, because the 2nd statement
>> > overwrites its pointer by the pointer to the string palloc'd by
>> > get_attname().
>> >
>> > + ListCell *cell2 = NULL;
>> >
>> > No need to explicitly initialize the loop variable.
>> >
>> > + RowExpr *rowexpr = NULL;
>> > +
>> > + if (!IsA(expr, RowExpr))
>> > + ereport(ERROR,
>> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> > + errmsg("Invalid list bound specification"),
>> > + parser_errposition(pstate, exprLocation((Node
>> > *) spec))));
>> > +
>> > + rowexpr = (RowExpr *) expr;
>> >
>> > It's okay to assign rowexpr at the top here instead of the dummy
>> > NULL-initialization and write the condition as:
>> >
>> > if (!IsA(rowexpr, RowExpr))
>> >
>> > + if (isDuplicate)
>> > + continue;
>> > +
>> > + result = lappend(result, values);
>> >
>> > I can see you copied this style from the existing code, but how about
>> > writing this simply as:
>> >
>> > if (!isDuplicate)
>> > result = lappend(result, values);
>> >
>> > -/* One value coming from some (index'th) list partition */
>> > +/* One bound of a list partition */
>> > typedef struct PartitionListValue
>> > {
>> > int index;
>> > - Datum value;
>> > + Datum *values;
>> > + bool *isnulls;
>> > } PartitionListValue;
>> >
>> > Given that this is a locally-defined struct, I wonder if it makes
>> > sense to rename the struct while we're at it. Call it, say,
>> > PartitionListBound?
>> >
>> > Also, please keep part of the existing comment that says that the
>> > bound belongs to index'th partition.
>> >
>> > Will send more comments in a bit...
>>
>> + * partition_bound_accepts_nulls
>> + *
>> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>> */
>>
>> I suggest slight rewording, as follows:
>>
>> "Returns TRUE if any of the partition bounds contains a NULL value,
>> FALSE otherwise."
>>
>> - PartitionListValue *all_values;
>> + PartitionListValue **all_values;
>> ...
>> - all_values = (PartitionListValue *)
>> - palloc(ndatums * sizeof(PartitionListValue));
>> + ndatums = get_list_datum_count(boundspecs, nparts);
>> + all_values = (PartitionListValue **)
>> + palloc(ndatums * sizeof(PartitionListValue *));
>>
>> I don't see the need to redefine all_values's pointer type. No need
>> to palloc PartitionListValue repeatedly for every datum as done
>> further down as follows:
>>
>> + all_values[j] = (PartitionListValue *)
>> palloc(sizeof(PartitionListValue));
>>
>> You do need the following two though:
>>
>> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
>> sizeof(Datum));
>> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
>> sizeof(bool));
>>
>> If you change the above the way I suggest, you'd also need to revert
>> the following change:
>>
>> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>> qsort_partition_list_value_cmp, (void *) key);
>>
>> + int orig_index = all_values[i]->index;
>> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts * sizeof(Datum));
>>
>> Missing a newline between these two statements.
>>
>> BTW, I noticed that the boundDatums variable is no longer used in
>> create_list_bounds. I traced back its origin and found that a recent
>> commit 53d86957e98 introduced it to implement an idea to reduce the
>> finer-grained pallocs that were being done in create_list_bounds(). I
>> don't think that this patch needs to throw away that work. You can
>> make it work as the attached delta patch that applies on top of v3.
>> Please check.
>>
>> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> if (b1->nindexes != b2->nindexes)
>> return false;
>>
>> - if (b1->null_index != b2->null_index)
>> + if (get_partition_bound_null_index(b1) !=
>> get_partition_bound_null_index(b2))
>>
>> As mentioned in the last message, this bit in partition_bounds_equal()
>> needs to be comparing "bitmapsets" of null bound indexes, that is
>> after fixing get_partition_bound_null_index() as previously mentioned.
>>
>> But...
>>
>> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>> *parttyplen, bool *parttypbyval,
>> * context. datumIsEqual() should be simple enough to be
>> * safe.
>> */
>> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>> + if (b1->isnulls)
>> + b1_isnull = b1->isnulls[i][j];
>> + if (b2->isnulls)
>> + b2_isnull = b2->isnulls[i][j];
>> +
>> + /*
>> + * If any of the partition bound has NULL value, then check
>> + * equality for the NULL value instead of comparing the datums
>> + * as it does not contain valid value in case of NULL.
>> + */
>> + if (b1_isnull || b2_isnull)
>> + {
>> + if (b1_isnull != b2_isnull)
>> + return false;
>> + }
>>
>> ...if you have this in the main loop, I don't think we need the above
>> code stanza which appears to implement a short-cut for this long-form
>> logic.
>>
>> + (key->strategy != PARTITION_STRATEGY_LIST ||
>> + !src->isnulls[i][j]))
>>
>> I think it's better to write this condition as follows just like the
>> accompanying condition involving src->kind:
>>
>> (src->nulls == NULL || !src->isnulls[i][j])
>>
>> (Skipped looking at merge_list_bounds() and related changes for now as
>> I see a lot of TODOs remain to be done.)
>>
>> In check_new_partition_bound():
>>
>> + Datum *values = (Datum *)
>> palloc0(key->partnatts * sizeof(Datum));
>> + bool *isnulls = (bool *)
>> palloc0(key->partnatts * sizeof(bool));
>>
>> Doesn't seem like a bad idea to declare these as:
>>
>> Datum values[PARTITION_MAX_KEYS];
>> bool isnulls[PARTITION_MAX_KEYS];
>>
>>
>> I looked at get_qual_for_list_multi_column() and immediately thought
>> that it may be a bad idea. I think it's better to integrate the logic
>> for multi-column case into the existing function even if that makes
>> the function appear more complex. Having two functions with the same
>> goal and mostly the same code is not a good idea mainly because it
>> becomes a maintenance burden.
>>
>> I have attempted a rewrite such that get_qual_for_list() now handles
>> both the single-column and multi-column cases. Changes included in
>> the delta patch. The patch updates some outputs of the newly added
>> tests for multi-column list partitions, because the new code emits the
>> IS NOT NULL tests a bit differently than
>> get_qual_for_list_mutli_column() would. Notably, the old approach
>> would emit IS NOT NULL for every non-NULL datum matched to a given
>> column, not just once for the column. However, the patch makes a few
>> other tests fail, mainly because I had to fix
>> partition_bound_accepts_nulls() to handle the multi-column case,
>> though didn't bother to update all callers of it to also handle the
>> multi-column case correctly. I guess that's a TODO you're going to
>> deal with at some point anyway. :)
>>
>> I still have more than half of v3 left to look at, so will continue
>> looking. In the meantime, please check the changes I suggested,
>> including the delta patch, and let me know your thoughts.
>>
>> --
>> Amit Langote
>> EDB: http://www.enterprisedb.com

Attachment	Content-Type	Size
v4-0001-multi-column-list-partitioning.patch	application/octet-stream	192.8 KB

In response to

Re: Multi-Column List Partitioning at 2021-09-13 10:17:36 from Rajkumar Raghuwanshi

Responses

Re: Multi-Column List Partitioning at 2021-10-07 10:34:52 from Rajkumar Raghuwanshi

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andrew Dunstan	2021-10-02 20:41:30	Re: Adding CI to our tree
Previous Message	Daniel Gustafsson	2021-10-02 20:18:38	Re: Adding CI to our tree