Re: Multi-Column List Partitioning

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-10-07 12:34:16
Message-ID: CAMm1aWav9tvpqx3=Tvk-LedQHE9uh7HbECq6k61wna_zTupYHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Rajkumar for testing.

> I think it should throw an error as the partition by list has only 1
column but we are giving 2 values.

I also agree that it should throw an error in the above case. Fixed the
issue in the attached patch. Also added related test cases to the
regression test suite.

> also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’
instead of ('0001','0001').

Now throwing errors in the initial stage, this case doesn't arise.

Please share if you find any other issues.

Thanks & Regards,
Nitin Jadhav

On Thu, Oct 7, 2021 at 4:05 PM Rajkumar Raghuwanshi <
rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:

> Thanks Nitin,
>
> v4 patches applied cleanly and make check is passing now. While testing
> further I observed that if multiple values are given for a single
> column list partition it is not giving error instead it is changing values
> itself. Please find the example below.
>
> postgres=# CREATE TABLE plt1 (a int, b varchar) PARTITION BY LIST(b);
> CREATE TABLE
> postgres=# CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN
> (('0001','0001'),('0002','0002'));
> CREATE TABLE
> postgres=# \d+ plt1;
> Partitioned table "public.plt1"
> Column | Type | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
>
> --------+-------------------+-----------+----------+---------+----------+-------------+--------------+-------------
> a | integer | | | | plain |
> | |
> b | character varying | | | | extended |
> | |
> Partition key: LIST (b)
> Partitions: plt1_p1 FOR VALUES IN ('(0001,0001)', '(0002,0002)')
>
> I think it should throw an error as the partition by list has only 1
> column but we are giving 2 values.
> also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’
> instead of ('0001','0001').
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
>
> On Sun, Oct 3, 2021 at 1:52 AM Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
> wrote:
>
>> > > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is
>> failing with below errors.
>> >
>> > Thanks Rajkumar for testing.
>> >
>> > Here's a v2 of the delta patch that should fix both of these test
>> > failures. As I mentioned in my last reply, my delta patch fixed what
>> > I think were problems in Nitin's v3 patch but were not complete by
>> > themselves. Especially, I hadn't bothered to investigate various /*
>> > TODO: handle multi-column list partitioning */ sites to deal with my
>> > own changes.
>>
>> Thanks Rajkumar for testing and Thank you Amit for working on v2 of
>> the delta patch. Actually I had done the code changes related to
>> partition-wise join and I was in the middle of fixing the review
>> comments, So I could not share the patch. Anyways thanks for your
>> efforts.
>>
>> > I noticed that multi-column list partitions containing NULLs don't
>> > work correctly with partition pruning yet.
>> >
>> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>> > create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1,
>> false));
>> > create table p02 partition of p0 for values in ((1, NULL, false));
>> > explain select * from p0 where a is null;
>> > QUERY PLAN
>> > --------------------------------------------------------
>> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
>> > Filter: (a IS NULL)
>> > (2 rows)
>> >
>> > In the attached updated version, I've dealt with some of those such
>> > that at least the existing cases exercising partition pruning and
>> > partition wise joins now pass.
>>
>> wrt partition pruning, I have checked the output of the above case
>> with the v2 version of the delta patch and without that. The output
>> remains same. Kindly let me know if I am missing something. But I feel
>> the above output is correct as the partition p01 is the only partition
>> which contains NULL value for column a, hence it is showing "Seq scan
>> on p01" in the output. Kindly correct me if I am wrong. I feel the
>> code changes related to 'null_keys' is not required, hence not
>> incorporated that in the attached patch.
>>
>> wrt partition-wise join, I had run the regression test (with new cases
>> related to partition-wise join) on v2 of the delta patch and observed
>> the crash. Hence I have not incorporated the partition-wise join
>> related code from v2 of delta patch to main v4 patch. Instead I have
>> added the partition-wise join related code done by me in the attached
>> patch. Please share your thoughts and if possible we can improvise the
>> code. Rest of the changes looks good to me and I have incorporated
>> that in the attached patch.
>>
>>
>> > I guess that may be due to the following newly added code being
>> incomplete:
>> > Maybe this function needs to return a "bitmapset" of indexes, because
>> > multiple partitions can now contain NULL values.
>>
>> I feel this function is not required at all as we are not separating
>> the non null and null partitions now. Removed in the attached patch.
>> Also removed the "scan_null' variable from the structure
>> "PruneStepResult" and cleaned up the corresponding code blocks.
>>
>>
>> > This function name may be too generic. Given that it is specific to
>> > implementing list bound de-duplication, maybe the following signature
>> > is more appropriate:
>> >
>> > static bool
>> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>>
>> Yes. The function name looks more generic. How about using
>> "isListBoundDuplicated()"? I have used this name in the patch. Please
>> let me know if that does not look correct.
>>
>>
>> > Also, better if the function comment mentions those parameter names,
>> like:
>> >
>> > "Returns TRUE if the list bound element 'new_bound' is already present
>> > in the target list 'list_bounds', FALSE otherwise."
>>
>> Fixed.
>>
>>
>> > +/*
>> > + * transformPartitionListBounds
>> > + *
>> > + * Converts the expressions of list partition bounds from the raw
>> grammar
>> > + * representation.
>> >
>> > A sentence about the result format would be helpful, like:
>> >
>> > The result is a List of Lists of Const nodes to account for the
>> > partition key possibly containing more than one column.
>>
>> Fixed.
>>
>>
>> > + int i = 0;
>> > + int j = 0;
>> >
>> > Better to initialize such loop counters closer to the loop.
>>
>> Fixed in all the places.
>>
>>
>> > + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
>> > + colname[i] = get_attname(RelationGetRelid(parent),
>> > + key->partattrs[i], false);
>> >
>> > The palloc in the 1st statement is wasteful, because the 2nd statement
>> > overwrites its pointer by the pointer to the string palloc'd by
>> > get_attname().
>>
>> Removed the 1st statement as it is not required.
>>
>>
>> > + ListCell *cell2 = NULL;
>> >
>> > No need to explicitly initialize the loop variable.
>>
>> Fixed in all the places.
>>
>>
>> > + RowExpr *rowexpr = NULL;
>> > +
>> > + if (!IsA(expr, RowExpr))
>> > + ereport(ERROR,
>> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> > + errmsg("Invalid list bound specification"),
>> > + parser_errposition(pstate, exprLocation((Node
>> > *) spec))));
>> > +
>> > + rowexpr = (RowExpr *) expr;
>> >
>> > It's okay to assign rowexpr at the top here instead of the dummy
>> > NULL-initialization and write the condition as:
>> >
>> > if (!IsA(rowexpr, RowExpr))
>>
>> Fixed.
>>
>>
>> > + if (isDuplicate)
>> > + continue;
>> > +
>> > + result = lappend(result, values);
>> >
>> > I can see you copied this style from the existing code, but how about
>> > writing this simply as:
>> >
>> > if (!isDuplicate)
>> > result = lappend(result, values);
>>
>> This looks good. I have changed in the patch.
>>
>>
>> > -/* One value coming from some (index'th) list partition */
>> > +/* One bound of a list partition */
>> > typedef struct PartitionListValue
>> > {
>> > int index;
>> > - Datum value;
>> > + Datum *values;
>> > + bool *isnulls;
>> > } PartitionListValue;
>> >
>> > Given that this is a locally-defined struct, I wonder if it makes
>> > sense to rename the struct while we're at it. Call it, say,
>> > PartitionListBound?
>>
>> Yes. PartitionListBound looks more appropriate and it also matches the
>> similar structures of the other partition strategies.
>>
>> > Also, please keep part of the existing comment that says that the
>> > bound belongs to index'th partition.
>>
>> Retained the old comment.
>>
>>
>> > + * partition_bound_accepts_nulls
>> > + *
>> > + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>> > */
>> >
>> > I suggest slight rewording, as follows:
>> >
>> > "Returns TRUE if any of the partition bounds contains a NULL value,
>> > FALSE otherwise."
>>
>> Fixed.
>>
>>
>> > - PartitionListValue *all_values;
>> > + PartitionListValue **all_values;
>> > ...
>> > - all_values = (PartitionListValue *)
>> > - palloc(ndatums * sizeof(PartitionListValue));
>> > + ndatums = get_list_datum_count(boundspecs, nparts);
>> > + all_values = (PartitionListValue **)
>> > + palloc(ndatums * sizeof(PartitionListValue *));
>> >
>> > I don't see the need to redefine all_values's pointer type. No need
>> > to palloc PartitionListValue repeatedly for every datum as done
>> > further down as follows:
>> >
>> > + all_values[j] = (PartitionListValue *)
>> > palloc(sizeof(PartitionListValue));
>> >
>> > You do need the following two though:
>> >
>> > + all_values[j]->values = (Datum *) palloc0(key->partnatts *
>> > sizeof(Datum));
>> > + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
>> > sizeof(bool));
>> >
>> > If you change the above the way I suggest, you'd also need to revert
>> > the following change:
>> >
>> > - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>> > + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>> > qsort_partition_list_value_cmp, (void *) key);
>> >
>> > + int orig_index = all_values[i]->index;
>> > + boundinfo->datums[i] = (Datum *) palloc(key->partnatts *
>> sizeof(Datum));
>> >
>> > Missing a newline between these two statements.
>>
>> Fixed. Made necessary changes to keep the intent of existing code.
>>
>>
>> > @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>> > *parttyplen, bool *parttypbyval,
>> > if (b1->nindexes != b2->nindexes)
>> > return false;
>> >
>> > - if (b1->null_index != b2->null_index)
>> > + if (get_partition_bound_null_index(b1) !=
>> > get_partition_bound_null_index(b2))
>> >
>> > As mentioned in the last message, this bit in partition_bounds_equal()
>> > needs to be comparing "bitmapsets" of null bound indexes, that is
>> > after fixing get_partition_bound_null_index() as previously mentioned.
>>
>> As mentioned earlier, removed the functionality of
>> get_partition_bound_null_index(), hence the above condition is not
>> required and removed.
>>
>> > But...
>> >
>> > @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>> > *parttyplen, bool *parttypbyval,
>> > * context. datumIsEqual() should be simple enough to be
>> > * safe.
>> > */
>> > - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>> > + if (b1->isnulls)
>> > + b1_isnull = b1->isnulls[i][j];
>> > + if (b2->isnulls)
>> > + b2_isnull = b2->isnulls[i][j];
>> > +
>> > + /*
>> > + * If any of the partition bound has NULL value, then
>> check
>> > + * equality for the NULL value instead of comparing the
>> datums
>> > + * as it does not contain valid value in case of NULL.
>> > + */
>> > + if (b1_isnull || b2_isnull)
>> > + {
>> > + if (b1_isnull != b2_isnull)
>> > + return false;
>> > + }
>> >
>> > ...if you have this in the main loop, I don't think we need the above
>> > code stanza which appears to implement a short-cut for this long-form
>> > logic.
>>
>> Yes. May be we could have ignored the above code stanza if we would
>> have comparing the null indexes using get_partition_bound_null_index()
>> in the beginning of the function. But hence we are not separating the
>> non null partitions and null partitions, I would like to keep the
>> logic in the inner loop as we are doing it for non null bound values
>> in the above code stanza, just to give a feel that null bound values
>> are also handled the same way as non null values. Please correct me if
>> I am wrong.
>>
>>
>> > + (key->strategy != PARTITION_STRATEGY_LIST ||
>> > + !src->isnulls[i][j]))
>> >
>> > I think it's better to write this condition as follows just like the
>> > accompanying condition involving src->kind:
>> >
>> > (src->nulls == NULL || !src->isnulls[i][j])
>>
>> Fixed.
>>
>>
>> > In check_new_partition_bound():
>> >
>> > + Datum *values = (Datum *)
>> > palloc0(key->partnatts * sizeof(Datum));
>> > + bool *isnulls = (bool *)
>> > palloc0(key->partnatts * sizeof(bool));
>> >
>> > Doesn't seem like a bad idea to declare these as:
>> >
>> > Datum values[PARTITION_MAX_KEYS];
>> > bool isnulls[PARTITION_MAX_KEYS];
>>
>> Thanks for the suggestion. I have changed as above.
>>
>> > I looked at get_qual_for_list_multi_column() and immediately thought
>> > that it may be a bad idea. I think it's better to integrate the logic
>> > for multi-column case into the existing function even if that makes
>> > the function appear more complex. Having two functions with the same
>> > goal and mostly the same code is not a good idea mainly because it
>> > becomes a maintenance burden.
>>
>> Actually I had written a separate function because of the complexity.
>> Now I have understood that since the objective is same, it should be
>> done in a single function irrespective of complexity.
>>
>> > I have attempted a rewrite such that get_qual_for_list() now handles
>> > both the single-column and multi-column cases. Changes included in
>> > the delta patch. The patch updates some outputs of the newly added
>> > tests for multi-column list partitions, because the new code emits the
>> > IS NOT NULL tests a bit differently than
>> > get_qual_for_list_mutli_column() would. Notably, the old approach
>> > would emit IS NOT NULL for every non-NULL datum matched to a given
>> > column, not just once for the column. However, the patch makes a few
>> > other tests fail, mainly because I had to fix
>> > partition_bound_accepts_nulls() to handle the multi-column case,
>> > though didn't bother to update all callers of it to also handle the
>> > multi-column case correctly. I guess that's a TODO you're going to
>> > deal with at some point anyway. :)
>>
>> Thank you very much for your efforts. The changes looks good to me and
>> I have incorporated these changes in the attached patch.
>>
>> I have completed the coding for all the TODOs and hence removed in the
>> patch. The naming conventions used for function/variable names varies
>> across the files. Some places it is like 'namesLikeThis' and in some
>> place it is like 'names_like_this'. I have used the naming conventions
>> based on the surrounding styles used. I am happy to change those if
>> required.
>>
>> I have verified 'make check' with the attached patch and it is working
>> fine.
>>
>>
>> Thanks & Regards,
>> Nitin Jadhav
>>
>>
>> On Mon, Sep 13, 2021 at 3:47 PM Rajkumar Raghuwanshi
>> <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:
>> >
>> > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is
>> failing with below errors.
>> >
>> > --inherit.sql is failing with error :"ERROR: negative bitmapset member
>> not allowed"
>> > update mlparted_tab mlp set c = 'xxx'
>> > from
>> > (select a from some_tab union all select a+1 from some_tab) ss (a)
>> > where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
>> > ERROR: negative bitmapset member not allowed
>> >
>> > --partition_join.sql is crashing with enable_partitionwise_join set to
>> true.
>> > CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
>> > CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001',
>> '0003');
>> > CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004',
>> '0006');
>> > CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008',
>> '0009');
>> > INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM
>> generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
>> > ANALYZE plt1_adv;
>> > CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
>> > CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002',
>> '0003');
>> > CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004',
>> '0006');
>> > CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007',
>> '0009');
>> > INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM
>> generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
>> > ANALYZE plt2_adv;
>> > -- inner join
>> > EXPLAIN (COSTS OFF)
>> > SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2
>> ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
>> > server closed the connection unexpectedly
>> > This probably means the server terminated abnormally
>> > before or while processing the request.
>> > connection to server was lost
>> >
>> >
>> > --stack-trace
>> > Core was generated by `postgres: edb regression [local] EXPLAIN
>> '.
>> > Program terminated with signal 6, Aborted.
>> > #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
>> > Missing separate debuginfos, use: debuginfo-install
>> glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
>> krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
>> libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64
>> openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64
>> zlib-1.2.7-17.el7.x86_64
>> > (gdb) bt
>> > #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
>> > #1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6
>> > #2 0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10
>> "part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe
>> "partbounds.c", lineNumber=1957)
>> > at assert.c:69
>> > #3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0,
>> part_index=-1) at partbounds.c:1957
>> > #4 0x00000000008919bd in merge_list_bounds (partnatts=1,
>> partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0,
>> inner_rel=0x1922938, jointype=JOIN_INNER,
>> > outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at
>> partbounds.c:1529
>> > #5 0x00000000008910de in partition_bounds_merge (partnatts=1,
>> partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0,
>> inner_rel=0x1922938, jointype=JOIN_INNER,
>> > outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at
>> partbounds.c:1223
>> > #6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0,
>> rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
>> parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
>> > parts2=0x7fffd67751a8) at joinrels.c:1644
>> > #7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0,
>> rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
>> parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
>> > at joinrels.c:1402
>> > #8 0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0,
>> rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0,
>> restrictlist=0x1ab3318)
>> > at joinrels.c:926
>> > #9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0,
>> rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760
>> > #10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0,
>> old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at
>> joinrels.c:312
>> > #11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0,
>> level=2) at joinrels.c:123
>> > #12 0x000000000080c566 in standard_join_search (root=0x1a19ed0,
>> levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
>> > #13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0,
>> joinlist=0x199d538) at allpaths.c:2951
>> > #14 0x000000000080816b in make_one_rel (root=0x1a19ed0,
>> joinlist=0x199d538) at allpaths.c:228
>> > #15 0x000000000084491d in query_planner (root=0x1a19ed0,
>> qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at
>> planmain.c:276
>> > #16 0x0000000000847040 in grouping_planner (root=0x1a19ed0,
>> tuple_fraction=0) at planner.c:1447
>> > #17 0x0000000000846709 in subquery_planner (glob=0x19b39d8,
>> parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at
>> planner.c:1025
>> > #18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
>> > query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048,
>> boundParams=0x0) at planner.c:406
>> > #19 0x0000000000844ce9 in planner (parse=0x1aaa290,
>> > query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048,
>> boundParams=0x0) at planner.c:277
>> > #20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
>> > query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048,
>> boundParams=0x0) at postgres.c:847
>> > #21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290,
>> cursorOptions=2048, into=0x0, es=0x19b36f0,
>> > queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
>> > params=0x0, queryEnv=0x0) at explain.c:397
>> > #22 0x0000000000693351 in ExplainQuery (pstate=0x197c410,
>> stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at explain.c:281
>> > #23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
>> > queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
>> > readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
>> queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845
>> > #24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
>> > queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
>> > readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
>> queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527
>> > #25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40,
>> pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378,
>> qc=0x7fffd6775f90) at pquery.c:1147
>> > #26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40,
>> isTopLevel=true) at pquery.c:1026
>> > #27 0x000000000097ed11 in PortalRun (portal=0x1893b40,
>> count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8,
>> altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
>> > #28 0x0000000000978aa5 in exec_simple_query (
>> >
>> > Thanks & Regards,
>> > Rajkumar Raghuwanshi
>> >
>> >
>> > On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09(at)gmail(dot)com>
>> wrote:
>> >>
>> >> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09(at)gmail(dot)com>
>> wrote:
>> >> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
>> >> > <nitinjadhavpostgres(at)gmail(dot)com> wrote:
>> >> > > The attached patch also fixes the above comments.
>> >> >
>> >> > I noticed that multi-column list partitions containing NULLs don't
>> >> > work correctly with partition pruning yet.
>> >> >
>> >> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>> >> > create table p01 partition of p0 for values in ((1, 1, true), (NULL,
>> 1, false));
>> >> > create table p02 partition of p0 for values in ((1, NULL, false));
>> >> > explain select * from p0 where a is null;
>> >> > QUERY PLAN
>> >> > --------------------------------------------------------
>> >> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
>> >> > Filter: (a IS NULL)
>> >> > (2 rows)
>> >> >
>> >> > I guess that may be due to the following newly added code being
>> incomplete:
>> >> >
>> >> > +/*
>> >> > + * get_partition_bound_null_index
>> >> > + *
>> >> > + * Returns the partition index of the partition bound which accepts
>> NULL.
>> >> > + */
>> >> > +int
>> >> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
>> >> > +{
>> >> > + int i = 0;
>> >> > + int j = 0;
>> >> > +
>> >> > + if (!boundinfo->isnulls)
>> >> > + return -1;
>> >> >
>> >> > - if (!val->constisnull)
>> >> > - count++;
>> >> > + for (i = 0; i < boundinfo->ndatums; i++)
>> >> > + {
>> >> > + //TODO: Handle for multi-column cases
>> >> > + for (j = 0; j < 1; j++)
>> >> > + {
>> >> > + if (boundinfo->isnulls[i][j])
>> >> > + return boundinfo->indexes[i];
>> >> > }
>> >> > }
>> >> >
>> >> > + return -1;
>> >> > +}
>> >> >
>> >> > Maybe this function needs to return a "bitmapset" of indexes, because
>> >> > multiple partitions can now contain NULL values.
>> >> >
>> >> > Some other issues I noticed and suggestions for improvement:
>> >> >
>> >> > +/*
>> >> > + * checkForDuplicates
>> >> > + *
>> >> > + * Returns TRUE if the list bound element is already present in the
>> list of
>> >> > + * list bounds, FALSE otherwise.
>> >> > + */
>> >> > +static bool
>> >> > +checkForDuplicates(List *source, List *searchElem)
>> >> >
>> >> > This function name may be too generic. Given that it is specific to
>> >> > implementing list bound de-duplication, maybe the following signature
>> >> > is more appropriate:
>> >> >
>> >> > static bool
>> >> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>> >> >
>> >> > Also, better if the function comment mentions those parameter names,
>> like:
>> >> >
>> >> > "Returns TRUE if the list bound element 'new_bound' is already
>> present
>> >> > in the target list 'list_bounds', FALSE otherwise."
>> >> >
>> >> > +/*
>> >> > + * transformPartitionListBounds
>> >> > + *
>> >> > + * Converts the expressions of list partition bounds from the raw
>> grammar
>> >> > + * representation.
>> >> >
>> >> > A sentence about the result format would be helpful, like:
>> >> >
>> >> > The result is a List of Lists of Const nodes to account for the
>> >> > partition key possibly containing more than one column.
>> >> >
>> >> > + int i = 0;
>> >> > + int j = 0;
>> >> >
>> >> > Better to initialize such loop counters closer to the loop.
>> >> >
>> >> > + colname[i] = (char *) palloc0(NAMEDATALEN *
>> sizeof(char));
>> >> > + colname[i] = get_attname(RelationGetRelid(parent),
>> >> > + key->partattrs[i], false);
>> >> >
>> >> > The palloc in the 1st statement is wasteful, because the 2nd
>> statement
>> >> > overwrites its pointer by the pointer to the string palloc'd by
>> >> > get_attname().
>> >> >
>> >> > + ListCell *cell2 = NULL;
>> >> >
>> >> > No need to explicitly initialize the loop variable.
>> >> >
>> >> > + RowExpr *rowexpr = NULL;
>> >> > +
>> >> > + if (!IsA(expr, RowExpr))
>> >> > + ereport(ERROR,
>> >> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> >> > + errmsg("Invalid list bound specification"),
>> >> > + parser_errposition(pstate, exprLocation((Node
>> >> > *) spec))));
>> >> > +
>> >> > + rowexpr = (RowExpr *) expr;
>> >> >
>> >> > It's okay to assign rowexpr at the top here instead of the dummy
>> >> > NULL-initialization and write the condition as:
>> >> >
>> >> > if (!IsA(rowexpr, RowExpr))
>> >> >
>> >> > + if (isDuplicate)
>> >> > + continue;
>> >> > +
>> >> > + result = lappend(result, values);
>> >> >
>> >> > I can see you copied this style from the existing code, but how about
>> >> > writing this simply as:
>> >> >
>> >> > if (!isDuplicate)
>> >> > result = lappend(result, values);
>> >> >
>> >> > -/* One value coming from some (index'th) list partition */
>> >> > +/* One bound of a list partition */
>> >> > typedef struct PartitionListValue
>> >> > {
>> >> > int index;
>> >> > - Datum value;
>> >> > + Datum *values;
>> >> > + bool *isnulls;
>> >> > } PartitionListValue;
>> >> >
>> >> > Given that this is a locally-defined struct, I wonder if it makes
>> >> > sense to rename the struct while we're at it. Call it, say,
>> >> > PartitionListBound?
>> >> >
>> >> > Also, please keep part of the existing comment that says that the
>> >> > bound belongs to index'th partition.
>> >> >
>> >> > Will send more comments in a bit...
>> >>
>> >> + * partition_bound_accepts_nulls
>> >> + *
>> >> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>> >> */
>> >>
>> >> I suggest slight rewording, as follows:
>> >>
>> >> "Returns TRUE if any of the partition bounds contains a NULL value,
>> >> FALSE otherwise."
>> >>
>> >> - PartitionListValue *all_values;
>> >> + PartitionListValue **all_values;
>> >> ...
>> >> - all_values = (PartitionListValue *)
>> >> - palloc(ndatums * sizeof(PartitionListValue));
>> >> + ndatums = get_list_datum_count(boundspecs, nparts);
>> >> + all_values = (PartitionListValue **)
>> >> + palloc(ndatums * sizeof(PartitionListValue *));
>> >>
>> >> I don't see the need to redefine all_values's pointer type. No need
>> >> to palloc PartitionListValue repeatedly for every datum as done
>> >> further down as follows:
>> >>
>> >> + all_values[j] = (PartitionListValue *)
>> >> palloc(sizeof(PartitionListValue));
>> >>
>> >> You do need the following two though:
>> >>
>> >> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
>> >> sizeof(Datum));
>> >> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
>> >> sizeof(bool));
>> >>
>> >> If you change the above the way I suggest, you'd also need to revert
>> >> the following change:
>> >>
>> >> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>> >> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>> >> qsort_partition_list_value_cmp, (void *) key);
>> >>
>> >> + int orig_index = all_values[i]->index;
>> >> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts *
>> sizeof(Datum));
>> >>
>> >> Missing a newline between these two statements.
>> >>
>> >> BTW, I noticed that the boundDatums variable is no longer used in
>> >> create_list_bounds. I traced back its origin and found that a recent
>> >> commit 53d86957e98 introduced it to implement an idea to reduce the
>> >> finer-grained pallocs that were being done in create_list_bounds(). I
>> >> don't think that this patch needs to throw away that work. You can
>> >> make it work as the attached delta patch that applies on top of v3.
>> >> Please check.
>> >>
>> >> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>> >> *parttyplen, bool *parttypbyval,
>> >> if (b1->nindexes != b2->nindexes)
>> >> return false;
>> >>
>> >> - if (b1->null_index != b2->null_index)
>> >> + if (get_partition_bound_null_index(b1) !=
>> >> get_partition_bound_null_index(b2))
>> >>
>> >> As mentioned in the last message, this bit in partition_bounds_equal()
>> >> needs to be comparing "bitmapsets" of null bound indexes, that is
>> >> after fixing get_partition_bound_null_index() as previously mentioned.
>> >>
>> >> But...
>> >>
>> >> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>> >> *parttyplen, bool *parttypbyval,
>> >> * context. datumIsEqual() should be simple enough to
>> be
>> >> * safe.
>> >> */
>> >> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>> >> + if (b1->isnulls)
>> >> + b1_isnull = b1->isnulls[i][j];
>> >> + if (b2->isnulls)
>> >> + b2_isnull = b2->isnulls[i][j];
>> >> +
>> >> + /*
>> >> + * If any of the partition bound has NULL value, then
>> check
>> >> + * equality for the NULL value instead of comparing
>> the datums
>> >> + * as it does not contain valid value in case of NULL.
>> >> + */
>> >> + if (b1_isnull || b2_isnull)
>> >> + {
>> >> + if (b1_isnull != b2_isnull)
>> >> + return false;
>> >> + }
>> >>
>> >> ...if you have this in the main loop, I don't think we need the above
>> >> code stanza which appears to implement a short-cut for this long-form
>> >> logic.
>> >>
>> >> + (key->strategy != PARTITION_STRATEGY_LIST ||
>> >> + !src->isnulls[i][j]))
>> >>
>> >> I think it's better to write this condition as follows just like the
>> >> accompanying condition involving src->kind:
>> >>
>> >> (src->nulls == NULL || !src->isnulls[i][j])
>> >>
>> >> (Skipped looking at merge_list_bounds() and related changes for now as
>> >> I see a lot of TODOs remain to be done.)
>> >>
>> >> In check_new_partition_bound():
>> >>
>> >> + Datum *values = (Datum *)
>> >> palloc0(key->partnatts * sizeof(Datum));
>> >> + bool *isnulls = (bool *)
>> >> palloc0(key->partnatts * sizeof(bool));
>> >>
>> >> Doesn't seem like a bad idea to declare these as:
>> >>
>> >> Datum values[PARTITION_MAX_KEYS];
>> >> bool isnulls[PARTITION_MAX_KEYS];
>> >>
>> >>
>> >> I looked at get_qual_for_list_multi_column() and immediately thought
>> >> that it may be a bad idea. I think it's better to integrate the logic
>> >> for multi-column case into the existing function even if that makes
>> >> the function appear more complex. Having two functions with the same
>> >> goal and mostly the same code is not a good idea mainly because it
>> >> becomes a maintenance burden.
>> >>
>> >> I have attempted a rewrite such that get_qual_for_list() now handles
>> >> both the single-column and multi-column cases. Changes included in
>> >> the delta patch. The patch updates some outputs of the newly added
>> >> tests for multi-column list partitions, because the new code emits the
>> >> IS NOT NULL tests a bit differently than
>> >> get_qual_for_list_mutli_column() would. Notably, the old approach
>> >> would emit IS NOT NULL for every non-NULL datum matched to a given
>> >> column, not just once for the column. However, the patch makes a few
>> >> other tests fail, mainly because I had to fix
>> >> partition_bound_accepts_nulls() to handle the multi-column case,
>> >> though didn't bother to update all callers of it to also handle the
>> >> multi-column case correctly. I guess that's a TODO you're going to
>> >> deal with at some point anyway. :)
>> >>
>> >> I still have more than half of v3 left to look at, so will continue
>> >> looking. In the meantime, please check the changes I suggested,
>> >> including the delta patch, and let me know your thoughts.
>> >>
>> >> --
>> >> Amit Langote
>> >> EDB: http://www.enterprisedb.com
>>
>

Attachment Content-Type Size
v5-0001-multi-column-list-partitioning.patch application/octet-stream 195.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2021-10-07 12:51:44 Re: [Patch] ALTER SYSTEM READ ONLY
Previous Message Andrei Zubkov 2021-10-07 12:31:51 Re: [PATCH] Tracking statements entry timestamp in pg_stat_statements