Quick Links

Re: Multi-Column List Partitioning

From:	Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To:	Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
Cc:	Amit Langote <amitlangote09(at)gmail(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Multi-Column List Partitioning
Date:	2021-10-22 09:49:39
Message-ID:	CAMm1aWZzdJW6hMO278yZFPDnQSqd0+2HGtPmxANRFcLhbgOfjg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

> While testing further I got a crash with partition wise join enabled for
multi-col list partitions. please find test case & stack-trace below.

Thanks for sharing. I have fixed the issue in the attached patch.

Thanks & Regards,
Nitin Jadhav

On Mon, Oct 11, 2021 at 4:12 PM Rajkumar Raghuwanshi <
rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:

> Hi Nitin,
>
> While testing further I got a crash with partition wise join enabled for
> multi-col list partitions. please find test case & stack-trace below.
>
> SET enable_partitionwise_join TO on;
> CREATE TABLE plt1 (c varchar, d varchar) PARTITION BY LIST(c,d);
> CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN
> (('0001','0001'),('0002','0002'),(NULL,NULL));
> CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN
> (('0004','0004'),('0005','0005'),('0006','0006'));
> INSERT INTO plt1 SELECT to_char(i % 11, 'FM0000'), to_char(i % 11,
> 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10,3,7,8,9);
> INSERT INTO plt1 SELECT NULL,NULL FROM generate_series(0, 500) i WHERE i %
> 11 IN (3);
> ANALYSE plt1;
> CREATE TABLE plt2 (c varchar, d varchar) PARTITION BY LIST(c,d);
> CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN
> (('0001','0001'),('0002','0002'));
> CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN
> (('0004','0004'),('0005','0005'),('0006','0006'));
> CREATE TABLE plt2_p3 PARTITION OF plt2 DEFAULT;
> INSERT INTO plt2 SELECT to_char(i % 11, 'FM0000'), to_char(i % 11,
> 'FM0000') FROM generate_series(0, 500) i WHERE i % 11 NOT IN (0,10,3);
> INSERT INTO plt2 SELECT NULL,NULL FROM generate_series(0, 500) i WHERE i %
> 11 IN (3);
> ANALYSE plt2;
>
> EXPLAIN (COSTS OFF)
> SELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON
> (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d =
> t3.d);
>
> postgres=# EXPLAIN (COSTS OFF)
> postgres-# SELECT t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN
> plt2 t2 ON (t1.c = t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c
> AND t2.d = t3.d);
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !?> \q
> [edb(at)localhost bin]$ gdb -q -c data/core.66926 postgres
> Reading symbols from
> /home/edb/WORK/pg_src/PG_TEMP/postgresql/inst/bin/postgres...done.
> [New LWP 66926]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `postgres: edb postgres [local] EXPLAIN
> '.
> Program terminated with signal 11, Segmentation fault.
> #0 0x000000000082be39 in is_dummy_rel (rel=0x40) at joinrels.c:1221
> 1221 if (rel->pathlist == NIL)
> (gdb) bt
> #0 0x000000000082be39 in is_dummy_rel (rel=0x40) at joinrels.c:1221
> #1 0x000000000089341c in is_dummy_partition (rel=0x2f86e88, part_index=2)
> at partbounds.c:1959
> #2 0x0000000000891d38 in merge_list_bounds (partnatts=2,
> partsupfunc=0x2f70058, partcollation=0x2fd3c98, outer_rel=0x2f86e88,
> inner_rel=0x2fd4368, jointype=JOIN_LEFT,
> outer_parts=0x7ffea91f8cc0, inner_parts=0x7ffea91f8cb8) at
> partbounds.c:1325
> #3 0x0000000000891991 in partition_bounds_merge (partnatts=2,
> partsupfunc=0x2f70058, partcollation=0x2fd3c98, outer_rel=0x2f86e88,
> inner_rel=0x2fd4368, jointype=JOIN_LEFT,
> outer_parts=0x7ffea91f8cc0, inner_parts=0x7ffea91f8cb8) at
> partbounds.c:1198
> #4 0x000000000082cc5a in compute_partition_bounds (root=0x2f9e910,
> rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, parent_sjinfo=0x2f7dfa8,
> parts1=0x7ffea91f8cc0,
> parts2=0x7ffea91f8cb8) at joinrels.c:1644
> #5 0x000000000082c474 in try_partitionwise_join (root=0x2f9e910,
> rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, parent_sjinfo=0x2f7dfa8,
> parent_restrictlist=0x2fae650)
> at joinrels.c:1402
> #6 0x000000000082b6e2 in populate_joinrel_with_paths (root=0x2f9e910,
> rel1=0x2f86e88, rel2=0x2fd4368, joinrel=0x2fae388, sjinfo=0x2f7dfa8,
> restrictlist=0x2fae650) at joinrels.c:926
> #7 0x000000000082b135 in make_join_rel (root=0x2f9e910, rel1=0x2f86e88,
> rel2=0x2fd4368) at joinrels.c:760
> #8 0x000000000082a643 in make_rels_by_clause_joins (root=0x2f9e910,
> old_rel=0x2f86e88, other_rels_list=0x2f90148, other_rels=0x2f90160) at
> joinrels.c:312
> #9 0x000000000082a119 in join_search_one_level (root=0x2f9e910, level=3)
> at joinrels.c:123
> #10 0x000000000080cd97 in standard_join_search (root=0x2f9e910,
> levels_needed=3, initial_rels=0x2f90148) at allpaths.c:3020
> #11 0x000000000080cd10 in make_rel_from_joinlist (root=0x2f9e910,
> joinlist=0x2fd7550) at allpaths.c:2951
> #12 0x000000000080899a in make_one_rel (root=0x2f9e910,
> joinlist=0x2fd7550) at allpaths.c:228
> #13 0x000000000084516a in query_planner (root=0x2f9e910,
> qp_callback=0x84ad85 <standard_qp_callback>, qp_extra=0x7ffea91f9140) at
> planmain.c:276
> #14 0x000000000084788d in grouping_planner (root=0x2f9e910,
> tuple_fraction=0) at planner.c:1447
> #15 0x0000000000846f56 in subquery_planner (glob=0x2fa0c08,
> parse=0x2f56d30, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at
> planner.c:1025
> #16 0x000000000084578b in standard_planner (parse=0x2f56d30,
> query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT
> t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c =
> t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);",
> cursorOptions=2048, boundParams=0x0) at planner.c:406
> #17 0x0000000000845536 in planner (parse=0x2f56d30,
> query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT
> t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c =
> t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);",
> cursorOptions=2048, boundParams=0x0) at planner.c:277
> #18 0x0000000000978faf in pg_plan_query (querytree=0x2f56d30,
> query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT
> t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c =
> t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);",
> cursorOptions=2048, boundParams=0x0) at postgres.c:847
> #19 0x0000000000693e50 in ExplainOneQuery (query=0x2f56d30,
> cursorOptions=2048, into=0x0, es=0x2fa0920,
> queryString=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT
> t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c =
> t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);",
> params=0x0, queryEnv=0x0) at explain.c:397
> #20 0x00000000006939a5 in ExplainQuery (pstate=0x2f9e0a0, stmt=0x2f56b50,
> params=0x0, dest=0x2f9e008) at explain.c:281
> #21 0x0000000000981de8 in standard_ProcessUtility (pstmt=0x2fd2220,
> queryString=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT
> t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c =
> t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
> queryEnv=0x0, dest=0x2f9e008, qc=0x7ffea91f9aa0) at utility.c:862
> #22 0x0000000000981585 in ProcessUtility (pstmt=0x2fd2220,
> queryString=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT
> t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c =
> t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);",
> readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
> queryEnv=0x0, dest=0x2f9e008, qc=0x7ffea91f9aa0) at utility.c:527
> #23 0x00000000009801ba in PortalRunUtility (portal=0x2f10180,
> pstmt=0x2fd2220, isTopLevel=true, setHoldSnapshot=true, dest=0x2f9e008,
> qc=0x7ffea91f9aa0) at pquery.c:1155
> #24 0x000000000097ff20 in FillPortalStore (portal=0x2f10180,
> isTopLevel=true) at pquery.c:1028
> #25 0x000000000097f883 in PortalRun (portal=0x2f10180,
> count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2fd2310,
> altdest=0x2fd2310, qc=0x7ffea91f9c60) at pquery.c:760
> #26 0x00000000009795d1 in exec_simple_query (
> query_string=0x2eadcd0 "EXPLAIN (COSTS OFF)\nSELECT
> t1.c,t2.c,t3.c,t1.d,t2.d,t3.d FROM plt1 t1 INNER JOIN plt2 t2 ON (t1.c =
> t2.c AND t1.d = t2.d) LEFT JOIN plt1 t3 on (t2.c = t3.c AND t2.d = t3.d);")
> at postgres.c:1214
> #27 0x000000000097da8d in PostgresMain (dbname=0x2ed8068 "postgres",
> username=0x2ed8048 "edb") at postgres.c:4497
> #28 0x00000000008b9699 in BackendRun (port=0x2ecfd00) at postmaster.c:4560
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
>
> On Mon, Oct 11, 2021 at 11:05 AM Rajkumar Raghuwanshi <
> rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:
>
>> Thanks for the patch, it applied cleanly and fixed the reported issue. I
>> observed another case where
>> In case of multi-col list partition on the same column query is not
>> picking partition wise join. Is this expected?
>>
>> CREATE TABLE plt1 (a int, b int, c varchar) PARTITION BY LIST(c,c);
>> CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN
>> (('0001','0001'),('0002','0002'),('0003','0003'));
>> CREATE TABLE plt1_p2 PARTITION OF plt1 FOR VALUES IN
>> (('0004','0004'),('0005','0005'),('0006','0006'));
>> CREATE TABLE plt1_p3 PARTITION OF plt1 DEFAULT;
>> INSERT INTO plt1 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM
>> generate_series(0, 500) i WHERE i % 11 NOT IN (0,10);
>> ANALYSE plt1;
>> CREATE TABLE plt2 (a int, b int, c varchar) PARTITION BY LIST(c,c);
>> CREATE TABLE plt2_p1 PARTITION OF plt2 FOR VALUES IN
>> (('0001','0001'),('0002','0002'),('0003','0003'));
>> CREATE TABLE plt2_p2 PARTITION OF plt2 FOR VALUES IN
>> (('0004','0004'),('0005','0005'),('0006','0006'));
>> CREATE TABLE plt2_p3 PARTITION OF plt2 DEFAULT;
>> INSERT INTO plt2 SELECT i, i % 47, to_char(i % 11, 'FM0000') FROM
>> generate_series(0, 500) i WHERE i % 11 NOT IN (0,10);
>> ANALYSE plt2;
>> SET enable_partitionwise_join TO true;
>> EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1 INNER JOIN
>> plt2 t2 ON t1.c = t2.c;
>>
>> postgres=# EXPLAIN (COSTS OFF) SELECT t1.a, t1.c, t2.a, t2.c FROM plt1 t1
>> INNER JOIN plt2 t2 ON t1.c = t2.c;
>> QUERY PLAN
>> --------------------------------------------
>> Hash Join
>> Hash Cond: ((t1.c)::text = (t2.c)::text)
>> -> Append
>> -> Seq Scan on plt1_p1 t1_1
>> -> Seq Scan on plt1_p2 t1_2
>> -> Seq Scan on plt1_p3 t1_3
>> -> Hash
>> -> Append
>> -> Seq Scan on plt2_p1 t2_1
>> -> Seq Scan on plt2_p2 t2_2
>> -> Seq Scan on plt2_p3 t2_3
>> (11 rows)
>>
>> Thanks & Regards,
>> Rajkumar Raghuwanshi
>>
>>
>>
>> On Thu, Oct 7, 2021 at 6:03 PM Nitin Jadhav <
>> nitinjadhavpostgres(at)gmail(dot)com> wrote:
>>
>>> Thanks Rajkumar for testing.
>>>
>>> > I think it should throw an error as the partition by list has only 1
>>> column but we are giving 2 values.
>>>
>>> I also agree that it should throw an error in the above case. Fixed the
>>> issue in the attached patch. Also added related test cases to the
>>> regression test suite.
>>>
>>>
>>> > also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’
>>> instead of ('0001','0001').
>>>
>>> Now throwing errors in the initial stage, this case doesn't arise.
>>>
>>> Please share if you find any other issues.
>>>
>>> Thanks & Regards,
>>> Nitin Jadhav
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Oct 7, 2021 at 4:05 PM Rajkumar Raghuwanshi <
>>> rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:
>>>
>>>> Thanks Nitin,
>>>>
>>>> v4 patches applied cleanly and make check is passing now. While testing
>>>> further I observed that if multiple values are given for a single
>>>> column list partition it is not giving error instead it is changing
>>>> values itself. Please find the example below.
>>>>
>>>> postgres=# CREATE TABLE plt1 (a int, b varchar) PARTITION BY LIST(b);
>>>> CREATE TABLE
>>>> postgres=# CREATE TABLE plt1_p1 PARTITION OF plt1 FOR VALUES IN
>>>> (('0001','0001'),('0002','0002'));
>>>> CREATE TABLE
>>>> postgres=# \d+ plt1;
>>>> Partitioned table
>>>> "public.plt1"
>>>> Column | Type | Collation | Nullable | Default | Storage
>>>> | Compression | Stats target | Description
>>>>
>>>> --------+-------------------+-----------+----------+---------+----------+-------------+--------------+-------------
>>>> a | integer | | | | plain
>>>> | | |
>>>> b | character varying | | | | extended
>>>> | | |
>>>> Partition key: LIST (b)
>>>> Partitions: plt1_p1 FOR VALUES IN ('(0001,0001)', '(0002,0002)')
>>>>
>>>> I think it should throw an error as the partition by list has only 1
>>>> column but we are giving 2 values.
>>>> also if you see \d+ showing plt1_p1 partition value as ‘(0001,0001)’
>>>> instead of ('0001','0001').
>>>>
>>>> Thanks & Regards,
>>>> Rajkumar Raghuwanshi
>>>>
>>>>
>>>>
>>>> On Sun, Oct 3, 2021 at 1:52 AM Nitin Jadhav <
>>>> nitinjadhavpostgres(at)gmail(dot)com> wrote:
>>>>
>>>>> > > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is
>>>>> failing with below errors.
>>>>> >
>>>>> > Thanks Rajkumar for testing.
>>>>> >
>>>>> > Here's a v2 of the delta patch that should fix both of these test
>>>>> > failures. As I mentioned in my last reply, my delta patch fixed what
>>>>> > I think were problems in Nitin's v3 patch but were not complete by
>>>>> > themselves. Especially, I hadn't bothered to investigate various /*
>>>>> > TODO: handle multi-column list partitioning */ sites to deal with my
>>>>> > own changes.
>>>>>
>>>>> Thanks Rajkumar for testing and Thank you Amit for working on v2 of
>>>>> the delta patch. Actually I had done the code changes related to
>>>>> partition-wise join and I was in the middle of fixing the review
>>>>> comments, So I could not share the patch. Anyways thanks for your
>>>>> efforts.
>>>>>
>>>>> > I noticed that multi-column list partitions containing NULLs don't
>>>>> > work correctly with partition pruning yet.
>>>>> >
>>>>> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
>>>>> > create table p01 partition of p0 for values in ((1, 1, true), (NULL,
>>>>> 1, false));
>>>>> > create table p02 partition of p0 for values in ((1, NULL, false));
>>>>> > explain select * from p0 where a is null;
>>>>> > QUERY PLAN
>>>>> > --------------------------------------------------------
>>>>> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
>>>>> > Filter: (a IS NULL)
>>>>> > (2 rows)
>>>>> >
>>>>> > In the attached updated version, I've dealt with some of those such
>>>>> > that at least the existing cases exercising partition pruning and
>>>>> > partition wise joins now pass.
>>>>>
>>>>> wrt partition pruning, I have checked the output of the above case
>>>>> with the v2 version of the delta patch and without that. The output
>>>>> remains same. Kindly let me know if I am missing something. But I feel
>>>>> the above output is correct as the partition p01 is the only partition
>>>>> which contains NULL value for column a, hence it is showing "Seq scan
>>>>> on p01" in the output. Kindly correct me if I am wrong. I feel the
>>>>> code changes related to 'null_keys' is not required, hence not
>>>>> incorporated that in the attached patch.
>>>>>
>>>>> wrt partition-wise join, I had run the regression test (with new cases
>>>>> related to partition-wise join) on v2 of the delta patch and observed
>>>>> the crash. Hence I have not incorporated the partition-wise join
>>>>> related code from v2 of delta patch to main v4 patch. Instead I have
>>>>> added the partition-wise join related code done by me in the attached
>>>>> patch. Please share your thoughts and if possible we can improvise the
>>>>> code. Rest of the changes looks good to me and I have incorporated
>>>>> that in the attached patch.
>>>>>
>>>>>
>>>>> > I guess that may be due to the following newly added code being
>>>>> incomplete:
>>>>> > Maybe this function needs to return a "bitmapset" of indexes, because
>>>>> > multiple partitions can now contain NULL values.
>>>>>
>>>>> I feel this function is not required at all as we are not separating
>>>>> the non null and null partitions now. Removed in the attached patch.
>>>>> Also removed the "scan_null' variable from the structure
>>>>> "PruneStepResult" and cleaned up the corresponding code blocks.
>>>>>
>>>>>
>>>>> > This function name may be too generic. Given that it is specific to
>>>>> > implementing list bound de-duplication, maybe the following signature
>>>>> > is more appropriate:
>>>>> >
>>>>> > static bool
>>>>> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>>>>>
>>>>> Yes. The function name looks more generic. How about using
>>>>> "isListBoundDuplicated()"? I have used this name in the patch. Please
>>>>> let me know if that does not look correct.
>>>>>
>>>>>
>>>>> > Also, better if the function comment mentions those parameter names,
>>>>> like:
>>>>> >
>>>>> > "Returns TRUE if the list bound element 'new_bound' is already
>>>>> present
>>>>> > in the target list 'list_bounds', FALSE otherwise."
>>>>>
>>>>> Fixed.
>>>>>
>>>>>
>>>>> > +/*
>>>>> > + * transformPartitionListBounds
>>>>> > + *
>>>>> > + * Converts the expressions of list partition bounds from the raw
>>>>> grammar
>>>>> > + * representation.
>>>>> >
>>>>> > A sentence about the result format would be helpful, like:
>>>>> >
>>>>> > The result is a List of Lists of Const nodes to account for the
>>>>> > partition key possibly containing more than one column.
>>>>>
>>>>> Fixed.
>>>>>
>>>>>
>>>>> > + int i = 0;
>>>>> > + int j = 0;
>>>>> >
>>>>> > Better to initialize such loop counters closer to the loop.
>>>>>
>>>>> Fixed in all the places.
>>>>>
>>>>>
>>>>> > + colname[i] = (char *) palloc0(NAMEDATALEN *
>>>>> sizeof(char));
>>>>> > + colname[i] = get_attname(RelationGetRelid(parent),
>>>>> > + key->partattrs[i], false);
>>>>> >
>>>>> > The palloc in the 1st statement is wasteful, because the 2nd
>>>>> statement
>>>>> > overwrites its pointer by the pointer to the string palloc'd by
>>>>> > get_attname().
>>>>>
>>>>> Removed the 1st statement as it is not required.
>>>>>
>>>>>
>>>>> > + ListCell *cell2 = NULL;
>>>>> >
>>>>> > No need to explicitly initialize the loop variable.
>>>>>
>>>>> Fixed in all the places.
>>>>>
>>>>>
>>>>> > + RowExpr *rowexpr = NULL;
>>>>> > +
>>>>> > + if (!IsA(expr, RowExpr))
>>>>> > + ereport(ERROR,
>>>>> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>>>>> > + errmsg("Invalid list bound specification"),
>>>>> > + parser_errposition(pstate, exprLocation((Node
>>>>> > *) spec))));
>>>>> > +
>>>>> > + rowexpr = (RowExpr *) expr;
>>>>> >
>>>>> > It's okay to assign rowexpr at the top here instead of the dummy
>>>>> > NULL-initialization and write the condition as:
>>>>> >
>>>>> > if (!IsA(rowexpr, RowExpr))
>>>>>
>>>>> Fixed.
>>>>>
>>>>>
>>>>> > + if (isDuplicate)
>>>>> > + continue;
>>>>> > +
>>>>> > + result = lappend(result, values);
>>>>> >
>>>>> > I can see you copied this style from the existing code, but how about
>>>>> > writing this simply as:
>>>>> >
>>>>> > if (!isDuplicate)
>>>>> > result = lappend(result, values);
>>>>>
>>>>> This looks good. I have changed in the patch.
>>>>>
>>>>>
>>>>> > -/* One value coming from some (index'th) list partition */
>>>>> > +/* One bound of a list partition */
>>>>> > typedef struct PartitionListValue
>>>>> > {
>>>>> > int index;
>>>>> > - Datum value;
>>>>> > + Datum *values;
>>>>> > + bool *isnulls;
>>>>> > } PartitionListValue;
>>>>> >
>>>>> > Given that this is a locally-defined struct, I wonder if it makes
>>>>> > sense to rename the struct while we're at it. Call it, say,
>>>>> > PartitionListBound?
>>>>>
>>>>> Yes. PartitionListBound looks more appropriate and it also matches the
>>>>> similar structures of the other partition strategies.
>>>>>
>>>>> > Also, please keep part of the existing comment that says that the
>>>>> > bound belongs to index'th partition.
>>>>>
>>>>> Retained the old comment.
>>>>>
>>>>>
>>>>> > + * partition_bound_accepts_nulls
>>>>> > + *
>>>>> > + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>>>>> > */
>>>>> >
>>>>> > I suggest slight rewording, as follows:
>>>>> >
>>>>> > "Returns TRUE if any of the partition bounds contains a NULL value,
>>>>> > FALSE otherwise."
>>>>>
>>>>> Fixed.
>>>>>
>>>>>
>>>>> > - PartitionListValue *all_values;
>>>>> > + PartitionListValue **all_values;
>>>>> > ...
>>>>> > - all_values = (PartitionListValue *)
>>>>> > - palloc(ndatums * sizeof(PartitionListValue));
>>>>> > + ndatums = get_list_datum_count(boundspecs, nparts);
>>>>> > + all_values = (PartitionListValue **)
>>>>> > + palloc(ndatums * sizeof(PartitionListValue *));
>>>>> >
>>>>> > I don't see the need to redefine all_values's pointer type. No need
>>>>> > to palloc PartitionListValue repeatedly for every datum as done
>>>>> > further down as follows:
>>>>> >
>>>>> > + all_values[j] = (PartitionListValue *)
>>>>> > palloc(sizeof(PartitionListValue));
>>>>> >
>>>>> > You do need the following two though:
>>>>> >
>>>>> > + all_values[j]->values = (Datum *) palloc0(key->partnatts
>>>>> *
>>>>> > sizeof(Datum));
>>>>> > + all_values[j]->isnulls = (bool *) palloc0(key->partnatts
>>>>> *
>>>>> > sizeof(bool));
>>>>> >
>>>>> > If you change the above the way I suggest, you'd also need to revert
>>>>> > the following change:
>>>>> >
>>>>> > - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>>>>> > + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>>>>> > qsort_partition_list_value_cmp, (void *) key);
>>>>> >
>>>>> > + int orig_index = all_values[i]->index;
>>>>> > + boundinfo->datums[i] = (Datum *) palloc(key->partnatts *
>>>>> sizeof(Datum));
>>>>> >
>>>>> > Missing a newline between these two statements.
>>>>>
>>>>> Fixed. Made necessary changes to keep the intent of existing code.
>>>>>
>>>>>
>>>>> > @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>>>>> > *parttyplen, bool *parttypbyval,
>>>>> > if (b1->nindexes != b2->nindexes)
>>>>> > return false;
>>>>> >
>>>>> > - if (b1->null_index != b2->null_index)
>>>>> > + if (get_partition_bound_null_index(b1) !=
>>>>> > get_partition_bound_null_index(b2))
>>>>> >
>>>>> > As mentioned in the last message, this bit in
>>>>> partition_bounds_equal()
>>>>> > needs to be comparing "bitmapsets" of null bound indexes, that is
>>>>> > after fixing get_partition_bound_null_index() as previously
>>>>> mentioned.
>>>>>
>>>>> As mentioned earlier, removed the functionality of
>>>>> get_partition_bound_null_index(), hence the above condition is not
>>>>> required and removed.
>>>>>
>>>>> > But...
>>>>> >
>>>>> > @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>>>>> > *parttyplen, bool *parttypbyval,
>>>>> > * context. datumIsEqual() should be simple enough
>>>>> to be
>>>>> > * safe.
>>>>> > */
>>>>> > - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
>>>>> > + if (b1->isnulls)
>>>>> > + b1_isnull = b1->isnulls[i][j];
>>>>> > + if (b2->isnulls)
>>>>> > + b2_isnull = b2->isnulls[i][j];
>>>>> > +
>>>>> > + /*
>>>>> > + * If any of the partition bound has NULL value,
>>>>> then check
>>>>> > + * equality for the NULL value instead of comparing
>>>>> the datums
>>>>> > + * as it does not contain valid value in case of
>>>>> NULL.
>>>>> > + */
>>>>> > + if (b1_isnull || b2_isnull)
>>>>> > + {
>>>>> > + if (b1_isnull != b2_isnull)
>>>>> > + return false;
>>>>> > + }
>>>>> >
>>>>> > ...if you have this in the main loop, I don't think we need the above
>>>>> > code stanza which appears to implement a short-cut for this long-form
>>>>> > logic.
>>>>>
>>>>> Yes. May be we could have ignored the above code stanza if we would
>>>>> have comparing the null indexes using get_partition_bound_null_index()
>>>>> in the beginning of the function. But hence we are not separating the
>>>>> non null partitions and null partitions, I would like to keep the
>>>>> logic in the inner loop as we are doing it for non null bound values
>>>>> in the above code stanza, just to give a feel that null bound values
>>>>> are also handled the same way as non null values. Please correct me if
>>>>> I am wrong.
>>>>>
>>>>>
>>>>> > + (key->strategy != PARTITION_STRATEGY_LIST ||
>>>>> > + !src->isnulls[i][j]))
>>>>> >
>>>>> > I think it's better to write this condition as follows just like the
>>>>> > accompanying condition involving src->kind:
>>>>> >
>>>>> > (src->nulls == NULL || !src->isnulls[i][j])
>>>>>
>>>>> Fixed.
>>>>>
>>>>>
>>>>> > In check_new_partition_bound():
>>>>> >
>>>>> > + Datum *values = (Datum *)
>>>>> > palloc0(key->partnatts * sizeof(Datum));
>>>>> > + bool *isnulls = (bool *)
>>>>> > palloc0(key->partnatts * sizeof(bool));
>>>>> >
>>>>> > Doesn't seem like a bad idea to declare these as:
>>>>> >
>>>>> > Datum values[PARTITION_MAX_KEYS];
>>>>> > bool isnulls[PARTITION_MAX_KEYS];
>>>>>
>>>>> Thanks for the suggestion. I have changed as above.
>>>>>
>>>>> > I looked at get_qual_for_list_multi_column() and immediately thought
>>>>> > that it may be a bad idea. I think it's better to integrate the
>>>>> logic
>>>>> > for multi-column case into the existing function even if that makes
>>>>> > the function appear more complex. Having two functions with the same
>>>>> > goal and mostly the same code is not a good idea mainly because it
>>>>> > becomes a maintenance burden.
>>>>>
>>>>> Actually I had written a separate function because of the complexity.
>>>>> Now I have understood that since the objective is same, it should be
>>>>> done in a single function irrespective of complexity.
>>>>>
>>>>> > I have attempted a rewrite such that get_qual_for_list() now handles
>>>>> > both the single-column and multi-column cases. Changes included in
>>>>> > the delta patch. The patch updates some outputs of the newly added
>>>>> > tests for multi-column list partitions, because the new code emits
>>>>> the
>>>>> > IS NOT NULL tests a bit differently than
>>>>> > get_qual_for_list_mutli_column() would. Notably, the old approach
>>>>> > would emit IS NOT NULL for every non-NULL datum matched to a given
>>>>> > column, not just once for the column. However, the patch makes a few
>>>>> > other tests fail, mainly because I had to fix
>>>>> > partition_bound_accepts_nulls() to handle the multi-column case,
>>>>> > though didn't bother to update all callers of it to also handle the
>>>>> > multi-column case correctly. I guess that's a TODO you're going to
>>>>> > deal with at some point anyway. :)
>>>>>
>>>>> Thank you very much for your efforts. The changes looks good to me and
>>>>> I have incorporated these changes in the attached patch.
>>>>>
>>>>> I have completed the coding for all the TODOs and hence removed in the
>>>>> patch. The naming conventions used for function/variable names varies
>>>>> across the files. Some places it is like 'namesLikeThis' and in some
>>>>> place it is like 'names_like_this'. I have used the naming conventions
>>>>> based on the surrounding styles used. I am happy to change those if
>>>>> required.
>>>>>
>>>>> I have verified 'make check' with the attached patch and it is working
>>>>> fine.
>>>>>
>>>>>
>>>>> Thanks & Regards,
>>>>> Nitin Jadhav
>>>>>
>>>>>
>>>>> On Mon, Sep 13, 2021 at 3:47 PM Rajkumar Raghuwanshi
>>>>> <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:
>>>>> >
>>>>> > On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is
>>>>> failing with below errors.
>>>>> >
>>>>> > --inherit.sql is failing with error :"ERROR: negative bitmapset
>>>>> member not allowed"
>>>>> > update mlparted_tab mlp set c = 'xxx'
>>>>> > from
>>>>> > (select a from some_tab union all select a+1 from some_tab) ss (a)
>>>>> > where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
>>>>> > ERROR: negative bitmapset member not allowed
>>>>> >
>>>>> > --partition_join.sql is crashing with enable_partitionwise_join set
>>>>> to true.
>>>>> > CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
>>>>> > CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN
>>>>> ('0001', '0003');
>>>>> > CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN
>>>>> ('0004', '0006');
>>>>> > CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN
>>>>> ('0008', '0009');
>>>>> > INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM
>>>>> generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
>>>>> > ANALYZE plt1_adv;
>>>>> > CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
>>>>> > CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN
>>>>> ('0002', '0003');
>>>>> > CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN
>>>>> ('0004', '0006');
>>>>> > CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN
>>>>> ('0007', '0009');
>>>>> > INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM
>>>>> generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
>>>>> > ANALYZE plt2_adv;
>>>>> > -- inner join
>>>>> > EXPLAIN (COSTS OFF)
>>>>> > SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv
>>>>> t2 ON (t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
>>>>> > server closed the connection unexpectedly
>>>>> > This probably means the server terminated abnormally
>>>>> > before or while processing the request.
>>>>> > connection to server was lost
>>>>> >
>>>>> >
>>>>> > --stack-trace
>>>>> > Core was generated by `postgres: edb regression [local] EXPLAIN
>>>>> '.
>>>>> > Program terminated with signal 6, Aborted.
>>>>> > #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
>>>>> > Missing separate debuginfos, use: debuginfo-install
>>>>> glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
>>>>> krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
>>>>> libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64
>>>>> openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64
>>>>> zlib-1.2.7-17.el7.x86_64
>>>>> > (gdb) bt
>>>>> > #0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
>>>>> > #1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6
>>>>> > #2 0x0000000000b0fbc3 in ExceptionalCondition
>>>>> (conditionName=0xcbda10 "part_index >= 0", errorType=0xcbd1c3
>>>>> "FailedAssertion", fileName=0xcbd2fe "partbounds.c", lineNumber=1957)
>>>>> > at assert.c:69
>>>>> > #3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0,
>>>>> part_index=-1) at partbounds.c:1957
>>>>> > #4 0x00000000008919bd in merge_list_bounds (partnatts=1,
>>>>> partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0,
>>>>> inner_rel=0x1922938, jointype=JOIN_INNER,
>>>>> > outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at
>>>>> partbounds.c:1529
>>>>> > #5 0x00000000008910de in partition_bounds_merge (partnatts=1,
>>>>> partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0,
>>>>> inner_rel=0x1922938, jointype=JOIN_INNER,
>>>>> > outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at
>>>>> partbounds.c:1223
>>>>> > #6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0,
>>>>> rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
>>>>> parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
>>>>> > parts2=0x7fffd67751a8) at joinrels.c:1644
>>>>> > #7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0,
>>>>> rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
>>>>> parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
>>>>> > at joinrels.c:1402
>>>>> > #8 0x000000000082aea2 in populate_joinrel_with_paths
>>>>> (root=0x1a19ed0, rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
>>>>> sjinfo=0x7fffd67752a0, restrictlist=0x1ab3318)
>>>>> > at joinrels.c:926
>>>>> > #9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0,
>>>>> rel1=0x19b37c0, rel2=0x1922938) at joinrels.c:760
>>>>> > #10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0,
>>>>> old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at
>>>>> joinrels.c:312
>>>>> > #11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0,
>>>>> level=2) at joinrels.c:123
>>>>> > #12 0x000000000080c566 in standard_join_search (root=0x1a19ed0,
>>>>> levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
>>>>> > #13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0,
>>>>> joinlist=0x199d538) at allpaths.c:2951
>>>>> > #14 0x000000000080816b in make_one_rel (root=0x1a19ed0,
>>>>> joinlist=0x199d538) at allpaths.c:228
>>>>> > #15 0x000000000084491d in query_planner (root=0x1a19ed0,
>>>>> qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at
>>>>> planmain.c:276
>>>>> > #16 0x0000000000847040 in grouping_planner (root=0x1a19ed0,
>>>>> tuple_fraction=0) at planner.c:1447
>>>>> > #17 0x0000000000846709 in subquery_planner (glob=0x19b39d8,
>>>>> parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at
>>>>> planner.c:1025
>>>>> > #18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
>>>>> > query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048,
>>>>> boundParams=0x0) at planner.c:406
>>>>> > #19 0x0000000000844ce9 in planner (parse=0x1aaa290,
>>>>> > query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048,
>>>>> boundParams=0x0) at planner.c:277
>>>>> > #20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
>>>>> > query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048,
>>>>> boundParams=0x0) at postgres.c:847
>>>>> > #21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290,
>>>>> cursorOptions=2048, into=0x0, es=0x19b36f0,
>>>>> > queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
>>>>> > params=0x0, queryEnv=0x0) at explain.c:397
>>>>> > #22 0x0000000000693351 in ExplainQuery (pstate=0x197c410,
>>>>> stmt=0x1aaa0b0, params=0x0, dest=0x197c378) at explain.c:281
>>>>> > #23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
>>>>> > queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
>>>>> > readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL,
>>>>> params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at
>>>>> utility.c:845
>>>>> > #24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
>>>>> > queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c,
>>>>> t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c
>>>>> = t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
>>>>> > readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL,
>>>>> params=0x0, queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at
>>>>> utility.c:527
>>>>> > #25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40,
>>>>> pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378,
>>>>> qc=0x7fffd6775f90) at pquery.c:1147
>>>>> > #26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40,
>>>>> isTopLevel=true) at pquery.c:1026
>>>>> > #27 0x000000000097ed11 in PortalRun (portal=0x1893b40,
>>>>> count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8,
>>>>> altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
>>>>> > #28 0x0000000000978aa5 in exec_simple_query (
>>>>> >
>>>>> > Thanks & Regards,
>>>>> > Rajkumar Raghuwanshi
>>>>> >
>>>>> >
>>>>> > On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09(at)gmail(dot)com>
>>>>> wrote:
>>>>> >>
>>>>> >> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <
>>>>> amitlangote09(at)gmail(dot)com> wrote:
>>>>> >> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
>>>>> >> > <nitinjadhavpostgres(at)gmail(dot)com> wrote:
>>>>> >> > > The attached patch also fixes the above comments.
>>>>> >> >
>>>>> >> > I noticed that multi-column list partitions containing NULLs don't
>>>>> >> > work correctly with partition pruning yet.
>>>>> >> >
>>>>> >> > create table p0 (a int, b text, c bool) partition by list (a, b,
>>>>> c);
>>>>> >> > create table p01 partition of p0 for values in ((1, 1, true),
>>>>> (NULL, 1, false));
>>>>> >> > create table p02 partition of p0 for values in ((1, NULL, false));
>>>>> >> > explain select * from p0 where a is null;
>>>>> >> > QUERY PLAN
>>>>> >> > --------------------------------------------------------
>>>>> >> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
>>>>> >> > Filter: (a IS NULL)
>>>>> >> > (2 rows)
>>>>> >> >
>>>>> >> > I guess that may be due to the following newly added code being
>>>>> incomplete:
>>>>> >> >
>>>>> >> > +/*
>>>>> >> > + * get_partition_bound_null_index
>>>>> >> > + *
>>>>> >> > + * Returns the partition index of the partition bound which
>>>>> accepts NULL.
>>>>> >> > + */
>>>>> >> > +int
>>>>> >> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
>>>>> >> > +{
>>>>> >> > + int i = 0;
>>>>> >> > + int j = 0;
>>>>> >> > +
>>>>> >> > + if (!boundinfo->isnulls)
>>>>> >> > + return -1;
>>>>> >> >
>>>>> >> > - if (!val->constisnull)
>>>>> >> > - count++;
>>>>> >> > + for (i = 0; i < boundinfo->ndatums; i++)
>>>>> >> > + {
>>>>> >> > + //TODO: Handle for multi-column cases
>>>>> >> > + for (j = 0; j < 1; j++)
>>>>> >> > + {
>>>>> >> > + if (boundinfo->isnulls[i][j])
>>>>> >> > + return boundinfo->indexes[i];
>>>>> >> > }
>>>>> >> > }
>>>>> >> >
>>>>> >> > + return -1;
>>>>> >> > +}
>>>>> >> >
>>>>> >> > Maybe this function needs to return a "bitmapset" of indexes,
>>>>> because
>>>>> >> > multiple partitions can now contain NULL values.
>>>>> >> >
>>>>> >> > Some other issues I noticed and suggestions for improvement:
>>>>> >> >
>>>>> >> > +/*
>>>>> >> > + * checkForDuplicates
>>>>> >> > + *
>>>>> >> > + * Returns TRUE if the list bound element is already present in
>>>>> the list of
>>>>> >> > + * list bounds, FALSE otherwise.
>>>>> >> > + */
>>>>> >> > +static bool
>>>>> >> > +checkForDuplicates(List *source, List *searchElem)
>>>>> >> >
>>>>> >> > This function name may be too generic. Given that it is specific
>>>>> to
>>>>> >> > implementing list bound de-duplication, maybe the following
>>>>> signature
>>>>> >> > is more appropriate:
>>>>> >> >
>>>>> >> > static bool
>>>>> >> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
>>>>> >> >
>>>>> >> > Also, better if the function comment mentions those parameter
>>>>> names, like:
>>>>> >> >
>>>>> >> > "Returns TRUE if the list bound element 'new_bound' is already
>>>>> present
>>>>> >> > in the target list 'list_bounds', FALSE otherwise."
>>>>> >> >
>>>>> >> > +/*
>>>>> >> > + * transformPartitionListBounds
>>>>> >> > + *
>>>>> >> > + * Converts the expressions of list partition bounds from the
>>>>> raw grammar
>>>>> >> > + * representation.
>>>>> >> >
>>>>> >> > A sentence about the result format would be helpful, like:
>>>>> >> >
>>>>> >> > The result is a List of Lists of Const nodes to account for the
>>>>> >> > partition key possibly containing more than one column.
>>>>> >> >
>>>>> >> > + int i = 0;
>>>>> >> > + int j = 0;
>>>>> >> >
>>>>> >> > Better to initialize such loop counters closer to the loop.
>>>>> >> >
>>>>> >> > + colname[i] = (char *) palloc0(NAMEDATALEN *
>>>>> sizeof(char));
>>>>> >> > + colname[i] = get_attname(RelationGetRelid(parent),
>>>>> >> > + key->partattrs[i], false);
>>>>> >> >
>>>>> >> > The palloc in the 1st statement is wasteful, because the 2nd
>>>>> statement
>>>>> >> > overwrites its pointer by the pointer to the string palloc'd by
>>>>> >> > get_attname().
>>>>> >> >
>>>>> >> > + ListCell *cell2 = NULL;
>>>>> >> >
>>>>> >> > No need to explicitly initialize the loop variable.
>>>>> >> >
>>>>> >> > + RowExpr *rowexpr = NULL;
>>>>> >> > +
>>>>> >> > + if (!IsA(expr, RowExpr))
>>>>> >> > + ereport(ERROR,
>>>>> >> > +
>>>>> (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>>>>> >> > + errmsg("Invalid list bound
>>>>> specification"),
>>>>> >> > + parser_errposition(pstate,
>>>>> exprLocation((Node
>>>>> >> > *) spec))));
>>>>> >> > +
>>>>> >> > + rowexpr = (RowExpr *) expr;
>>>>> >> >
>>>>> >> > It's okay to assign rowexpr at the top here instead of the dummy
>>>>> >> > NULL-initialization and write the condition as:
>>>>> >> >
>>>>> >> > if (!IsA(rowexpr, RowExpr))
>>>>> >> >
>>>>> >> > + if (isDuplicate)
>>>>> >> > + continue;
>>>>> >> > +
>>>>> >> > + result = lappend(result, values);
>>>>> >> >
>>>>> >> > I can see you copied this style from the existing code, but how
>>>>> about
>>>>> >> > writing this simply as:
>>>>> >> >
>>>>> >> > if (!isDuplicate)
>>>>> >> > result = lappend(result, values);
>>>>> >> >
>>>>> >> > -/* One value coming from some (index'th) list partition */
>>>>> >> > +/* One bound of a list partition */
>>>>> >> > typedef struct PartitionListValue
>>>>> >> > {
>>>>> >> > int index;
>>>>> >> > - Datum value;
>>>>> >> > + Datum *values;
>>>>> >> > + bool *isnulls;
>>>>> >> > } PartitionListValue;
>>>>> >> >
>>>>> >> > Given that this is a locally-defined struct, I wonder if it makes
>>>>> >> > sense to rename the struct while we're at it. Call it, say,
>>>>> >> > PartitionListBound?
>>>>> >> >
>>>>> >> > Also, please keep part of the existing comment that says that the
>>>>> >> > bound belongs to index'th partition.
>>>>> >> >
>>>>> >> > Will send more comments in a bit...
>>>>> >>
>>>>> >> + * partition_bound_accepts_nulls
>>>>> >> + *
>>>>> >> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
>>>>> >> */
>>>>> >>
>>>>> >> I suggest slight rewording, as follows:
>>>>> >>
>>>>> >> "Returns TRUE if any of the partition bounds contains a NULL value,
>>>>> >> FALSE otherwise."
>>>>> >>
>>>>> >> - PartitionListValue *all_values;
>>>>> >> + PartitionListValue **all_values;
>>>>> >> ...
>>>>> >> - all_values = (PartitionListValue *)
>>>>> >> - palloc(ndatums * sizeof(PartitionListValue));
>>>>> >> + ndatums = get_list_datum_count(boundspecs, nparts);
>>>>> >> + all_values = (PartitionListValue **)
>>>>> >> + palloc(ndatums * sizeof(PartitionListValue *));
>>>>> >>
>>>>> >> I don't see the need to redefine all_values's pointer type. No need
>>>>> >> to palloc PartitionListValue repeatedly for every datum as done
>>>>> >> further down as follows:
>>>>> >>
>>>>> >> + all_values[j] = (PartitionListValue *)
>>>>> >> palloc(sizeof(PartitionListValue));
>>>>> >>
>>>>> >> You do need the following two though:
>>>>> >>
>>>>> >> + all_values[j]->values = (Datum *)
>>>>> palloc0(key->partnatts *
>>>>> >> sizeof(Datum));
>>>>> >> + all_values[j]->isnulls = (bool *)
>>>>> palloc0(key->partnatts *
>>>>> >> sizeof(bool));
>>>>> >>
>>>>> >> If you change the above the way I suggest, you'd also need to revert
>>>>> >> the following change:
>>>>> >>
>>>>> >> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
>>>>> >> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
>>>>> >> qsort_partition_list_value_cmp, (void *) key);
>>>>> >>
>>>>> >> + int orig_index = all_values[i]->index;
>>>>> >> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts *
>>>>> sizeof(Datum));
>>>>> >>
>>>>> >> Missing a newline between these two statements.
>>>>> >>
>>>>> >> BTW, I noticed that the boundDatums variable is no longer used in
>>>>> >> create_list_bounds. I traced back its origin and found that a
>>>>> recent
>>>>> >> commit 53d86957e98 introduced it to implement an idea to reduce the
>>>>> >> finer-grained pallocs that were being done in
>>>>> create_list_bounds(). I
>>>>> >> don't think that this patch needs to throw away that work. You can
>>>>> >> make it work as the attached delta patch that applies on top of v3.
>>>>> >> Please check.
>>>>> >>
>>>>> >> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
>>>>> >> *parttyplen, bool *parttypbyval,
>>>>> >> if (b1->nindexes != b2->nindexes)
>>>>> >> return false;
>>>>> >>
>>>>> >> - if (b1->null_index != b2->null_index)
>>>>> >> + if (get_partition_bound_null_index(b1) !=
>>>>> >> get_partition_bound_null_index(b2))
>>>>> >>
>>>>> >> As mentioned in the last message, this bit in
>>>>> partition_bounds_equal()
>>>>> >> needs to be comparing "bitmapsets" of null bound indexes, that is
>>>>> >> after fixing get_partition_bound_null_index() as previously
>>>>> mentioned.
>>>>> >>
>>>>> >> But...
>>>>> >>
>>>>> >> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
>>>>> >> *parttyplen, bool *parttypbyval,
>>>>> >> * context. datumIsEqual() should be simple enough
>>>>> to be
>>>>> >> * safe.
>>>>> >> */
>>>>> >> - if (!datumIsEqual(b1->datums[i][j],
>>>>> b2->datums[i][j],
>>>>> >> + if (b1->isnulls)
>>>>> >> + b1_isnull = b1->isnulls[i][j];
>>>>> >> + if (b2->isnulls)
>>>>> >> + b2_isnull = b2->isnulls[i][j];
>>>>> >> +
>>>>> >> + /*
>>>>> >> + * If any of the partition bound has NULL value,
>>>>> then check
>>>>> >> + * equality for the NULL value instead of comparing
>>>>> the datums
>>>>> >> + * as it does not contain valid value in case of
>>>>> NULL.
>>>>> >> + */
>>>>> >> + if (b1_isnull || b2_isnull)
>>>>> >> + {
>>>>> >> + if (b1_isnull != b2_isnull)
>>>>> >> + return false;
>>>>> >> + }
>>>>> >>
>>>>> >> ...if you have this in the main loop, I don't think we need the
>>>>> above
>>>>> >> code stanza which appears to implement a short-cut for this
>>>>> long-form
>>>>> >> logic.
>>>>> >>
>>>>> >> + (key->strategy != PARTITION_STRATEGY_LIST ||
>>>>> >> + !src->isnulls[i][j]))
>>>>> >>
>>>>> >> I think it's better to write this condition as follows just like the
>>>>> >> accompanying condition involving src->kind:
>>>>> >>
>>>>> >> (src->nulls == NULL || !src->isnulls[i][j])
>>>>> >>
>>>>> >> (Skipped looking at merge_list_bounds() and related changes for now
>>>>> as
>>>>> >> I see a lot of TODOs remain to be done.)
>>>>> >>
>>>>> >> In check_new_partition_bound():
>>>>> >>
>>>>> >> + Datum *values = (Datum *)
>>>>> >> palloc0(key->partnatts * sizeof(Datum));
>>>>> >> + bool *isnulls = (bool *)
>>>>> >> palloc0(key->partnatts * sizeof(bool));
>>>>> >>
>>>>> >> Doesn't seem like a bad idea to declare these as:
>>>>> >>
>>>>> >> Datum values[PARTITION_MAX_KEYS];
>>>>> >> bool isnulls[PARTITION_MAX_KEYS];
>>>>> >>
>>>>> >>
>>>>> >> I looked at get_qual_for_list_multi_column() and immediately thought
>>>>> >> that it may be a bad idea. I think it's better to integrate the
>>>>> logic
>>>>> >> for multi-column case into the existing function even if that makes
>>>>> >> the function appear more complex. Having two functions with the
>>>>> same
>>>>> >> goal and mostly the same code is not a good idea mainly because it
>>>>> >> becomes a maintenance burden.
>>>>> >>
>>>>> >> I have attempted a rewrite such that get_qual_for_list() now handles
>>>>> >> both the single-column and multi-column cases. Changes included in
>>>>> >> the delta patch. The patch updates some outputs of the newly added
>>>>> >> tests for multi-column list partitions, because the new code emits
>>>>> the
>>>>> >> IS NOT NULL tests a bit differently than
>>>>> >> get_qual_for_list_mutli_column() would. Notably, the old approach
>>>>> >> would emit IS NOT NULL for every non-NULL datum matched to a given
>>>>> >> column, not just once for the column. However, the patch makes a
>>>>> few
>>>>> >> other tests fail, mainly because I had to fix
>>>>> >> partition_bound_accepts_nulls() to handle the multi-column case,
>>>>> >> though didn't bother to update all callers of it to also handle the
>>>>> >> multi-column case correctly. I guess that's a TODO you're going to
>>>>> >> deal with at some point anyway. :)
>>>>> >>
>>>>> >> I still have more than half of v3 left to look at, so will continue
>>>>> >> looking. In the meantime, please check the changes I suggested,
>>>>> >> including the delta patch, and let me know your thoughts.
>>>>> >>
>>>>> >> --
>>>>> >> Amit Langote
>>>>> >> EDB: http://www.enterprisedb.com
>>>>>
>>>>

Attachment	Content-Type	Size
v6-0001-multi-column-list-partitioning.patch	application/octet-stream	195.6 KB

In response to

Re: Multi-Column List Partitioning at 2021-10-11 10:42:32 from Rajkumar Raghuwanshi

Responses

Re: Multi-Column List Partitioning at 2021-10-22 10:50:33 from Zhihong Yu
Re: Multi-Column List Partitioning at 2021-10-29 08:10:04 from Amit Langote

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bharath Rupireddy	2021-10-22 10:04:36	Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"
Previous Message	Amit Kapila	2021-10-22 09:47:56	Re: logical decoding/replication: new functions pg_ls_logicaldir and pg_ls_replslotdir