Re: Multi-Column List Partitioning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-09-01 05:31:07
Message-ID: CA+HiwqE69D6C4kUZ_nmmYXcGvU7EDtrotLD716aed83-KVgwew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Nitin,

On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
<nitinjadhavpostgres(at)gmail(dot)com> wrote:
> The attached patch also fixes the above comments.

I noticed that multi-column list partitions containing NULLs don't
work correctly with partition pruning yet.

create table p0 (a int, b text, c bool) partition by list (a, b, c);
create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1, false));
create table p02 partition of p0 for values in ((1, NULL, false));
explain select * from p0 where a is null;
QUERY PLAN
--------------------------------------------------------
Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
Filter: (a IS NULL)
(2 rows)

I guess that may be due to the following newly added code being incomplete:

+/*
+ * get_partition_bound_null_index
+ *
+ * Returns the partition index of the partition bound which accepts NULL.
+ */
+int
+get_partition_bound_null_index(PartitionBoundInfo boundinfo)
+{
+ int i = 0;
+ int j = 0;
+
+ if (!boundinfo->isnulls)
+ return -1;

- if (!val->constisnull)
- count++;
+ for (i = 0; i < boundinfo->ndatums; i++)
+ {
+ //TODO: Handle for multi-column cases
+ for (j = 0; j < 1; j++)
+ {
+ if (boundinfo->isnulls[i][j])
+ return boundinfo->indexes[i];
}
}

+ return -1;
+}

Maybe this function needs to return a "bitmapset" of indexes, because
multiple partitions can now contain NULL values.

Some other issues I noticed and suggestions for improvement:

+/*
+ * checkForDuplicates
+ *
+ * Returns TRUE if the list bound element is already present in the list of
+ * list bounds, FALSE otherwise.
+ */
+static bool
+checkForDuplicates(List *source, List *searchElem)

This function name may be too generic. Given that it is specific to
implementing list bound de-duplication, maybe the following signature
is more appropriate:

static bool
checkListBoundDuplicated(List *list_bounds, List *new_bound)

Also, better if the function comment mentions those parameter names, like:

"Returns TRUE if the list bound element 'new_bound' is already present
in the target list 'list_bounds', FALSE otherwise."

+/*
+ * transformPartitionListBounds
+ *
+ * Converts the expressions of list partition bounds from the raw grammar
+ * representation.

A sentence about the result format would be helpful, like:

The result is a List of Lists of Const nodes to account for the
partition key possibly containing more than one column.

+ int i = 0;
+ int j = 0;

Better to initialize such loop counters closer to the loop.

+ colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
+ colname[i] = get_attname(RelationGetRelid(parent),
+ key->partattrs[i], false);

The palloc in the 1st statement is wasteful, because the 2nd statement
overwrites its pointer by the pointer to the string palloc'd by
get_attname().

+ ListCell *cell2 = NULL;

No need to explicitly initialize the loop variable.

+ RowExpr *rowexpr = NULL;
+
+ if (!IsA(expr, RowExpr))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("Invalid list bound specification"),
+ parser_errposition(pstate, exprLocation((Node
*) spec))));
+
+ rowexpr = (RowExpr *) expr;

It's okay to assign rowexpr at the top here instead of the dummy
NULL-initialization and write the condition as:

if (!IsA(rowexpr, RowExpr))

+ if (isDuplicate)
+ continue;
+
+ result = lappend(result, values);

I can see you copied this style from the existing code, but how about
writing this simply as:

if (!isDuplicate)
result = lappend(result, values);

-/* One value coming from some (index'th) list partition */
+/* One bound of a list partition */
typedef struct PartitionListValue
{
int index;
- Datum value;
+ Datum *values;
+ bool *isnulls;
} PartitionListValue;

Given that this is a locally-defined struct, I wonder if it makes
sense to rename the struct while we're at it. Call it, say,
PartitionListBound?

Also, please keep part of the existing comment that says that the
bound belongs to index'th partition.

Will send more comments in a bit...

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-09-01 05:37:43 Re: Possible missing segments in archiving on standby
Previous Message Fujii Masao 2021-09-01 05:13:13 Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)