Re: Multi-Column List Partitioning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-06-11 03:37:54
Message-ID: CA+HiwqFGBQgaeZWSgX_ouLc-cSsRGEMqDVZarN2-er38YyYLCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Nitin,

On Thu, Jun 3, 2021 at 11:45 PM Nitin Jadhav
<nitinjadhavpostgres(at)gmail(dot)com> wrote:
> > I'll wait for you to post a new patch addressing at least the comments
> > in my earlier email. Also, please make sure to run `make check`
> > successfully before posting the patch. :)
>
> I have fixed all of the review comments given by you and Jeevan in the
> attached patch and also the attached patch contains more changes
> compared to the previous patch. Following are the implementation
> details.

Thanks for the updated version.

> 1. Regarding syntax, the existing syntax will work fine for the
> single-column list partitioning. However I have used the new syntax
> for the multi-column list partitioning as we discussed earlier. I have
> used a combination of 'AND' and 'OR' logic for the partition
> constraints as given in the below example.
>
> postgres(at)17503=#create table t(a int, b text) partition by list(a,b);
> CREATE TABLE
> postgres(at)17503=#create table t1 partition of t for values in ((1,'a'),
> (NULL,'b'));
> CREATE TABLE
> postgres(at)17503=#\d+ t
> Partitioned table "public.t"
> Column | Type | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> a | integer | | | | plain |
> | |
> b | text | | | | extended |
> | |
> Partition key: LIST (a, b)
> Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))
>
> postgres(at)17503=#\d+ t1
> Table "public.t1"
> Column | Type | Collation | Nullable | Default | Storage |
> Compression | Stats target | Description
> --------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
> a | integer | | | | plain |
> | |
> b | text | | | | extended |
> | |
> Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
> Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
> AND (b = 'b'::text)))
> Access method: heap

The constraint expressions seem to come out correctly, though I
haven't checked your implementation closely yet.

> 2. In the existing code, NULL values were handled differently. It was
> not added to the 'datums' variable, rather used to store the partition
> index directly in the 'null_index' variable. Now there is a
> possibility of multiple NULL values, hence introducing a new member
> 'isnulls' in the 'PartitionBoundInfoData' struct which indicates
> whether the corresponding element in the 'datums' is NULL. Now
> 'null_index' cannot be used directly to store the partition index, so
> removed it and made the necessary changes in multiple places.
>
> 3. I have added test cases for 'create table' and 'insert' statements
> related to multi-column list partitioning and these are working fine
> with 'make check'.
>
> 4. Handled the partition pruning code to accommodate these changes for
> single-column list partitioning. However it is pending for
> multi-column list partitioning.
>
> 5. I have done necessary changes in partition wise join related code
> to accommodate for single-column list partitioning. However it is
> pending for multi-column list partitioning.
>
> Kindly review the patch and let me know if any changes are required.

The new list bound binary search and related comparison support
function look a bit too verbose to me. I was expecting
partition_list_bsearch() to look very much like
partition_range_datum_bsearch(), but that is not the case. The
special case code that you wrote in partition_list_bsearch() seems
unnecessary, at least in that function. I'm talking about the code
fragment starting with this comment:

/*
* Once we find the matching for the first column but if it does not
* match for the any of the other columns, then the binary search
* will not work in all the cases. We should traverse just below
* and above the mid index until we find the match or we reach the
* first mismatch.
*/

I guess you're perhaps trying to address the case where the caller
does not specify the values for all of the partition key columns,
which can happen when the partition pruning code needs to handle a set
of clauses matching only some of the partition key columns. But
that's a concern of the partition pruning code and so the special case
should be handled there (if at all), not in the binary search function
that is shared with other callers. Regarding that, I'm wondering if
we should require clauses matching all of the partition key columns to
be found for the pruning code to call the binary search, so do
something like get_matching_hash_bounds() does:

/*
* For hash partitioning we can only perform pruning based on equality
* clauses to the partition key or IS NULL clauses. We also can only
* prune if we got values for all keys.
*/
if (nvalues + bms_num_members(nullkeys) == partnatts)
{
/* code to compute matching hash bound offset */
}
else
{
/* Report all valid offsets into the boundinfo->indexes array. */
result->bound_offsets = bms_add_range(NULL, 0,
boundinfo->nindexes - 1);
}

Do you think that trying to match list partitions even with fewer keys
is worth the complexity of the implementation? That is, is the use
case to search for only a subset of partition key columns common
enough with list partitioning?

If we do decide to implement the special case, remember that to do
that efficiently, we'd need to require that the subset of matched key
columns constitutes a prefix, because of the way the datums are
sorted. That is, match all partitions when the query only contains a
clause for b when the partition key is (a, b, c), but engage the
special case of pruning if the query contains clauses for a, or for a
and b.

I will look at other parts of the patch next week hopefully. For
now, attached is a delta patch that applies on top of your v1, which
does:

* Simplify partition_list_bsearch() and partition_lbound_datum_cmp()
* Make qsort_partition_list_value_cmp simply call
partition_lbound_datum_cmp() instead of having its own logic to
compare input bounds
* Move partition_lbound_datum_cmp() into partbounds.c as a static
function (export seems unnecessary)
* Add a comment for PartitionBoundInfo.isnulls and remove that for null_index

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1_delta_amit.patch application/octet-stream 11.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-06-11 03:46:56 Re: Fix dropped object handling in pg_event_trigger_ddl_commands
Previous Message Peter Geoghegan 2021-06-11 03:16:40 Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic