Re: Multi-Column List Partitioning

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-06-03 14:45:09
Message-ID: CAMm1aWYryFE2WTN3xT_SMuGkJjU6HdnHSMM5YgrxV8U365+4_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Approach 1 sounds better. It sounds like approach 1 might help us
> implement support for allowing NULLs in range partition bounds in the
> future, if at all. For now, it might be better to not allocate the
> isnull array except for list partitioning.

Thanks for confirming.

> I'll wait for you to post a new patch addressing at least the comments
> in my earlier email. Also, please make sure to run `make check`
> successfully before posting the patch. :)

I have fixed all of the review comments given by you and Jeevan in the
attached patch and also the attached patch contains more changes
compared to the previous patch. Following are the implementation
details.

1. Regarding syntax, the existing syntax will work fine for the
single-column list partitioning. However I have used the new syntax
for the multi-column list partitioning as we discussed earlier. I have
used a combination of 'AND' and 'OR' logic for the partition
constraints as given in the below example.

postgres(at)17503=#create table t(a int, b text) partition by list(a,b);
CREATE TABLE
postgres(at)17503=#create table t1 partition of t for values in ((1,'a'),
(NULL,'b'));
CREATE TABLE
postgres(at)17503=#\d+ t
Partitioned table "public.t"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain |
| |
b | text | | | | extended |
| |
Partition key: LIST (a, b)
Partitions: t1 FOR VALUES IN ((1, 'a'), (NULL, 'b'))

postgres(at)17503=#\d+ t1
Table "public.t1"
Column | Type | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
--------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain |
| |
b | text | | | | extended |
| |
Partition of: t FOR VALUES IN ((1, 'a'), (NULL, 'b'))
Partition constraint: (((a = 1) AND (b = 'a'::text)) OR ((a IS NULL)
AND (b = 'b'::text)))
Access method: heap

2. In the existing code, NULL values were handled differently. It was
not added to the 'datums' variable, rather used to store the partition
index directly in the 'null_index' variable. Now there is a
possibility of multiple NULL values, hence introducing a new member
'isnulls' in the 'PartitionBoundInfoData' struct which indicates
whether the corresponding element in the 'datums' is NULL. Now
'null_index' cannot be used directly to store the partition index, so
removed it and made the necessary changes in multiple places.

3. I have added test cases for 'create table' and 'insert' statements
related to multi-column list partitioning and these are working fine
with 'make check'.

4. Handled the partition pruning code to accommodate these changes for
single-column list partitioning. However it is pending for
multi-column list partitioning.

5. I have done necessary changes in partition wise join related code
to accommodate for single-column list partitioning. However it is
pending for multi-column list partitioning.

Kindly review the patch and let me know if any changes are required.

Pending items:
1. Support of partition pruning for multi-column list partitioning.
2. Support of partition wise join for multi-column list partitioning.

I will continue to work on the above 2 items.
Kindly let me know if I am missing something.

Thanks & Regards,
Nitin Jadhav

On Wed, May 26, 2021 at 10:27 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> On Sun, May 23, 2021 at 6:49 PM Nitin Jadhav
> <nitinjadhavpostgres(at)gmail(dot)com> wrote:
> > > IMO, it is not such a bad syntax from a user's PoV. It's not hard to
> > > understand from this syntax that the partition constraint is something
> > > like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
> > > row-wise comparison.
> >
> > Thanks for suggesting to use row-wise comparison.
>
> Actually, I was just describing how the *users* may want to visualize
> the partition constraint...
>
> > I have few queries
> > with respect to handling of NULL values.
> >
> > 1. What should be the partition constraint for the above case. AFAIK,
> > row-wise comparison wont work with NULL values as shown in [1]. I mean
> > two rows are considered equal if all their corresponding members are
> > non-null and equal. The rows are unequal if any corresponding members
> > are non-null and unequal. Otherwise the result of the row comparison
> > is unknown (null). So we should generate different types of
> > constraints for NULL values.
> >
> > Ex:
> > CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b);
> > CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL),
> > (NULL, 1), (NULL, NULL);
> >
> > As per my knowledge, we should consider creating partition constraints
> > for the above example as given below.
> >
> > (a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b =
> > 1)) OR ((a is NULL) AND (b is NULL)).
>
> Yeah, something like that should do the trick.
>
> Again, I was not actually suggesting that you write code to implement
> the constraint using something like RowCompareExpr, only that the
> users might want to view the constraint as doing row-wise comparison
> of the partitioning columns and the specified value lists.
>
> > 2. In the current code we don't put the NULL value in the 'datums'
> > field of 'PartitionBoundInfoData' structure [2]. Since there can be
> > only one NULL value, we directly store the corresponding index value
> > in the 'null_index' field. Now we have to handle multiple NULL values
> > in case of Multi-Column List Partitioning. So the question is how to
> > handle this scenario. Following are the 2 approaches to handle this.
> >
> > Approach-1:
> > Add another field 'bool **isnull' in [2] and mark the corresponding
> > element to TRUE if it has NULL value and the corresponding location in
> > 'datums' contains empty/No value. For example, If a partition bound is
> > (1, NULL), then
> >
> > datums[0][0] = 1
> > datums[0][1] = Not assigned any value
> > isnull[0][0] = FALSE
> > is null[0][1] = TRUE
> >
> > So now we have an entry in the 'datums' field for a bound containing
> > NULL value, so we should handle this in all the scenarios where we are
> > manipulating 'datums' in order to support NULL values and avoid crash.
> >
> > Approach-2:
> > Don't add the bound information to 'datums' field of [2] if any of the
> > value is NULL. Store this information separately in the structures
> > mentioned in [3] and process accordingly.
> >
> > I feel approach-1 is the better solution as this requires less code
> > changes and easy to implement than approach-2. Kindly share your
> > thoughts about the approaches and please share if you have any better
> > solution than the above 2.
>
> Approach 1 sounds better. It sounds like approach 1 might help us
> implement support for allowing NULLs in range partition bounds in the
> future, if at all. For now, it might be better to not allocate the
> isnull array except for list partitioning.
>
> I'll wait for you to post a new patch addressing at least the comments
> in my earlier email. Also, please make sure to run `make check`
> successfully before posting the patch. :)
>
> Thanks.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1_multi_column_list_partitioning.patch application/octet-stream 65.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2021-06-03 15:33:21 Re: speed up verifying UTF-8
Previous Message Greg Stark 2021-06-03 14:41:41 Re: speed up verifying UTF-8