Re: Multi-Column List Partitioning

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-05-26 04:57:04
Message-ID: CA+HiwqHKBFpCH+j0LET9mZjxtntZ87rrzw8KZDGw8ZHU=Rb8pg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, May 23, 2021 at 6:49 PM Nitin Jadhav
<nitinjadhavpostgres(at)gmail(dot)com> wrote:
> > IMO, it is not such a bad syntax from a user's PoV. It's not hard to
> > understand from this syntax that the partition constraint is something
> > like (a, b) = (1, 2) OR (a, b) = (1, 5) OR ..., where the = performs
> > row-wise comparison.
>
> Thanks for suggesting to use row-wise comparison.

Actually, I was just describing how the *users* may want to visualize
the partition constraint...

> I have few queries
> with respect to handling of NULL values.
>
> 1. What should be the partition constraint for the above case. AFAIK,
> row-wise comparison wont work with NULL values as shown in [1]. I mean
> two rows are considered equal if all their corresponding members are
> non-null and equal. The rows are unequal if any corresponding members
> are non-null and unequal. Otherwise the result of the row comparison
> is unknown (null). So we should generate different types of
> constraints for NULL values.
>
> Ex:
> CREATE TABLE t(a int, b int) PARTITION BY LIST(a,b);
> CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1, 1), (1, NULL),
> (NULL, 1), (NULL, NULL);
>
> As per my knowledge, we should consider creating partition constraints
> for the above example as given below.
>
> (a, b) = (1, 1) OR ((a = 1) AND (b IS NULL)) OR ((a IS NULL) AND (b =
> 1)) OR ((a is NULL) AND (b is NULL)).

Yeah, something like that should do the trick.

Again, I was not actually suggesting that you write code to implement
the constraint using something like RowCompareExpr, only that the
users might want to view the constraint as doing row-wise comparison
of the partitioning columns and the specified value lists.

> 2. In the current code we don't put the NULL value in the 'datums'
> field of 'PartitionBoundInfoData' structure [2]. Since there can be
> only one NULL value, we directly store the corresponding index value
> in the 'null_index' field. Now we have to handle multiple NULL values
> in case of Multi-Column List Partitioning. So the question is how to
> handle this scenario. Following are the 2 approaches to handle this.
>
> Approach-1:
> Add another field 'bool **isnull' in [2] and mark the corresponding
> element to TRUE if it has NULL value and the corresponding location in
> 'datums' contains empty/No value. For example, If a partition bound is
> (1, NULL), then
>
> datums[0][0] = 1
> datums[0][1] = Not assigned any value
> isnull[0][0] = FALSE
> is null[0][1] = TRUE
>
> So now we have an entry in the 'datums' field for a bound containing
> NULL value, so we should handle this in all the scenarios where we are
> manipulating 'datums' in order to support NULL values and avoid crash.
>
> Approach-2:
> Don't add the bound information to 'datums' field of [2] if any of the
> value is NULL. Store this information separately in the structures
> mentioned in [3] and process accordingly.
>
> I feel approach-1 is the better solution as this requires less code
> changes and easy to implement than approach-2. Kindly share your
> thoughts about the approaches and please share if you have any better
> solution than the above 2.

Approach 1 sounds better. It sounds like approach 1 might help us
implement support for allowing NULLs in range partition bounds in the
future, if at all. For now, it might be better to not allocate the
isnull array except for list partitioning.

I'll wait for you to post a new patch addressing at least the comments
in my earlier email. Also, please make sure to run `make check`
successfully before posting the patch. :)

Thanks.

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2021-05-26 05:14:47 Re: storing an explicit nonce
Previous Message David Rowley 2021-05-26 03:43:54 Re: Hybrid Hash/Nested Loop joins and caching results from subplans