Re: [PATCH] Automatic HASH and LIST partition creation

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amul Sul <sulamul(at)gmail(dot)com>
Subject: Re: [PATCH] Automatic HASH and LIST partition creation
Date: 2020-09-14 11:38:56
Message-ID: 74a716a3-986d-c737-6217-6a3ffd488da3@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08.09.2020 17:03, Pavel Borisov wrote:
>
> The patch lacks documentation, because I expect some details may
> change during discussion. Other than that, the feature is ready
> for review.
>
> Hi, hackers!
>
> From what I've read I see there is much interest in automatic
> partitions creation. (Overall discussion on the topic is partitioned
> into two threads: (1)
> https://www.postgresql.org/message-id/alpine.DEB.2.21.1907150711080.22273%40lancre and
> (2)
> https://www.postgresql.org/message-id/flat/7fec3abb-c663-c0d2-8452-a46141be6d4a(at)postgrespro(dot)ru
> (current thread) )
>
> There were many syntax proposals and finally, there is a patch
> realizing one of them. So I'd like to review it.
>
> The syntax proposed in the patch seems good enough for me and is in
> accordance with one of the proposals in the discussion. Maybe I'd
> prefer using the word AUTOMATICALLY/AUTO instead of CONFIGURATION with
> explicit meaning that using this syntax we'd get already
> (automatically) created partitions and don't need to create them
> manually, as in the existing state of postgresql declarative
> partitioning.
>
> CREATE TABLE tbl (iint) PARTITION BY HASH (i) AUTOMATICALLY (MODULUS 3); (partitions are created automatically)
> vs
> CREATE TABLE tbl (iint) PARTITION BY HASH (i); (partitions should be created manually by use of PARTITION OF)
> CREATE TABLE tbl (i char) PARTITION BY LIST (i) AUTOMATICALLY (VALUES
> IN ('a', 'b'), ('c', 'd'), ('e','f') DEFAULTPARTITION tbl_default);
> vs
> CREATE TABLE tbl (ichar) PARTITION BY LIST (i); (partitions should be created manually by use of PARTITION OF)
>
> I think this syntax can also be extended later with adding automatic
> creation of RANGE partitions, with IMMEDIATE/DEFERRED for
> dynamic/on-demand automatic partition creation, and with SUBPARTITION
> possibility.
>
> But I don't have a strong preference for the word AUTOMATICALLY,
> moreover I saw opposition to using AUTO at the top of the discussion.
> I suppose we can go with the existing CONFIGURATION word.

I agree that 'AUTOMATICALLY' keyword is more specific and probably less
confusing for users. I've picked 'CONFIGURATION' simply because it is an
already existing keyword. It would like to hear other opinions on that.

> If compare with existing declarative partitions, I think automatic
> creation simplifies the process for the end-user and  I'd vote for its
> committing into Postgres. The patch is short and clean in code style.
> It has enough comments Tests covering the new functionality are
> included. Yet it doesn't have documentation and I'd suppose it's worth
> adding it. Even if there will be syntax changes, I hope they will not
> be more than the replacement of several words. Current syntax is
> described in the text of a patch.
>

Fair enough. New patch contains a documentation draft. While writing it,
I also noticed, that the syntax, introduced in the patch differs from
greenpulm one. For now, list partitioning clause doesn't support
'PARTITION name' part, that is supported in greenplum. I don't think
that we aim for 100% compatibility here. Still, the ability to provide
table names is probably a good optional feature, especially for list
partitions.

What do you think?

> The patch applies cleanly and installcheck-world is passed.
>
> Some minor things:
>
> I've got a compiler warning:
> parse_utilcmd.c:4280:15: warning: unused variable 'lc' [-Wunused-variable]

Fixed. This was also caught by cfbot. This version should pass it clean.

>
> When the number of partitions is over the maximum value of int32 the
> output shows a generic syntax error. I don't think it is very
> important as it is not the case someone will make deliberately, but
> maybe it's better to output something like "Partitions number is more
> than the maximum supported value"
> create table test (i int, t text) partition by hash (i) configuration
> (modulus 888888888888);
> ERROR:  syntax error at or near "888888888888"

This value is not a valid int32 number, thus parser throws the error
before we have a chance to handle it more gracefully.

>
> One more piece of nitpicking. Probably we can go just with a mention
> in documentation.
> create table test (i int, t text) partition by hash (i) configuration
> (modulus 8888);
> ERROR:  out of shared memory
> HINT:  You might need to increase max_locks_per_transaction.
>
Well, it looks like a legit error, when we try to lock a lot of objects
in one transaction. I will double check if we don't release a lock
somewhere.

Do we need to restrict the number of partitions, that can be created by
this statement? With what number?  As far as I see, there is no such
restriction for now, just a recommendation about performance issues.
With automatic creation it becomes easier to mess with it.

Probably, it's enough to mention it in documentation and rely on users
common sense.

> Typo:
> + /* Add statemets to create each partition after we create parent
> table */
>
Fixed.

> Overall I see the patch almost ready for commit and I'd like to meet
> this functionality in v14.
I also hope that this patch will make it to v14, but for now, I don't
see a consensus on the syntax and some details, so I wouldn't rush.

Besides, it definitely needs more testing. I haven't thoroughly tested
following cases yet:
- how triggers and constraints are propagated to partitions;
- how does it handle some tricky clauses in list partitioning expr_list;
and so on.

Also, there is an open question about partition naming. Currently, the
patch implements dummy %tbl_%partnum name generation, which is far from
user-friendly. I think we must provide some hook or trigger function to
rename partitions after they were created.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
auto_part_hash_list_v1.patch text/x-patch 24.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2020-09-14 11:49:38 Re: Avoid incorrect allocation in buildIndexArray
Previous Message Dilip Kumar 2020-09-14 11:37:38 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions