Re: Proposal: Automatic partition creation

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: Automatic partition creation
Date: 2020-07-06 14:59:47
Message-ID: 20200706145947.GX4107@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 06, 2020 at 01:45:52PM +0300, Anastasia Lubennikova wrote:
> The previous discussion of automatic partition creation [1] has addressed
> static and dynamic creation of partitions and ended up with several syntax
> proposals.
...
> where partition_auto_create_clause is
>
> CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec

> - IMMEDIATE| DEFERRED is optional, DEFERRED is not implemented yet
> I wonder, is it worth placing a stub for dynamic partitioning, or we can
> rather add these keywords later.

I understand by "deferred" you mean that the partition isn't created at the
time CREATE TABLE is run but rather deferred until needed by INSERT.

For deferred, range partitioned tables, I think maybe what you'd want to
specify (and store) is the INTERVAL. If the table is partitioned by day, then
we'd date_trunc('day', time) and dynamically create that day. But if it was
partitioned by month, we'd create the month. I think you'd want to have an
ALTER command for that (we would use that to change tables between
daily/monthly based on their current size). That should also support setting
the MODULUS of a HASH partitioned table, to allow changing the size of its
partitions (currently, the user would have to more or less recreate the table
and move all its data into different partitions, but that's not ideal).

I don't know if it's important for anyone, but it would be interesting to think
about supporting sub-partitioning: partitions which are themselvese partitioned.
Like something => something_YYYY => something_YYYY_MM => something_YYYY_MM_DD.
You'd need to specify how to partition each layer of the heirarchy. In the
most general case, it could be different partition strategy.

If you have a callback function for partition renaming, I think you'd want to
pass it not just the current name of the partition, but also the "VALUES" used
in partition creation. Like (2020-04-05)TO(2020-05-06). Maybe instead, we'd
allow setting a "format" to use to construct the partition name. Like
"child.foo_bar_%Y_%m_%d". Ideally, the formats would be fixed-length
(zero-padded, etc), so failures with length can happen at "parse" time of the
statement and not at "run" time of the creation. You'd still have to handle
the case that the name already exists but isn't a partition (or is a partition
by doesn't handle the incoming tuple for some reason).

Also, maybe your "configuration" syntax would allow specifying other values.
Maybe including a retention period (as an INTERVAL for RANGE tables). That's
useful if you had a command to PRUNE the oldest partitions, like ALTER..PRUNE.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-07-06 15:05:46 Re: Ideas about a better API for postgres_fdw remote estimates
Previous Message Alvaro Herrera 2020-07-06 14:56:53 Re: Cache lookup errors with functions manipulation object addresses