Re: Declarative partitioning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning
Date: 2015-12-22 01:51:56
Message-ID: 5678ACBC.8050809@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015/12/18 3:56, Robert Haas wrote:
> On Mon, Dec 14, 2015 at 2:14 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Syntax to create a partitioned table (up to 2 levels of partitioning):
>>
>> CREATE TABLE foo (
>> ...
>> )
>> PARTITION BY R/L ON (key0)
>> SUBPARTITION BY R/L ON (key1)
>> [(PARTITION foo_1 FOR VALUES <val> [<storage_params>] [<tblspc>]
>> [(SUBPARTITION foo_1_1 FOR VALUES <val> [<storage_params>] [<tblspc>],
>> ...)], ...)];
>>
>> The above creates two pg_partitioned_rel entries for foo with partlevel 0
>> and 1, for key0 and key1, respectively. For foo_1 and foo_1_1, this
>> creates pg_partition entries, with foo and foo_1 as partparent,
>> respectively.
>>
>> Why just 2 levels? - it seems commonplace and makes the syntax more
>> intuitive? I guess it might be possible to generalize the syntax for
>> multi-level partitioning. Ideas? If we want to support the notion of
>> sub-partition template in future, that would require some thought, more
>> importantly proper catalog organization for the same.
>
> I do not think this is a particularly good idea. You're going to need
> to dump each partition separately at least in --binary-upgrade mode,
> because each is going to have its own magic OIDs that need to be
> restored, and also because there will most likely be at least some
> properties that are going to vary between partitions. You could
> require that every partition have exactly the same set of columns,
> constraints, rules, triggers, policies, attribute defaults, comments,
> column comments, and everything else that might be different from one
> partition to another, and further require that they have exactly
> matching indexes. It would take a fair amount of code to prohibit all
> that, but it could be done. However, do we really want that? There
> may well be some things were we want to enforce that the parent and
> the child are exactly identical, but I doubt we want that for
> absolutely every property, current and future, of the partition. And
> even if you did, because of the --binary-upgrade stuff, you still need
> to to be able to dump them separately.
>
> Therefore, I believe it is a whole lot better to make the primary
> syntax for table partitioning something where you issue a CREATE
> statement for the parent and then a CREATE statement for each child.
> If we want to also have a convenience syntax so that people who want
> to create a parent and a bunch of children in one fell swoop can do
> so, fine.

Regarding --binary-upgrade dump mode, how about we teach pg_dump to dump
each partition separately using ALTER TABLE parent ADD PARTITION
especially for the "magic OIDs" reason? It may very well be a CREATE
PARTITION-style command though. Note that each such command could specify
properties that can be different per partition. I said in my email,
perhaps not so clearly, that "only" WITH options, tablespace and
relpersistence can be different per partition. But I can see why that may
be severely restrictive at this stage.

By the way, what do you think about SUBPARTITION keyword-based syntax for
multi-level partitioning? Should we instead require that each partition
has its own PARTITION BY in its creation command?

>
> I would not choose to model the syntax for creating partitions on
> Oracle. I don't find that syntax particularly nice or easy to
> remember. I say PARTITION BY RANGE, and then inside the parentheses I
> use the PARTITION keyword for each partition? Really? But I think
> copying the style while having the details be incompatible is an even
> worse idea.

As for the convenience syntax (if at all), how about:

CREATE TABLE foo (
...
)
PARTITION BY ... ON (...)
SUBPARTITION BY ... ON (...)
opt_partition_list;

where opt_partition_list is:

PARTITIONS (
partname FOR VALUES ... [WITH] [TABLESPACE] opt_subpart_list
[, ...]
)

where opt_subpart_list is:

SUBPARTITIONS (
subpartname FOR VALUES ... [WITH] [ TABLESPACE]
[, ...]
)

PARTITIONS, SUBPARTITIONS would be new unreserved keywords. Or we can do
away with them.

>> What about ALTER TABLE? - Instead of allowing ALTER TABLE to be applied
>> directly to partitions on case-by-case basis (they are tables under the
>> hood after all), we should restrict AT to the master table. Most of the AT
>> changes implicitly propagate from the master table to its partitions. Some
>> of them could be directly applied to partitions and/or sub-partitions such
>> as rename, storage manipulations like - changing tablespace, storage
>> parameters (reloptions), etc.:
>>
>> ALTER TABLE foo
>> RENAME PARTITION <partition-name> TO <new-name>;
>>
>> ALTER TABLE foo
>> RENAME SUBPARTITION <sub-partition-name> TO <new-name>;
>>
>> ALTER TABLE foo
>> SET TABLESPACE ... [DEFAULT] FOR PARTITION <partition-name>;
>>
>> ALTER TABLE foo
>> SET TABLESPACE ... FOR SUBPARTITION <sub-partition-name>;
>>
>> ALTER TABLE foo
>> SET (storage_parameter = value) [DEFAULT] FOR PARTITION <partition-name>;
>>
>> ALTER TABLE foo
>> SET (storage_parameter = value) FOR SUBPARTITION <sub-partition-name>;
>
> I don't think this is a very good idea. This is basically proposing
> that for every DDL command that you can apply to a table, you have to
> spell it differently for a partition. That seems like a lot of extra
> work for no additional functionality.

I didn't mean to propose the special syntax for every available DDL
command. Just those that modify properties that could be different per
partition. I thought there would be very few such properties and hence not
a lot of special commands. But then again, I may be wrong about what
those properties are.

>
>> By the way, should we also allow changing the logging of
>> partitions/sub-partitions as follows?
>
> Again, I think you're coming at this from the wrong direction.
> Instead of saying we're going to disallow all changes to the
> partitions and then deciding we need to allow certain changes after
> all, I think we should allow everything that is currently allowed for
> an inherited table and then decide which of those things we need to
> prohibit, and why. For example, if you insist that a child table has
> to have a tuple descriptor that matches the parent, that can improve
> efficiency: Append won't need to project, and so on. But it now
> becomes very difficult to support taking a stand-alone table and
> making it a partition of an existing partitioned table, because the
> set of dropped columns might not match. Having to give an error in
> that case amounts to "we're sorry, we can't attach your partition to
> the partitioning hierarchy because of some invisible state that you
> can't see" isn't very nice. Now I'm not saying that isn't the right
> decision, but I think the design choices here need to be carefully
> thought about.

Yeah, I am concerned about the ATTACH PARTITION USING TABLE case for the
very point you mention. And I can see how it may result from the
restrictive model I propose. FWIW, other databases impose a number of
restrictions on the partition roll-in case but not sure if for the
internal reasons we might want to.

>
> Stepping away from that particular example, a blanket prohibition on
> changing any attribute of a child table seems like it will prohibit a
> lot of useful things that really ought to work. And again, I don't
> think it's a good idea to implement separate syntax for changing a
> partition vs. changing a table. If I want to set a partition as
> unlogged, I should be able to say ALTER TABLE partition_name UNLOGGED
> or maybe ALTER PARTITION partition_name UNLOGGED, not be forced to use
> some new grammar production that looks completely different.

Okay. ALTER PARTITION may be the way to go.

>
>> What about index constraints, ie, PRIMARY KEY, UNIQUE and EXCLUSION
>> constraints - 2 things must be clear here: cannot create these constraints
>> on individual partitions and all partition columns (key0 + key1) must be
>> the leading columns of the key. On a related note, creating index on the
>> master table should create the index on all "leaf" partitions. The index
>> on the mater table itself would be just a logical index. Should we allow
>> creating or dropping indexes on partitions directly?
>
> I don't find this to be particularly clear. You are assuming that
> nobody wants to create a constraint that a certain value is unique
> within a partition. That might not be a real common thing to want to
> do, but it could certainly be useful to somebody, and the current
> system with table inheritance allows it. For example, suppose that we
> have orders partitioned on the order_date column, by month. The user
> might want to create a UNIQUE index on order_id on each partition.
> Maybe they start over with order_id 1 at the beginning of each month.
> But even if, as is more likely, the order IDs keep counting up from
> month to month, they don't want to be forced to include the whole
> partitioning key in the index in order to have it marked UNIQUE. That
> may be enough, in practice, to ensure the global uniqueness of order
> IDs even though the system doesn't technically enforce it in all
> cases.

Okay, I didn't consider that one may want to create UNIQUE constraint per
partition. Do they need to be allowed to be different per partition?

Sorry, I could not understand your last sentence regarding global
uniqueness. Given the restrictions on its definition, in what ways could
the system fail to enforce it?

>
> If you want an index created on the parent to cascade down to all
> children, that's a big project to get right. Suppose I create an
> index on the parent table. After a while, I notice that it's getting
> bloated, so I created another index with the same definition. Now, I
> drop one of the two indexes. One of the two indexes from each child
> table needs to go away, and moreover it can't be picked arbitrarily -
> it has to be the one that was created at the same time as the parent
> index I'm dropping. If you want it to behave like this, you need a
> whole system of bookkeeping to make it work right.
>
> For version 1, I'd go the other way and prohibit index definitions on
> the empty parent rels. Let people create indexes on the children as
> they wish. In a future version, we can add stuff to cascade from
> parents to children.

Okay, I may be missing the subtleties of global uniqueness case. I agree
we can leave it out of version 1.

>
>> It would be interesting to talk about features like SPLIT, MERGE, EXCHANGE
>> but I'm inclined to leave them as future enhancements. For a functionality
>> similar to EXCHANGE, there are commands like ATTACH/DETACH in the latest
>> patch. We could extend them to also consider sub-partitions:
>
> We don't need to have these in the first version, but we have to make
> some architectural decisions that affect how feasible they are to
> implement and in which cases, as noted above.
>
>> One cannot define rules, triggers, and RLS policies on them. Although,
>> AR triggers defined on a partitioned master table are propagated to the
>> "leaf" partitions.
>
> What value do you see us getting out of restricting these particular things?

Perhaps, I'm wrong in thinking that we should limit these to be associated
with only the top-level partitioned table.

Thanks a lot for the comments!

Regards,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2015-12-22 02:27:24 Re: SET SESSION AUTHORIZATION superuser limitation.
Previous Message Craig Ringer 2015-12-22 01:49:44 Re: Experimental evaluation of PostgreSQL's query optimizer