Quick Links

Re: Declarative partitioning - another take

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Declarative partitioning - another take
Date:	2016-08-10 18:22:52
Message-ID:	CA+TgmoZ008qTgd_Qg6_oZb3i0mOYrS6MdhncwgcqPKahixjarg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Aug 10, 2016 at 7:09 AM, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Attached is the latest set of patches to implement declarative
> partitioning.

Cool. I would encourage you to give some thought to what is the least
committable subset of these patches, and think about whether it can be
reorganized to make that smaller. Based on the descriptions, it
sounds to me like the least committable subset is probably all of 0001
- 0005 as one giant commit, and that's a lot of code. Maybe there's
no real way to come up with something more compact than that, but it's
worth some thought.

> 0001-Catalog-and-DDL-for-partition-key.patch

+ <entry><link
linkend="catalog-pg-partitioned"><structname>pg_partitioned</structname></link></entry>

pg_partitioned seems like a slightly strange choice of name. We have
no other catalog tables whose names end in "ed", so the use of a past
participle here is novel. More generally, this patch seems like it's
suffering a bit of terminological confusion: while the catalog table
is called pg_partitioned, the relkind is RELKIND_PARTITION_REL. And
get_relation_by_qualified_name() thinks that RELKIND_PARTITION_REL is
a "table", but getRelationDescription thinks it's a "partitioned
table". I think we need to get all these bits and pieces on the same
page, or have some kind of consistent way of deciding what to do in
each case. Maybe pg_partitioned_table, RELKIND_PARTITIONED_TABLE,
etc. for the internal stuff, but just "table" in user-facing messages.

Alternatively, I wonder if we should switch to calling this a
"partition root" rather than a "partitioned table". It's not the
whole table, just the root. And doesn't contain data - in fact it
probably shouldn't even have storage - so calling it a "table" might
be confusing. But if we're using CREATE TABLE to create it in the
ifrst place, then calling it something other than a table later on is
also confusing. Hmm.

+ <entry><structfield>partexprs</structfield></entry>

There's a certain symmetry between this and what we do for indexes,
but I'm wondering whether there's a use case for partitioning a table
by an expression rather than a column value. I suppose if you've
already done the work, there's no harm in supporting it.

+[ PARTITION BY {RANGE | LIST} ( { <replaceable
class="parameter">column_name</replaceable> | ( <replaceable
class="parameter">expression</replaceable> ) } [ <replaceable
class="parameter">opclass</replaceable> ] [, ...] )

The spacing isn't right here. Should say { RANGE | LIST } rather than
{RANGE|LIST}. Similarly elsewhere.

+ thus created is called <firstterm>partitioned</firstterm> table. Key
+ consists of an ordered list of column names and/or expressions when
+ using the <literal>RANGE</> method, whereas only a single column or
+ expression can be specified when using the <literal>LIST</> method.

Why do we support range partitioning with multiple columns, but list
partitioning only with a single column?

+ The type of a key column or an expresion must have an associated

Typo.

+ <para>
+ Currently, there are following limitations on definition of partitioned
+ tables: one cannot specify any UNIQUE, PRIMARY KEY, EXCLUDE and/or
+ FOREIGN KEY constraints.
+ </para>

But I presume you can define these constraints on the partitions;
that's probably worth mentioning. I'd say something like this
"Partitioned tables do not support UNIQUE, PRIMARY, EXCLUDE, or
FOREIGN KEY constraints; however, you can define these constraints on
individual data partitions."

+ if (partexprs)
+ recordDependencyOnSingleRelExpr(&myself,
+ (Node *) partexprs,
+ RelationGetRelid(rel),
+ DEPENDENCY_NORMAL,
+ DEPENDENCY_IGNORE);

I don't think introducing a new DEPENDENCY_IGNORE type is a good idea
here. Instead, you could just add an additional Boolean argument to
recordDependencyOnSingleRelExpr. That seems less likely to create
bugs in unrelated portions of the code.

+ /*
+ * Override specified inheritance option, if relation is a partitioned
+ * table and it's a target of INSERT.
+ */
+ if (alsoSource)
+ rte->inh |= relid_is_partitioned(rte->relid);

This seems extremely unlikely to be the right way to do this. We're
just doing parse analysis here, so depending on properties of the
table that are (or might be made to be) changeable is not good. It's
also an extra catalog lookup whether the table is partitioned or not
(and whether rte->inh is already true or not). Furthermore, it seems
to go against the comment explaining what rte->inh is supposed to
mean, which says:

* inh is TRUE for relation references that should be expanded to include
* inheritance children, if the rel has any. This *must* be FALSE for
* RTEs other than RTE_RELATION entries.

I am not sure what problem you're trying to solve here, but I suspect
this needs to be ripped out altogether or handled later, during
planning. Similarly for the related change in addRangeTableEntry.

+ {PartitionedRelationId, /* PARTEDRELID */
+ PartitionedRelidIndexId,
+ 1,
+ {
+ Anum_pg_partitioned_partedrelid,
+ 0,
+ 0,
+ 0
+ },
+ 128

I'd probably cut the initial size of this down a bit. Odds are good
that not all of the tables you access will be partitioned. Of course
that assumes we'll avoid probing this syscache for non-partitioned
tables, but I think that's pretty important.

+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("cannot alter column named in partition key")));
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("cannot alter column referenced in
partition key expression")));

Maybe "cannot alter type of column named in partition key"? And
similarly for the other one.

+/*
+ * transformPartitionBy
+ * Transform any expressions present in the partition key
+ */
+static PartitionBy *
+transformPartitionBy(Relation rel, PartitionBy *partitionby)

Shouldn't this be in src/backend/parser/parse_*.c instead of tablecmds.c?

Most of this (0001) looks pretty reasonable. I'm sure that there is
more that needs fixing than what I've mentioned above, which is just
what I saw on an initial read-through, but overall I like the
approach.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Declarative partitioning - another take at 2016-08-10 11:09:42 from Amit Langote

Responses

Re: Declarative partitioning - another take at 2016-08-15 11:21:49 from Amit Langote

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Eisentraut	2016-08-10 18:24:35	Re: pg_ctl promote wait
Previous Message	Tomas Vondra	2016-08-10 18:09:14	Re: multivariate statistics (v19)