Re: Declarative partitioning - another take

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning - another take
Date: 2016-10-06 07:14:04
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2016/10/05 2:12, Robert Haas wrote:
> On Tue, Oct 4, 2016 at 4:02 AM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> Even if we leave the empty relfilenode around for now -- in the long
>>> run I think it should die -- I think we should prohibit the creation
>>> of subsidiary object on the parent which is only sensible if it has
>>> rows - e.g. indexes. It makes no sense to disallow non-inheritable
>>> constraints while allowing indexes, and it could box us into a corner
>>> later.
>> I agree. So we must prevent from the get-go the creation of following
>> objects on parent tables (aka RELKIND_PARTITIONED_TABLE relations):
>> * Indexes
>> * Row triggers (?)
> Hmm, do we ever fire triggers on the parent for operations on a child
> table? Note this thread, which seems possibly relevant:

The answer to your question is no.

The thread you quoted discusses statement-level triggers and the
conclusion is that they don't work as desired for UPDATE and DELETE on
inheritance tables. As things stand, only UPDATE or DELETE on the parent
affects the child tables and it's proposed there that the statement-level
triggers on the parent and also on any child tables affected should be
fired in that case.

Currently, INSERT doesn't have that problem because it only ever affects
the parent table. That changes with tuple routing though. I am not sure
if the aforementioned proposed behavior should be applied in this case.
That is to say, if we have INSERT INTO parent, only the parent's
statement-level triggers should be fired. The partition into which the
tuple is routed will have only its row-level triggers fired.

The proposed TODO item has been left untouched for many years now, so it
is perhaps not such a big pain point after all (or perhaps some standard
noncompliance). I may be wrong though.

> We certainly won't fire the parent's per-row triggers ever, so those
> should be prohibited. But I'm not sure about per-statement triggers.

Agree about the row-level triggers. I think per-statement triggers are
alright to allows on parent tables from the POV of the original topic we
were discussing - ie, per-statement triggers do not depend on the physical
addressing capabilities of the relation on which they are defined whereas
row-levels triggers do in some cases.

So I changed 0001 so that indexes and row triggers are disallowed on
partitioned tables.

>> In addition to preventing creation of these objects, we must also teach
>> commands that directly invoke heapam.c to skip such relations.
> I'm don't think that's required if we're not actually getting rid of
> the relfilenode.

Oops that line got sent after all, I had convinced myself to remove that
line from the email. :) Agreed, no need to do that just yet.

>> Needless to say, a query-specified qualification clause must now include
>> the same collate clause as used for individual partition columns (or
>> expressions) for the constraint exclusion to work as intended.
> Hopefully this is only required if the default choice of collation
> wouldn't be correct anyway.

Sorry, I'm not sure I understand what you said - specifically what would
"wrong" mean in this case? Parser error? Run-time error like in varstr_cmp()?

What I meant to say is that predicate OpExpr's generated by the
partitioning code will be using the corresponding partition column's
partcollation as its inputcollid (and the Var will be wrapped by a
RelabelType). Here partcollation is a known valid alternative collation
that is applied when partitioning the input data, overriding any default
collation of the column's type or the collation specified for the column
when creating the table. During assign_query_collations(), a qual clause
OpExpr's variable will get assigned a collation that is either the
column/expressions's attcollation/exprCollationor explicitly specified
collation using a COLLATE clause. If variable collation assigned thusly
doesn't match the partitioning collation, then the predicate OpExpr's and
the query clause OpExpr's variable expressions will fail to match using
equal(), so constraint exclusion (specifically, operator_predicate_proof()
bails out) will fail. Of course, going ahead with constraint exclusion
ignoring the said collation mismatch would be wrong anyway.

> Reviewing 0002:
> + prettyFlags = PRETTYFLAG_INDENT;
> + PG_RETURN_TEXT_P(string_to_text(pg_get_partkeydef_worker(relid,
> + prettyFlags)));
> Why bother with the variable?

Leftover, removed.

> + /* Must get partclass, and partexprs the hard way */
> + datum = SysCacheGetAttr(PARTEDRELID, tuple,
> + Anum_pg_partitioned_table_partclass, &isnull);
> + Assert(!isnull);
> + partclass = (oidvector *) DatumGetPointer(datum);
> Comment mentions getting two things, but code only gets one.

Fixed. Further code uses the "hard way" to get partexprs but it's got a
separate comment anyway.

> + partexprs = (List *) stringToNode(exprsString);
> if (!IsA(partexprs, List)) elog(ERROR, ...); so as to guard against
> corrupt catalog contents.

Done as follows:

partexprs = (List *) stringToNode(exprsString);

if (!IsA(partexprs, List))
elog(ERROR, "unexpected node type found in partexprs: %d",
(int) nodeTag(partexprs));

> + switch (form->partstrat)
> + {
> + case 'l':
> + appendStringInfo(&buf, "LIST");
> + break;
> + case 'r':
> + appendStringInfo(&buf, "RANGE");
> + break;
> + }
> default: elog(ERROR, "unexpected partition strategy: %d", (int)
> form->partstrat);
> Also, why not case PARTITION_STRATEGY_LIST: instead of case 'l': and
> similarly for 'r'?

Done and done.

> @@ -5296,7 +5301,8 @@ getTables(Archive *fout, int *numTables)
> "OR %s IS NOT NULL "
> "))"
> - "AS changed_acl "
> + "AS changed_acl, "
> + "CASE WHEN c.relkind = 'P' THEN
> pg_catalog.pg_get_partkeydef(c.oid) ELSE NULL END AS partkeydef "
> "FROM pg_class c "
> "LEFT JOIN pg_depend d ON "
> "(c.relkind = '%c' AND "
> I think if you test it you'll find this breaks dumps from older server
> versions. You've got to add a new version of the SQL query for 10+,
> and then update the 9.6, 9.5, 9.4, 9.3, 9.1-9.2, 9.0, 8.4, 8.2-8.3,
> 8.0-8.1, 7.3-7.4, 7.2, 7.1, and pre-7.1 queries to return a dummy NULL
> value for that column. Alternatively, you can add the new version for
> 10.0, leave the older queries alone, and then adjust the code further
> down to cope with i_partkeydef == -1 (see i_checkoption for an
> example).

You're quite right. Fixed using this last method.

> gettext_noop("table"),
> + gettext_noop("table"),
> Add a comment like /* partitioned table */ on the same line.


Attached revised patches. Also, includes a fix for an issue reported by
Rajkumar Raghuwanshi [1] which turned out to be a bug in one of the later
patches. I will now move on to addressing the comments on patch 0003.

Thanks a lot for the review!



Attachment Content-Type Size
0001-Catalog-and-DDL-for-partitioned-tables-8.patch text/x-diff 113.3 KB
0002-psql-and-pg_dump-support-for-partitioned-tables-8.patch text/x-diff 26.2 KB
0003-Catalog-and-DDL-for-partitions-8.patch text/x-diff 204.9 KB
0004-psql-and-pg_dump-support-for-partitions-8.patch text/x-diff 20.6 KB
0005-Refactor-optimizer-s-inheritance-set-expansion-code-8.patch text/x-diff 16.2 KB
0006-Teach-a-few-places-to-use-partition-check-quals-8.patch text/x-diff 30.2 KB
0007-Introduce-a-PartitionTreeNode-data-structure-8.patch text/x-diff 8.1 KB
0008-Tuple-routing-for-partitioned-tables-8.patch text/x-diff 43.6 KB
0009-Update-DDL-Partitioning-chapter-to-reflect-new-devel-8.patch text/x-diff 24.7 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Dunstan 2016-10-06 07:49:39 Re: Feature suggestion: ON CONFLICT DO NOTHING RETURNING which returns existing row data
Previous Message Heikki Linnakangas 2016-10-06 07:00:06 Re: memory leak in e94568ecc10 (pre-reading in external sort)