BEFORE ROW triggers for partitioned tables

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: BEFORE ROW triggers for partitioned tables
Date: 2020-02-27 16:51:58
Message-ID: 20200227165158.GA2071@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Enabling BEFORE triggers FOR EACH ROW in partitioned tables is very easy
-- just remove the check against them. With that, they work fine.

The main problem is that the executor is not prepared to re-route the
tuple if the user decides to change the partitioning columns (so you get
the error that the partitioning constraint is violated by the partition,
which makes no sense if you're inserting in the top-level partitioned
table). There are several views about that:

1. Just let it be. If the user does something silly, it's their problem
if they get an ugly error message.

2. If the partition keys are changed, raise an error. The trigger is
allowed to change everything but those columns. Then there's no
conflict, and it allows desirable use cases.

3. Allow the partition keys to change, as long as the tuple ends up in
the same partition. This is the same as (1) except the error message is
nicer.

The attached patch implements (2). The cases that are allowed by (3)
are a strict superset of those allowed by (2), so if we decide to allow
it in the future, it is possible without breaking anything that works
after implementing (2).

The implementation harnesses the newly added pg_trigger.tgparentid
column; if that is set to a non-zero value, then we search up the
partitioning hierarchy for each partitioning key column, and verify the
values are bitwise equal, up to the "root". Notes:

* We must check all levels, not just the one immediately above, because
the routing might involve crawling down several levels, and any of those
might become invalidated if the trigger changes values.

* The "root" is not necessarily the root partitioned table, but instead
it's the table that was named in the command. Because of this, we don't
need to acquire locks on the tables, since the executor already has the
tables open and locked (thus they cannot be modified by concurrent
commands).

* I find it a little odd that the leaf partition doesn't have any intel
on what its partitioning columns are. I guess they haven't been needed
thus far, and it seems inappropriate for this admittedly very small
feature to add such a burden on the rest of the system.

* The new function I added, ReportTriggerPartkeyChange(), contains one
serious bug (namely: it doesn't map attribute numbers properly if
partitions are differently defined). Also, it has a performance issue,
namely that we do heap_getattr() potentially repeatedly -- maybe it'd be
better to "slotify" the tuple prior to doing the checks. Another
possible controversial point is that its location in commands/trigger.c
isn't great. (Frankly, I don't understand why the executor trigger
firing functions are in commands/ at all.)

Thoughts?

--
Álvaro Herrera

Attachment Content-Type Size
0001-Enable-BEFORE-row-level-triggers-for-partitioned-tab.patch text/x-diff 13.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2020-02-27 17:33:42 Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Previous Message Tom Lane 2020-02-27 16:32:21 Re: pg_trigger.tgparentid