From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: A Case For Inlining Immediate Referential Integrity Checks
Date: 2021-03-14 19:49:27
Message-ID: CADkLM=eZJddpx6RDop-oCrQ+J9R-wfbf6MoLxUUGjbpwTkoUXQ@mail.gmail.com
Lists: pgsql-hackers

A Case For Inlining Immediate Referential Integrity Checks
----------------------------------------------------------

The following is an overview of how Postgres currently implements
referential integrity, some problems with that architecture, attempted
solutions for those problems, and a suggestion of another possible solution.

Notes On Notation and Referential Integrity In General
------------------------------------------------------

All referential integrity is ultimately of this form:

R(X) => T(Y)

where a referencing table R has a set of columns X that references a set
of columns Y comprising a unique constraint on a target table T. Note
that the Y-set of columns is usually the primary key of T, but does not
have to be.
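As a concrete example of this notation in SQL (the table and column names
here are illustrative, not from any real schema), with R = orders,
X = (customer_id), T = customers, and Y = (id):

```sql
-- T(Y): the referenced table, with Y = (id) as its unique constraint
CREATE TABLE customers (
    id   integer PRIMARY KEY,
    name text
);

-- R(X): the referencing table, with X = (customer_id)
CREATE TABLE orders (
    order_id    integer PRIMARY KEY,
    customer_id integer REFERENCES customers (id)  -- R(X) => T(Y)
);
```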

The basic referential integrity checks fall into two basic categories,
Insert and Update, and each can be checked Immediately following the
statement, or can be Deferred to the end of the transaction.

The Insert check is fairly straightforward. Any insert to R, or update of R
that modifies [1] any column in X, is checked to see if all of the X
columns are NOT NULL, and if so, a lookup is done on T to find a matching
tuple of Y. If none is found, then an error is raised.

The Update check is more complicated, as it covers any UPDATE operation
that modifies [1] any column in Y where all of the values of Y are NOT
NULL, as well as any DELETE operation where all of the columns of Y are NOT
NULL. For any Update check, the table R is scanned for any X tuples
matching the previous values of Y, and for any matches found, an action is
taken. That action can be to fail the operation (NO ACTION, RESTRICT),
update the X values to fixed values (SET NULL, SET DEFAULT), or to delete
those rows in R (CASCADE).
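To make the action variants concrete, here is a sketch of how each is
declared (all table and column names are hypothetical):

```sql
CREATE TABLE t (y integer PRIMARY KEY);

-- Each referencing table picks the action taken when a matching
-- row of t is updated or deleted:
CREATE TABLE r_restrict (x integer REFERENCES t (y) ON DELETE RESTRICT);
CREATE TABLE r_cascade  (x integer REFERENCES t (y) ON DELETE CASCADE);
CREATE TABLE r_setnull  (x integer REFERENCES t (y) ON DELETE SET NULL);
CREATE TABLE r_default  (x integer DEFAULT 0
                                   REFERENCES t (y) ON DELETE SET DEFAULT);
```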

Current Implementation
----------------------

Currently, these operations are handled via per-row triggers. In the
general case, one trigger is placed on R for INSERT operations, one
trigger is placed on T for DELETE operations, and an additional trigger is
placed on T for UPDATE operations that affect any column of Y.

These Insert trigger functions invoke the C function RI_FKey_check() [2].
The trigger is fired unconditionally, and the trigger itself determines
whether a referential integrity check needs to be performed. Ultimately
this trigger invokes an SPI query of the form SELECT 1 FROM <T> WHERE (<X =
Y>) FOR KEY SHARE. This query is generally quite straightforward to the
planner, as it becomes either a scan of a single unique index, or a
partition search followed by a scan of a single unique index. The operation
succeeds if a row is found, and fails if it does not.

The Update trigger functions are implemented with a set of C functions
RI_[noaction|restrict|cascade|setnull|setdefault]_[upd|del]() [3]. These
functions each generate an SPI query in one of the following forms:

cascade: DELETE FROM <R> WHERE <X = Y>
restrict/noaction: SELECT 1 FROM <R> WHERE <X = Y> FOR KEY SHARE
setnull: UPDATE <R> SET x1 = NULL, ... WHERE <X = Y>
setdefault: UPDATE <R> SET x1 = DEFAULT, ... WHERE <X = Y>
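For a single hypothetical constraint r(x) REFERENCES t(y), those forms
would be instantiated roughly as follows (a sketch of the per-row query
shapes, not the exact text the C functions emit):

```sql
-- Fired once per affected row of t, with $1 bound to the old y value:
DELETE FROM r WHERE x = $1;                        -- CASCADE
SELECT 1 FROM r WHERE x = $1 FOR KEY SHARE;        -- RESTRICT / NO ACTION
UPDATE r SET x = NULL    WHERE x = $1;             -- SET NULL
UPDATE r SET x = DEFAULT WHERE x = $1;             -- SET DEFAULT
```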

These triggers are either executed at statement time (Immediate) or are
queued for execution as a part of the transaction commit (Deferred).

Problems With The Current Implementation
----------------------------------------

The main problems with this architecture come down to visibility and
performance.

The foremost problem with this implementation is that these extra queries
are not visible to the end user in any way. It is possible to infer that
these functions executed by looking at the constraint definitions and
comparing pg_stat_user_tables or pg_stat_user_indexes before and after the
operation, but in general the time spent in these functions accrues to the
DML statement (Immediate) or COMMIT statement (Deferred) without any insight
into what took place. This is especially vexing in situations where an
operation as simple as "DELETE FROM highly_referenced_table WHERE id = 1"
hits the primary key index, but takes several seconds to run.
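That inference can be done today with something like the following
(a diagnostic sketch, not a substitute for real visibility; the table
names are illustrative):

```sql
-- Snapshot scan counts on the referencing table before the DELETE:
SELECT relname, seq_scan, idx_scan
  FROM pg_stat_user_tables
 WHERE relname = 'referencing_table';

DELETE FROM highly_referenced_table WHERE id = 1;

-- Snapshot again; the delta in seq_scan/idx_scan reveals the
-- hidden RI query that ran inside the trigger.
SELECT relname, seq_scan, idx_scan
  FROM pg_stat_user_tables
 WHERE relname = 'referencing_table';
```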

The performance of Insert operations is generally not too bad, in that the
query boils down to an Index Scan for a single row. The problem, however,
is that this query must be executed for every row inserted. The query
itself is only planned once, and that query plan is cached for later
re-use. That removes some of the query overhead, but also incurs a growing
cache of plans which can create memory pressure if the number of foreign
keys is large, and indeed this has become a problem for at least one
customer [4]. Some profiling of the RI check indicated that about half of
the time of the insert was spent in SPI functions that could be bypassed if
the C function called index_beginscan and index_rescan directly [5]. And
these indications bore out when Amit Langote wrote a patch [6] which finds
the designated index from the constraint (with some drilling through
partitions if need be) and then invokes the scan functions. This method
showed about a halving of the time involved, while also avoiding the memory
pressure from many cached plans [7].

The performance of Delete operations is far less certain, and the potential
performance impact is far greater. There are four main reasons for this.
First, there is no guarantee of a suitable index on the R table let alone
an optimal index, so a Sequential Scan is possible. Second, the operation
may match multiple rows in the R table, so the scan must exhaust the whole
table/index of R. Third, this already sub-optimal operation must be
performed for every row affected by the Delete operation, with no ability
to coordinate between row triggers, which means that in a pathological case,
a large table is sequentially scanned once per row deleted. Lastly, the
cascade, setnull and setdefault variants have the potential to fire more
triggers.
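The missing-index hazard is easy to reproduce, because Postgres requires a
unique index on T's side of the constraint but nothing on R's side
(hypothetical tables):

```sql
CREATE TABLE parent (id integer PRIMARY KEY);
CREATE TABLE child  (parent_id integer
                     REFERENCES parent (id) ON DELETE CASCADE);

-- With no index on child.parent_id, every row deleted from parent
-- forces a full scan of child inside the RI trigger.
-- The usual remedy is to index the referencing columns:
CREATE INDEX child_parent_id_idx ON child (parent_id);
```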

Attempts at Statement Level Triggers
------------------------------------

Back in 2016, Kevin Grittner made a mock up of handling RI delete checks
with statement-level triggers [8] and got a 98% reduction in execution
time. The source of the benefit was not hard to see: doing 1 query
filtering for N values is almost always faster than N queries filtering for
1 each. It especially helps the most pathological case: where the
row-trigger approach forces a Sequential Scan per deleted row, a single
hash join against the transition table becomes possible.
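The shape of that mock-up can be sketched as a statement-level trigger
consulting the transition table in one query (hypothetical names, with the
real RI bookkeeping in ri_triggers.c omitted):

```sql
-- One statement-level firing sees all deleted rows at once via the
-- OLD TABLE transition relation, enabling a hash join against R.
CREATE OR REPLACE FUNCTION ri_restrict_del_stmt() RETURNS trigger AS $$
BEGIN
    IF EXISTS (SELECT 1
                 FROM referencing_table r
                 JOIN old_rows o ON r.x = o.y) THEN
        RAISE EXCEPTION 'referential integrity violation'
              USING ERRCODE = 'foreign_key_violation';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ri_restrict_del
    AFTER DELETE ON referenced_table
    REFERENCING OLD TABLE AS old_rows
    FOR EACH STATEMENT
    EXECUTE FUNCTION ri_restrict_del_stmt();
```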

It was with that thinking that I made my own attempt at re-implementing RI
with statement-level triggers [9]. This effort was incomplete, and raised
questions from Alvaro Herrera [10] and Kevin Grittner [11]. Also, the
implementation was only seeing about a 25% improvement instead of the 98%
that was hoped. Antonin Houska found some bugs in the patch [12] and
suggested keeping row level triggers. The chief source of baggage seemed to
be that the transition tables contained the entire before/after rows, not
just the columns needed to process the trigger, and that level of memory
consumption was quite significant.

Some time later, Antonin followed through with his idea for how to improve
the patch [13], which sparked a lively discussion where it was observed
that this discussion had happened before, in other forms, at least as far
back as 2013 [14]. The chief challenge was how to optimize a many-row
update without penalizing a single-row update.

All of the discussion, however, was about architecture and performance of
triggers, without any mention of improving the visibility of RI checks to
the user.

Current State Of Affairs
------------------------

Amit's patch to remove all SPI from Insert triggers [6] appears to work,
and is about as minimally invasive as possible under the circumstances, and
I think it is important that it be included for v14. However, concerns have
been raised about the patch bypassing permission checks, how it would
handle foreign tables, alternate access methods, etc.

Even if all issues are resolved with Amit's patch, it does nothing for
Updates, and nothing for visibility. As was mentioned before, the chief
problem for Update performance is that statement-level triggers are too
much overhead for single-row updates, and row level triggers need extra
overhead to see if they can find common cause with other trigger firings.

It has occurred to me that the solution to both of these issues is to,
where possible, roll the trigger operation into the query itself, and in
cases where that isn't possible, at least indicate that a trigger could be
fired.

The Proposed Changes
--------------------

The changes I'm proposing boil down to the following:

1. Add informational nodes to query plans to indicate that a certain named
trigger would be fired, whether it is a row or statement trigger, and in
the case of per-row triggers, how many are expected to be fired.
2. To the extent possible, accrue time/cost expended on triggers to that
node.
3. Make the planner capable of seeing these triggers, and where possible,
bring the logic of the RI check inside the query itself.

Adding Trigger Nodes to A Query Plan
------------------------------------

When a query is being planned that modifies a table, the planner would
inspect the triggers on that table, and would create a node for each named
trigger that would be fired.

1. Disabled triggers are ignored outright.
2. User-defined triggers and Deferred RI triggers will simply be named
along with an estimate of the number of firings that will occur.
3. RI Insert triggers that are in immediate mode will be inlined as a node
ReferentialIntegrityInsert.
4. RI Update/Delete triggers of type ON DELETE RESTRICT and ON DELETE NO
ACTION that are in immediate mode will be inlined as a node
ReferentialIntegrityDeleteCheck.
5. RI Update/Delete triggers of type CASCADE will be inlined as a node
ReferentialIntegrityDeleteCascade.
6. RI Update/Delete triggers of type SET NULL and SET DEFAULT will be
inlined as a node ReferentialIntegrityUpdateSet.

Note that the ReferentialIntegrityDeleteCascade and
ReferentialIntegrityUpdateSet nodes will themselves modify tables which can
in turn have RI constraints that would create more
ReferentialIntegrityDeleteCascade and ReferentialIntegrityUpdateSet nodes.
In theory, this could continue infinitely. Practically speaking, such
situations either have a DEFERRED constraint somewhere in the loop, or one
of the references starts off initially NULL. Still, the planner can't see
that, so it makes sense to put a depth limit on such things, and any
UPDATEs/DELETEs that cascade beyond that limit are simply not inlined, and
are queued just as user-defined immediate triggers are.

ReferentialIntegrityInsert Node
-------------------------------
This is a node that does a lookup of every tuple in the input set against
the unique index (normal or partitioned) on the referenced table. The node
would have the option of first doing a distinct on the set of inputs, or
instead might choose to cache the values already looked up to avoid
duplicate index traversals, in which case the operation would more closely
resemble a LEFT OUTER JOIN where, instead of returning a null row on a
miss, the query would instead fail on an RI violation.
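In relational terms, the batched check would be equivalent to something
like the following sketch, where new_rows stands in for the statement's
input tuples and t(y) is the referenced unique key (names are
illustrative):

```sql
-- Any row this query returns is an RI violation: the left join
-- missed (t.y IS NULL) instead of producing a null-extended row.
SELECT n.x
  FROM new_rows n
  LEFT JOIN t ON t.y = n.x
 WHERE n.x IS NOT NULL   -- NULL keys are exempt from the check
   AND t.y IS NULL;
```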

ReferentialIntegrityDeleteCheck Node
------------------------------------
In many ways, this node is the opposite of ReferentialIntegrityInsert, in
the sense that it fails if there IS a match found rather than if there was
not. The power of this node lies in the fact that the planner can see how
many tuples will be input, and can decide on a nested loop traversal, or a
hash join, or a merge join if amenable indexes are present.

ReferentialIntegrityDeleteCascade Node
--------------------------------------
This node amounts to DELETE FROM referencing_table WHERE id IN (
set_of_input_tuples ) with no returning clause and no need for one. The
node cannot itself filter any rows, nor can it raise any errors, but the RI
constraints of referencing_table might raise errors.

ReferentialIntegrityUpdateSet Node
----------------------------------
This node amounts to UPDATE referencing_table SET col = NULL|DEFAULT WHERE
id IN ( set_of_input_tuples ) with no returning clause and no need for one.
The node cannot itself filter any rows, nor can it raise any errors, but
the RI constraints of referencing_table might raise errors.

Advantages To This Approach
---------------------------

1. Visibility into triggers is provided, regardless of which triggers are
eligible for inlining.
2. All triggers have a fall-back case of operating exactly as they do now.
3. Inlining could be enabled/disabled by a session setting or other GUC.
4. The planner itself has visibility into the number of rows affected, and
therefore can choose single-row vs set operations as appropriate.

What Might This Look Like?
--------------------------

Say there's a fact table class_registration with foreign keys to the
dimension tables class, student, and semester; you'd see something like this:

EXPLAIN INSERT INTO class_registration VALUES (...);
QUERY PLAN
------------------------------------------------------------------------------------------
Insert on class_registration (cost=0.00..0.01 rows=1 width=40)
-> Result (cost=0.00..0.01 rows=1 width=4)
-> ReferentialIntegrityInsert: class_registration_class_id_fkey
-> IndexOnlyScan class_pk (...)
-> ReferentialIntegrityInsert: class_registration_student_id_fkey
-> IndexOnlyScan student_uq (...)
-> ReferentialIntegrityInsert: class_registration_semester_id_fkey
-> IndexOnlyScan semester_pk (...)

Or, in the case of deferred constraints:

EXPLAIN INSERT INTO class_registration VALUES (...);
QUERY PLAN
------------------------------------------------------------------------------------------
Insert on class_registration (cost=0.00..0.01 rows=1 width=40)
-> Result (cost=0.00..0.01 rows=1 width=4)
-> ForeignKey Check: class_registration_class_id_fkey
(RI_FKey_check_ins)
-> Row Deferred (Execute 1 Time)
-> ForeignKey Check: class_registration_student_id_fkey
(RI_FKey_check_ins)
-> Row Deferred (Execute 1 Time)
-> ForeignKey Check: class_registration_semester_id_fkey
(RI_FKey_check_ins)
-> Row Deferred (Execute 1 Time)

Is there anything actionable for the user? Not here, but now they can see
the extra lifting they were asking the database to perform.

The actual tuning opportunities would be on DELETEs and UPDATEs:

EXPLAIN DELETE FROM class WHERE class_id <= 4;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete on class (cost=0.00..0.04 rows=3 width=60)
-> Index Only Scan on class_pk (...)
-> ReferentialIntegrityDeleteCascade: class_class_registration_fkey
-> Hash Cond: (r1.class_id = class.id)
-> Hash (...)
-> Seq Scan on class_registration c1 (...)
-> Hash (...)
-> Materialize Affected Tuples

The reader could see that an index was missing, and work to fix it. The
planner, for its part, saw that a Seq Scan would be needed, but rather
than scan once per row as per-row triggers would have done, it does the
scan once, hashes it, and then does a comparison against a similar hash of
the rows deleted.

It's possible that the planner would also view the DELETE as a CTE which is
then fed as input to progressive layers of RI checks.

Conclusion
----------

I think there are improvements to be made in RI checks in terms of
performance, visibility, and tunability. It is my hope that this sparks
discussion that leads to better performance and visibility of Referential
Integrity.

Footnotes:
[1] Updates that set the column value to the value it already has do not
require an integrity check.
[2]
https://doxygen.postgresql.org/ri__triggers_8c.html#a14c6f5e65d657bd5b4fd45769b4c0197
[3]
https://doxygen.postgresql.org/ri__triggers_8c.html#a43b79b7b3f05fc8bec34e5fb6c37ba47
is the first alphabetically
[4]
https://www.postgresql.org/message-id/CAKkQ508Z6r5e3jdqhfPWSzSajLpHo3OYYOAmfeSAuPTo5VGfgw@mail.gmail.com
[5]
https://www.postgresql.org/message-id/20201126.121818.26523414172308697.horikyota.ntt@gmail.com
[6]
https://www.postgresql.org/message-id/CA+HiwqGkfJfYdeq5vHPh6eqPKjSbfpDDY+j-kXYFePQedtSLeg@mail.gmail.com
[7]
https://www.postgresql.org/message-id/CAKkQ50_h8TcBkY5KYQfneejrZ_d3veFcK3nGmN-WxucEu_QrCw%40mail.gmail.com
[8]
https://www.postgresql.org/message-id/CACjxUsM4s9%3DCUmPU4YFOYiD5f%3D2ULVDBjuFSo20Twe7KbUe8Mw%40mail.gmail.com
[9]
https://www.postgresql.org/message-id/CADkLM=dFuNHNiZ9Pop1pqa+HPh4T9WuwnjwSf6UAvnxcgUaQdA@mail.gmail.com
[10]
https://www.postgresql.org/message-id/20181217172729.mjfkflaelii2boaj%40alvherre.pgsql
[11]
https://www.postgresql.org/message-id/CACjxUsOY-CXoNMPit%2Bk1PC_5LjYkvYPz2VJwK5YDDtzRh4J7vw%40mail.gmail.com
[12] https://www.postgresql.org/message-id/17100.1550662686%40localhost
[13] https://www.postgresql.org/message-id/1813.1586363881@antos
[14]
https://www.postgresql.org/message-id/flat/CA%2BU5nMLM1DaHBC6JXtUMfcG6f7FgV5mPSpufO7GRnbFKkF2f7g%40mail.gmail.com
