Quick Links

Re: Referential Integrity In PostgreSQL

From:	wieck(at)debis(dot)com (Jan Wieck)
To:	massimo(dot)lambertini(at)everex(dot)it (Massimo)
Cc:	pgsql-hackers(at)postgreSQL(dot)org (PostgreSQL HACKERS)
Subject:	Re: Referential Integrity In PostgreSQL
Date:	1999-09-20 09:59:05
Message-ID:	m11T0Dx-0003kLC@orion.SAPserv.Hamburg.dsh.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

>
> Hi , Jan
>
> my name is Max .

Hi Max,

>
> I have contributed to SPI interface ,
> that with external Trigger try to make
> a referential integrity.
>
> If I can Help , in something ,
> I'm here .
>

You're welcome.

I've CC'd the hackers list because we might get some ideas
from there too (and to surface once in a while - Bruce
already missed me).

Currently I'm very busy for serious work so I don't find
enough spare time to start on such a big change to
PostgreSQL. But I'd like to give you an overview of what I
have in mind so far so you can decide if you're able to help.

Referential integrity (RI) is based on constraints defined in
the schema of a database. There are some different types of
constraints:

1. Uniqueness constraints.

2. Foreign key constraints that ensure that a key value used
in an attribute exists in another relation. One
constraint must ensure you're unable to INSERT/UPDATE to
a value that doesn't exist, another one must prevent
DELETE on a referenced key item or that it is changed
during UPDATE.

3. Cascading deletes that let rows referring to a key follow
on DELETE silently.

Even if not defined in the standard (AFAIK) there could be
others like letting references automatically follow on UPDATE
to a key value.

All constraints can be enabled and/or default to be deferred.
That means, that the RI checks aren't performed when they are
triggerd. Instead, they're checked at transaction end or if
explicitly invoked by some special statement. This is really
important because someone must be able to setup cyclic RI
checks that could never be satisfied if the checks would be
performed immediately. The major problem on this is the
amount of data affected until the checks must be performed.
The number of statements executed, that trigger such deferred
constraints, shouldn't be limited. And one single
INSERT/UPDATE/DELETE could affect thousands of rows.

Due to these problems I thought, it might not be such a good
idea to remember CTID's or the like to get back OLD/NEW rows
at the time the constraints are checked. Instead I planned to
misuse the rule system for it. Unfortunately, the rule system
has damned tricky problems itself when it comes to having-,
distinct and other clauses and extremely on aggregates and
subselects. These problems would have to get fixed first. So
it's a solution that cannot be implemented right now.

Fallback to CTID remembering though. There are problems too
:-(. Let's enhance the trigger mechanism with a deferred
feature. First this requires two additional bool attributes
in the pg_trigger relation that tell if this trigger is
deferrable and if it is deferred by default. While at it we
should add another bool that tells if the trigger is enabled
(ALTER TRIGGER {ENABLE|DISABLE} trigger).

Second we need an internal list of triggers, that are
currently DEFINED AS DEFERRED. Either because they default to
it, or the user explicitly asked to deferr it.

Third we need an internal list of triggers that must be
invoked later because at the time an event occured where they
should have been triggered, they appeared in the other list
and their execution is delayed until transaction end or
explicit execution. This list must remember the OID of the
trigger to invoke (to identify the procedure and the
arguments), the relation that caused the trigger and the
CTID's of the OLD and NEW row.

That last list could grow extremely! Think of a trigger
that's executing commands over SPI which in turn activate
deferred triggers. Since the order of trigger execution is
very important for RI, I can't see any chance to
simplify/condense this information. Thus it is 16 bytes at
least per deferred trigger call (2 OID's plus 2 CTID's). I
think one or more temp files would fit best for this.

A last tricky point is if one of a bunch of deferred triggers
is explicitly called for execution. At this time, the entries
for it in the temp file(s) must get processed and marked
executed (maybe by overwriting the triggers OID with the
invalid OID) while other trigger events still have to get
recorded.

Needless to say that reading thousands of those entries just
to find a few isn't good on performance. But better have this
special case slow that dealing with hundreds of temp files or
other overhead slowing down the usual case where ALL deferred
triggers get called at transaction end.

Trigger invocation is simple now - fetch the OLD and NEW rows
by CTID and execute the trigger as done by the trigger
manager. Oh - well - vacuum shouldn't touch relations where
deferred triggers are outstanding. Might require some
special lock entry - Vadim?

Did I miss something?

Jan

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck(at)debis(dot)com (Jan Wieck) #

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	1999-09-20 13:21:04	Re: [HACKERS] why do shmem attach?
Previous Message	Theo Kramer	1999-09-20 07:28:40	Re: [HACKERS] Re: HISTORY for 6.5.2