On 2010-12-01 16:05, Tom Lane wrote:
> Perhaps I should have said "possibly workable proposal". What you wrote
> doesn't even begin to cover the interesting part of the problem, namely
> how to ensure uniqueness is preserved in the face of concurrent
The usual page locking done by nbtree doesn't work in this case, since a
conflict can be two inserts into two different indexes. This means that
one way or another, some extra kind of lock is needed. The question is
what current lock mechanism is suitable for this purpose, the answer to
that may depend on the way conflicts are checked for. Checking for
conflicts can be done at row insert time or commit time.
With 'inheritance chain' I mean any set of more than one relation that
is the transitive closure of the inheritance parent and child relation
on a given relation.
- Check at row insert time:
-- before insert of row into an inheritance chain, suppose with key value 10
-- lock X exclusively (this is to be a fined grained lock, X could be a
name derived from the inserted key value 10)
-- query inheritance chain for existing values with SnapshotNow
-- on existing value, release 'X locks' taken by current backend and error
-- else normal processing continues (actual insert etc..)
-- after commit, release 'X locks' taken by current backend
If another transaction B wants to insert the same key value 10, it will
block on the lock taken by A, which A releases after commit, then B
continues and when it queries the inheritance chain, it sees the
committed record of A and will then report an error.
The drawback here is lock proliferation: for each row a lock seems unwanted.
- Check at commit time
-- inserts row into inheritance chain
-- record row and tuple for later check
-- at commit time, if any check was recorded
--- for every inheritance chain with inserts, ordered by lowest relation oid
---- lock Y exclusively (where Y is a name derived from the inheritance
chain, e.g. lowest relation oid)
---- for every inserted value, query for duplicate key values in
inheritance chain with SnapshotNow
---- on error, release Y and abort transaction (serialization error)
--- after commit / heap tuples marked committed, release all Y locks.
May insert duplicate values into the inheritance chain, but at commit
time will either get a lock or block on lock. It can continue after A
releases locks, and then it will read the changes committed by B.
Though this solution has fewer locks as the per-row case, a deadlock can
occur if the inheritance chains are not locked in order. This is avoided
by ordering the chains on the lowest oid of its relations. Perhaps the
biggest drawback is that a serialization error must be thrown at commit,
even when the user did not set the transaction isolation level to
SERIALIZABLE and that might violate POLA.
> (My current feelings about this are that a general-purpose solution
> would probably cost more than it's worth. What people really care
> about is FK to a partitioned table, which is a structure in which
> we don't have to solve the general problem: if we know that the
> partitioning prevents the same key from appearing in multiple
> partitions, then we only have to look into one partition. So this
> is just another item that's pending introduction of real partitioning
While this is true for the partitioning use case, it will never be for
'true' inheritance use cases, and therefore a real partitioning
structure doesn't help the inheritance use cases. (Such as ours that is
a implementation of HL7's reference implementation model.
http://www.hl7.org/v3ballot/html/infrastructure/rim/rim.html) A lot can
be said about object model implementations in relational databases. To
keep it short: Codd stressed the value of the relational model in it's
ACM Turing award lecture: a practical foundation for productivity. It is
not productive, that users must implement foreign key checking in
user-space, instead of using a database primitive.
In response to
pgsql-hackers by date
|Next:||From: Yeb Havinga||Date: 2010-12-02 10:30:59|
|Subject: Re: FK's to refer to rows in inheritance child|
|Previous:||From: aaliya zarrin||Date: 2010-12-02 09:55:04|
|Subject: Re: Hi- How frequently Postgres Poll for trigger file|