Tuple visibility

From: Jim Nasby <jim(at)nasby(dot)net>
To: austinpug(at)postgresql(dot)org
Subject: Tuple visibility
Date: 2010-12-11 15:12:31
Message-ID: 56C6A08A-6D22-4154-81B0-6BFC9A52FB73@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: austinpug

After Wednesday's discussion I thought people might find this interesting (excerpted from backend/executor/README):

EvalPlanQual (READ COMMITTED Update Checking)
---------------------------------------------

For simple SELECTs, the executor need only pay attention to tuples that are
valid according to the snapshot seen by the current transaction (ie, they
were inserted by a previously committed transaction, and not deleted by any
previously committed transaction). However, for UPDATE and DELETE it is not
cool to modify or delete a tuple that's been modified by an open or
concurrently-committed transaction. If we are running in SERIALIZABLE
isolation level then we just raise an error when this condition is seen to
occur. In READ COMMITTED isolation level, we must work a lot harder.

The basic idea in READ COMMITTED mode is to take the modified tuple
committed by the concurrent transaction (after waiting for it to commit,
if need be) and re-evaluate the query qualifications to see if it would
still meet the quals. If so, we regenerate the updated tuple (if we are
doing an UPDATE) from the modified tuple, and finally update/delete the
modified tuple. SELECT FOR UPDATE/SHARE behaves similarly, except that its
action is just to lock the modified tuple and return results based on that
version of the tuple.

To implement this checking, we actually re-run the query from scratch for
each modified tuple (or set of tuples, for SELECT FOR UPDATE), with the
relation scan nodes tweaked to return only the current tuples --- either
the original ones, or the updated (and now locked) versions of the modified
tuple(s). If this query returns a tuple, then the modified tuple(s) pass
the quals (and the query output is the suitably modified update tuple, if
we're doing UPDATE). If no tuple is returned, then the modified tuple(s)
fail the quals, so we ignore the current result tuple and continue the
original query.

In UPDATE/DELETE, only the target relation needs to be handled this way.
In SELECT FOR UPDATE, there may be multiple relations flagged FOR UPDATE,
so we obtain lock on the current tuple version in each such relation before
executing the recheck.

It is also possible that there are relations in the query that are not
to be locked (they are neither the UPDATE/DELETE target nor specified to
be locked in SELECT FOR UPDATE/SHARE). When re-running the test query
we want to use the same rows from these relations that were joined to
the locked rows. For ordinary relations this can be implemented relatively
cheaply by including the row TID in the join outputs and re-fetching that
TID. (The re-fetch is expensive, but we're trying to optimize the normal
case where no re-test is needed.) We have also to consider non-table
relations, such as a ValuesScan or FunctionScan. For these, since there
is no equivalent of TID, the only practical solution seems to be to include
the entire row value in the join output row.

We disallow set-returning functions in the targetlist of SELECT FOR UPDATE,
so as to ensure that at most one tuple can be returned for any particular
set of scan tuples. Otherwise we'd get duplicates due to the original
query returning the same set of scan tuples multiple times. (Note: there
is no explicit prohibition on SRFs in UPDATE, but the net effect will be
that only the first result row of an SRF counts, because all subsequent
rows will result in attempts to re-update an already updated target row.
This is historical behavior and seems not worth changing.)
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

Browse austinpug by date

  From Date Subject
Next Message Nasby, Jim 2011-01-07 23:25:09 Meeting this Wednesday
Previous Message Jon Erdman 2010-12-08 21:27:01 Re: Meeting tomorrow!