Re: invalid tid errors in latest 7.3.4 stable.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: archeron(at)wavefire(dot)com, Chris Kratz <chris(dot)kratz(at)vistashare(dot)com>, pgsql-hackers(at)postgreSQL(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: invalid tid errors in latest 7.3.4 stable.
Date: 2003-09-26 14:20:50
Message-ID: 15159.1064586050@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> Okay, I'll work out some extension of the APIs to let us propagate the
>> snapshot request down through SPI and into the Executor, rather than
>> using a global variable for it. (Unless someone has a better idea...)

Just when you thought it was safe to go back in the water ...

Chris Kratz sent me the attached example, which fails in 7.3 and (still)
fails in CVS HEAD too.

It appears that the failure mode goes like this: "DELETE FROM activity"
cascades via ON DELETE CASCADE to a delete in qry_column_list. The RI
trigger's delete query fires the RULE, and so must execute "UPDATE
report_objects". The compilation of report_objects' plpgsql trigger
advances the CommandCounter, creating the potential for Halloween
problems when SnapshotNow is used to fetch values. In particular the
UPDATE sees its own output rows as valid source rows.

As far as the "DELETE FROM qry_column_list" goes, I think the solution
is that fetching rows can't use pure SnapshotNow after all. What we
need is to create a fresh QuerySnapshot that shows all transactions
committed-to-date as committed, and saves the current CommandCounter as
the criterion for locally created rows. Unlike SnapshotNow, this would
mean that transactions committed just after we take the new snapshot
would not be seen as committed. This should be okay AFAICS --- once we
reach the RI triggers, all transactions we need to worry about should be
committed. (If not, surely there's a race condition anyway.) Also note
that an RI query would *not* see the effects of actions it indirectly
triggers. This should be okay, because if they do anything that
requires RI validation, they should cause additional RI trigger firings
to be queued for attention later.

But Chris' example raises a host of other questions in my mind. Should
we apply this forcibly-updated QuerySnapshot to actions that are
indirectly triggered by RI queries? In CVS tip, SnapshotNow rules are
in fact used for the UPDATE that's generated by the RULE, because it's
part of the generated plan for the DELETE. But any queries executed
inside triggers fired as a result of all this would use the pre-existing
QuerySnapshot, and hence could see a worldview completely inconsistent
with the rows they are being fired for :-(. It's worse in 7.3, because
the first trigger exit would revert ReferentialIntegritySnapshotOverride
to false, meaning you wouldn't even be using the same snapshot rules
throughout the UPDATE/DELETE :-( :-(

I am inclined to think now that the right solution is for the RI
triggers to update the global QuerySnapshot to current time when they
start, and then revert it to what it had been before exiting. (And that
code had better be in the RI triggers themselves, *not* in the generic
trigger-calling code.) This would ensure that actions taken indirectly
as a result of RI behavior would see a consistent worldview.

The main argument I can see against this is that it would be a really
big wart on the behavior of SERIALIZABLE transactions. Instead of
saying "in a SERIALIZABLE transaction, you only see the effects of
transactions committed before your transaction started", we'd have to
add a footnote "except in actions taken as a result of RI-generated
queries", which sure complicates matters from a logical point of view.
(In READ COMMITTED mode, on the other hand, it's no big deal; we are
effectively just decreeing that a new command starts before the RI
triggers run.)

Comments? Anyone have a better idea?

Anyway, on to Chris' example. Load the attached script into a database
that has plpgsql already created, and then do
DELETE FROM Activity WHERE ActivityID = 16739;
You'll get
ERROR: attempted to mark4update invisible tuple
(or the equivalent 7.3 message). This is reproducible so long as you
start a fresh session each time you attempt the DELETE. If you try the
DELETE again in the same session, it will succeed, because the trigger
function is already compiled and so no CommandCounterIncrement occurs at
the critical instant. (It might be possible to make the behavior stable
by adding some non-SELECT query inside the trigger function to force
a CommandCounterIncrement to occur anyway. I haven't tried though.)

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 1.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2003-09-26 14:32:33 Re: feature request: show pgsql version when running initdb
Previous Message scott.marlowe 2003-09-26 14:18:29 Re: feature request: show pgsql version when running initdb