Re: SSI-related code drift between index_getnext() and heap_hot_search_buffer()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI-related code drift between index_getnext() and heap_hot_search_buffer()
Date: 2011-05-14 14:18:42
Message-ID: BANLkTik8KxxjJ1KW-pO+WWBdTEAT+80ArQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 13, 2011 at 12:10 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> FWIW, so far what I know is that it will take an example something
> like the one shown here:
>
> http://archives.postgresql.org/pgsql-hackers/2011-02/msg00325.php
>
> with the further requirements that the update in T3 must not be a
> HOT update, T1 would still need to acquire a snapshot before T2
> committed while moving its current select down past the commit of
> T3, and that select would need to be modified so that it would scan
> the visible tuple and then stop (e.g., because of a LIMIT) before
> reaching the tuple which represents the next version of the row.

I think I see another problem here. Just before returning each tuple,
index_getnext() records in the IndexScanDesc the offset number of the
next tuple in the HOT chain, and the XMAX of the tuple being returned.
On the next call, it will go on to examine that TID, checking, among
other things, whether the XMIN of the tuple at that location matches
the previously stored XMAX. But no buffer content lock is held
across calls. So consider a HOT chain A -> B. After returning A, the
IndexScanDesc will record that we should next look at B. Now the
transaction that created B rolls back, and a new transaction updates
A, so we now have A -> C. (I believe this is possible.) When the next
call to index_getnext() occurs, it'll look at B and conclude that it's
reached the end of the HOT chain, but in reality it has not, because
it has never looked at C.
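
To make the interleaving concrete, here is a toy model of it in
standalone C. This is only a sketch and not PostgreSQL code: ToyTuple,
ToyScan, the toy_* functions, and the XIDs 100 through 102 are all made
up for illustration, and visibility checks are left out entirely. The
scan state remembers an offset and an XMAX between calls, as described
above, and the chain is rewritten from A -> B to A -> C in between.

```c
#include <assert.h>

#define MAX_TUPLES 4      /* slot 0 is unused, like offset number 0 */
#define NO_OFFSET  0
#define NO_XID     0u

/* Hypothetical stand-in for a heap tuple header; not a PostgreSQL type. */
typedef struct {
    char     name;   /* label used only by this demo */
    unsigned xmin;   /* inserting transaction */
    unsigned xmax;   /* updating transaction, NO_XID if none */
    int      next;   /* offset of the next chain member, NO_OFFSET at end */
} ToyTuple;

/* Hypothetical stand-in for the state kept in the scan descriptor. */
typedef struct {
    int      next_offset;  /* where to look on the next call */
    unsigned prev_xmax;    /* XMAX of the tuple returned last time */
} ToyScan;

/* Examine one chain member; return its name, or '\0' at (apparent) end. */
static char toy_scan_step(const ToyTuple *page, ToyScan *scan)
{
    const ToyTuple *tup;

    if (scan->next_offset == NO_OFFSET)
        return '\0';
    assert(scan->next_offset > 0 && scan->next_offset < MAX_TUPLES);
    tup = &page[scan->next_offset];

    /* An XMIN that does not match the remembered XMAX ends the chain. */
    if (scan->prev_xmax != NO_XID && tup->xmin != scan->prev_xmax) {
        scan->next_offset = NO_OFFSET;
        return '\0';
    }
    scan->next_offset = tup->next;
    scan->prev_xmax = tup->xmax;
    return tup->name;
}

/* Initial page contents: A (slot 1) -> B (slot 2). */
static void toy_init_page(ToyTuple *page)
{
    page[0] = (ToyTuple){ '\0', NO_XID, NO_XID, NO_OFFSET };
    page[1] = (ToyTuple){ 'A', 100, 101, 2 };
    page[2] = (ToyTuple){ 'B', 101, NO_XID, NO_OFFSET };
    page[3] = page[0];
}

/* As if xid 101 aborted and xid 102 updated A: chain becomes A -> C.
 * B's slot is left in place, still carrying xmin 101. */
static void toy_replace_b_with_c(ToyTuple *page)
{
    page[1].xmax = 102;
    page[1].next = 3;
    page[3] = (ToyTuple){ 'C', 102, NO_XID, NO_OFFSET };
}

/* Scan that returns A, loses control, then resumes from stale state.
 * out must have room for MAX_TUPLES chars. */
static void toy_stale_scan(char *out)
{
    ToyTuple page[MAX_TUPLES];
    ToyScan  scan = { 1, NO_XID };
    int      n = 0;
    char     c;

    toy_init_page(page);
    out[n++] = toy_scan_step(page, &scan);  /* returns A, remembers B */
    toy_replace_b_with_c(page);             /* happens between calls */
    while ((c = toy_scan_step(page, &scan)) != '\0')
        out[n++] = c;
    out[n] = '\0';
}

/* Scan that starts from the chain root after the chain was rewritten. */
static void toy_fresh_scan(char *out)
{
    ToyTuple page[MAX_TUPLES];
    ToyScan  scan = { 1, NO_XID };
    int      n = 0;
    char     c;

    toy_init_page(page);
    toy_replace_b_with_c(page);
    while ((c = toy_scan_step(page, &scan)) != '\0')
        out[n++] = c;
    out[n] = '\0';
}
```

In this model toy_stale_scan() visits A and then B and stops, while
toy_fresh_scan() visits A and then C. The resumed scan never examines
C, which is the case I'm worried about.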

Now, prior to SSI, I believe this did not matter, because the only
time we traversed the entire HOT chain rather than stopping at the
first visible tuple was when we were using a non-MVCC snapshot.
According to Heikki's submission notes for the patch I was trying to
rebase, the only time that happens is during CLUSTER, at which point
we have an AccessExclusiveLock on the table. But SSI wants to
traverse the whole HOT chain even when using an MVCC snapshot, so now
we (maybe) have a problem.
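
A rough way to see why stopping at the first visible tuple hid the
issue: the toy function below (again standalone C, with ChainMember
and toy_resume_count() invented for this sketch, and visibility reduced
to a boolean flag) counts how often a scan has to pick a chain back up
after having handed a tuple to its caller. Only those resumptions rely
on state remembered across calls.

```c
#include <stdbool.h>

/* Hypothetical chain member; "visible" stands in for a snapshot check. */
typedef struct {
    bool visible;  /* visible to the scan's snapshot (toy flag) */
    int  next;     /* index of the next chain member, -1 at the end */
} ChainMember;

/*
 * Count how many times the scan resumes mid-chain, i.e. continues the
 * walk after an earlier member was already returned to the caller.
 * With traverse_all false, the walk ends at the first visible member.
 */
static int toy_resume_count(const ChainMember *chain, int root,
                            bool traverse_all)
{
    int  resumes = 0;
    bool returned = false;  /* did we just hand a tuple to the caller? */

    for (int i = root; i != -1; i = chain[i].next) {
        if (returned)
            resumes++;      /* this step depends on remembered state */
        returned = false;

        if (chain[i].visible) {
            if (!traverse_all)
                return resumes;  /* done with this chain */
            returned = true;
        }
    }
    return resumes;
}
```

For a two-member chain whose first member is visible, this gives 0
when stopping at the first visible tuple and 1 when walking the whole
chain, so only the whole-chain walk ever depends on the remembered
offset and XMAX.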

I think I have an inkling of how to plug this, but first I have to go
buy groceries.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
