Re: Removing PD_ALL_VISIBLE

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Removing PD_ALL_VISIBLE
Date: 2012-11-26 21:57:09
Message-ID: 1353967029.13162.40.camel@sussancws0025
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2012-11-26 at 16:10 -0500, Tom Lane wrote:
> There's still the issue of whether the IOS code is safe in machines with
> weak memory ordering. I think that it probably is safe, but the
> argument for it in the current code comment is wrong; most likely, a
> correct argument has to depend on read/write barriers associated with
> taking snapshots. I think what you ought to do is work through that,
> fix the existing comment, and then see whether the same argument works
> for what you want to do.

As a part of the patch, I did change the comment, and here's what I came
up with:

* Note on Memory Ordering Effects: visibilitymap_test does not lock
* the visibility map buffer, and therefore the result we read here
* could be slightly stale. However, it can't be stale enough to
* matter.
*
* We need to detect clearing a VM bit due to an insert right away,
* because the tuple is present in the index page but not visible. The
* reading of the TID by this scan (using a shared lock on the index
* buffer) is serialized with the insert of the TID into the index
* (using an exclusive lock on the index buffer). Because the VM bit is
* cleared before updating the index, and locking/unlocking of the
* index page acts as a full memory barrier, we are sure to see the
* cleared bit if we see a recently-inserted TID.
*
* Deletes do not update the index page (only VACUUM will clear out the
* TID), so the clearing of the VM bit by a delete is not serialized
* with this test below, and we may see a value that is significantly
* stale. However, we don't care about the delete right away, because
* the tuple is still visible until the deleting transaction commits or
* the statement ends (if it's our transaction). In either case, the
* lock on the VM buffer will have been released (acting as a write
* barrier) after clearing the bit. And for us to have a snapshot that
* includes the deleting transaction (making the tuple invisible), we
* must have acquired ProcArrayLock after that time, acting as a read
* barrier.
*
* It's worth going through this complexity to avoid needing to lock
* the VM buffer, which could cause significant contention.

And I updated the comment in visibilitymap.c as well (reformatted for
this email):

"To test a bit in the visibility map, most callers should have a pin on
the VM buffer, and at least a shared lock on the data buffer. Any
process that clears the VM bit must have an exclusive lock on the data
buffer, so that will serialize access to the appropriate bit. Because
lock acquisition and release are full memory barriers, then there is no
danger of seeing the state of the bit before it was last cleared.
Callers that don't have the data buffer yet, such as an index only scan
or a VACUUM that is skipping pages, must handle the concurrency as
appropriate."

Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2012-11-26 22:26:42 Re: Further pg_upgrade analysis for many tables
Previous Message Karl O. Pinc 2012-11-26 21:51:05 Re: Suggestion for --truncate-tables to pg_restore