Re: Reducing ClogControlLock contention

From: Andres Freund <andres(at)anarazel(dot)de>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing ClogControlLock contention
Date: 2015-08-22 14:14:48
Message-ID: 20150822141448.GH8552@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Amit pinged me about my opinion of this patch. I don't really have
time/energy for an in-depth review right now, but since I'd looked
enough to see some troublesome points, I thought I'd write those.

On 2015-06-30 08:02:25 +0100, Simon Riggs wrote:
> Proposal for improving this is to acquire the ClogControlLock in Shared
> mode, if possible.

I find that rather scary. That requires for all read and write paths in
clog.c and slru.c to only ever read memory locations once. Otherwise
those reads may not get the same results. That'd mean we'd need to add
indirections via volatile to all reads/writes. It also requires that we
never read in 4 byte quantities.

> This is safe because people checking visibility of an xid must always run
> TransactionIdIsInProgress() first to avoid race conditions, which will
> always return true for the transaction we are currently committing. As a
> result, we never get concurrent access to the same bits in clog, which
> would require a barrier.

I don't think that's really sufficient. There's a bunch of callers do
lookups without such checks, e.g. in heapam.c. It might be safe due to
other circumstances, but at the very least this is a significant but
sublte API revision.

> Two concurrent writers might access the same word concurrently, so we
> protect against that with a new CommitLock. We could partition that by
> pageno also, if needed.

To me it seems better to make this more integrated with slru.c. Change
the locking so that the control lock (relevant for page mappings et al)
is different from the locks for reading/writing data.

> * If we're doing an async commit (ie, lsn is valid), then we must wait
> * for any active write on the page slot to complete. Otherwise our
> * update could reach disk in that write, which will not do since we
> @@ -273,7 +290,9 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
> * write-busy, since we don't care if the update reaches disk sooner than
> * we think.
> */
> - slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
> + if (!InRecovery)
> + LWLockAcquire(CommitLock, LW_EXCLUSIVE);
> + slotno = SimpleLruReadPage_ReadOnly(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
>
> /*
> * Set the main transaction id, if any.
> @@ -312,6 +331,9 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids,
> ClogCtl->shared->page_dirty[slotno] = true;
>
> LWLockRelease(CLogControlLock);
> +
> + if (!InRecovery)
> + LWLockRelease(CommitLock);
> }

TransactionIdSetPageStatus() calls TransactionIdSetStatusBit(), which
writes an 8 byte variable (the lsn). That's not safe.

Maybe write_ok = XLogRecPtrIsInvalid(lsn) is trying to address that? If
so, I don't see how. A page is returned with only the shared lock held
if it's in VALID state before. Even if that were changed, this'd be a
mightily subtle thing to do without a very fat comment.

> @@ -447,7 +447,8 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
> * It is unspecified whether the lock will be shared or exclusive.
> */
> int
> -SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
> +SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, bool write_ok,
> + TransactionId xid)
> {
> SlruShared shared = ctl->shared;
> int slotno;
> @@ -460,7 +461,9 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
> {
> if (shared->page_number[slotno] == pageno &&
> shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
> - shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS)
> + shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS &&
> + (write_ok ||
> + shared->page_status[slotno] != SLRU_PAGE_WRITE_IN_PROGRESS))
> {
> /* See comments for SlruRecentlyUsed macro */
> SlruRecentlyUsed(shared, slotno);
> @@ -472,7 +475,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
> LWLockRelease(shared->ControlLock);
> LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
>
> - return SimpleLruReadPage(ctl, pageno, true, xid);
> + return SimpleLruReadPage(ctl, pageno, write_ok, xid);
> }

This function's name would definitely need to be changed, and it'll need
to be documented when/how it's safe to use write_ok = true. Same with
slru.c's header.

A patch like this will need far more changes. Every read and write from
a page has to be guaranteed to only be done once, otherwise you can get
'effectively torn' values.

That is, you can't just do
static void
TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno)
...
char *byteptr;
char byteval;
...
/* note this assumes exclusive access to the clog page */
byteval = *byteptr;
byteval &= ~(((1 << CLOG_BITS_PER_XACT) - 1) << bshift);
byteval |= (status << bshift);
*byteptr = byteval;
...

the compiler is entirely free to "optimize" away the byteval variable
and do all these masking operations on memory! It can intermittenly
write temporary values to byteval, because e.g. the register pressure is
too high.

To actually rely on single-copy-atomicity you have to enforce that these
reads/writes can only happen. Leavout out some possible macro
indirection that'd have to look like
byteval = (volatile char *) byteptr;
...
*(volatile char *) byteptr = byteval;
some for TransactionIdGetStatus(), without the write side obviously.

and stuff like
if (!XLogRecPtrIsInvalid(lsn))
{
int lsnindex = GetLSNIndex(slotno, xid);

if (ClogCtl->shared->group_lsn[lsnindex] < lsn)
ClogCtl->shared->group_lsn[lsnindex] = lsn;
}
just aren't possible at all unless we drop a bunch of platforms.

In essence I think this patch allows us to roughly benchmark whether and
how benefitial fixing the contention around clog is, but it seems pretty
far from something that can actually be applied.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2015-08-22 14:24:52 Re: New functions
Previous Message Fabien COELHO 2015-08-22 06:39:36 Re: PATCH: numeric timestamp in log_line_prefix