Re: [PATCH] Refactoring of LWLock tranches

From: "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: [PATCH] Refactoring of LWLock tranches
Date: 2015-11-16 00:20:28
Message-ID: 20151116002028.GA24814@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-11-15 16:24:24 -0500, Robert Haas wrote:
> I think it needs to be adapted to use predefined constants for the
> tranche IDs instead of hardcoded values, maybe based on the attached
> tranche-constants.patch.

Yea, that part is clearly not ok. Let me look at the patch.

> Also, I think we should rip all the volatile qualifiers out of
> bufmgr.c, using the second attached patch, devolatileize-bufmgr.patch.
> The comment claiming that we need this because spinlocks aren't
> compiler barriers is obsolete.

Same here.

> @@ -457,7 +457,7 @@ CreateLWLocks(void)
> LWLockCounter = (int *) ((char *) MainLWLockArray - 3 * sizeof(int));
> LWLockCounter[0] = NUM_FIXED_LWLOCKS;
> LWLockCounter[1] = numLocks;
> - LWLockCounter[2] = 1; /* 0 is the main array */
> + LWLockCounter[2] = LWTRANCHE_LAST_BUILTIN_ID + 1;
> }

Man this LWLockCounter[0] stuff is just awful. Absolutely nothing that
needs to be fixed here, but here it really should be fixed someday.

> /*
> + * We reserve a few predefined tranche IDs. These values will never be
> + * returned by LWLockNewTrancheId.
> + */
> +#define LWTRANCHE_MAIN 0
> +#define LWTRANCHE_BUFFER_CONTENT 1
> +#define LWTRANCHE_BUFFER_IO_IN_PROGRESS 2
> +#define LWTRANCHE_LAST_BUILTIN_ID LWTRANCHE_BUFFER_IO_IN_PROGRESS

Nitpick: I'm inclined to use an enum to avoid having to adjust the last
builtin id when adding a new builtin tranche.

Besides that minor thing I think this works for me. We might
independently want something making adding non-builtin tranches easier,
but that's really just independent.

> From 9e7f9219b5e752da46be0e26a0be074191ae8f62 Mon Sep 17 00:00:00 2001
> From: Robert Haas <rhaas(at)postgresql(dot)org>
> Date: Sun, 15 Nov 2015 10:24:03 -0500
> Subject: [PATCH 1/3] De-volatile-ize.

I very strongly think this should be done. It's painful having to
cast-away volatile, and it de-optimizes a fair bit of performance
relevant code.

> /* local state for StartBufferIO and related functions */
> /* local state for StartBufferIO and related functions */
> -static volatile BufferDesc *InProgressBuf = NULL;
> +static BufferDesc *InProgressBuf = NULL;
> static bool IsForInput;
>
> /* local state for LockBufferForCleanup */
> -static volatile BufferDesc *PinCountWaitBuf = NULL;
> +static BufferDesc *PinCountWaitBuf = NULL;

Hm. These could also be relevant for sigsetjmp et al. Haven't found
relevant code tho.

> - volatile BufferDesc *bufHdr;
> + BufferDesc *bufHdr;
> Block bufBlock;

Looks mis-indented now, similarly in a bunch of other places. Maybe
pg-indent afterwards?

> /*
> * Header spinlock is enough to examine BM_DIRTY, see comment in
> @@ -1707,7 +1707,7 @@ BufferSync(int flags)
> num_written = 0;
> while (num_to_scan-- > 0)
> {
> - volatile BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
> + BufferDesc *bufHdr = GetBufferDescriptor(buf_id);
>

This case has some theoretical behavioural implications: As
bufHdr->flags is accessed without a spinlock the compiler might re-use
an older value. But that's ok: a) there's no old value it really could
use b) the whole race-condition exists anyway, and the comment in the
body explains why that's ok.

> BlockNumber
> BufferGetBlockNumber(Buffer buffer)
> {
> - volatile BufferDesc *bufHdr;
> + BufferDesc *bufHdr;
>
> Assert(BufferIsPinned(buffer));
>
> @@ -2346,7 +2346,7 @@ void
> BufferGetTag(Buffer buffer, RelFileNode *rnode, ForkNumber *forknum,
> BlockNumber *blknum)
> {
> - volatile BufferDesc *bufHdr;
> + BufferDesc *bufHdr;

> /* Do the same checks as BufferGetBlockNumber. */
> Assert(BufferIsPinned(buffer));
> @@ -2382,7 +2382,7 @@ BufferGetTag(Buffer buffer, RelFileNode *rnode, ForkNumber *forknum,
> * as the second parameter. If not, pass NULL.
> */
> static void
> -FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln)
> +FlushBuffer(BufferDesc *buf, SMgrRelation reln)
> {

> XLogRecPtr recptr;
> ErrorContextCallback errcallback;
> @@ -2520,7 +2520,7 @@ RelationGetNumberOfBlocksInFork(Relation relation, ForkNumber forkNum)
> bool
> BufferIsPermanent(Buffer buffer)
> {
> - volatile BufferDesc *bufHdr;
> + BufferDesc *bufHdr;

These require that the buffer is pinned (a barrier implying
operation). The pin prevents the contents from changing in a relevant
manner, the barrier implied in the pin forces the core's view to be
non-stale.

> @@ -2613,7 +2613,7 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
>
> for (i = 0; i < NBuffers; i++)
> {
> - volatile BufferDesc *bufHdr = GetBufferDescriptor(i);
> + BufferDesc *bufHdr = GetBufferDescriptor(i);

> /*
> * We can make this a tad faster by prechecking the buffer tag before
> @@ -2703,7 +2703,7 @@ DropRelFileNodesAllBuffers(RelFileNodeBackend *rnodes, int nnodes)
> for (i = 0; i < NBuffers; i++)
> {
> RelFileNode *rnode = NULL;
> - volatile BufferDesc *bufHdr = GetBufferDescriptor(i);
> + BufferDesc *bufHdr = GetBufferDescriptor(i);
>
> /*
> * As in DropRelFileNodeBuffers, an unlocked precheck should be safe
> @@ -2767,7 +2767,7 @@ DropDatabaseBuffers(Oid dbid)
>
> for (i = 0; i < NBuffers; i++)
> {
> - volatile BufferDesc *bufHdr = GetBufferDescriptor(i);
> + BufferDesc *bufHdr = GetBufferDescriptor(i);

> @@ -2863,7 +2863,7 @@ void
> FlushRelationBuffers(Relation rel)
> {
> int i;
> - volatile BufferDesc *bufHdr;
> + BufferDesc *bufHdr;
>
> /* Open rel at the smgr level if not already done */
> RelationOpenSmgr(rel);
> @@ -2955,7 +2955,7 @@ void
> FlushDatabaseBuffers(Oid dbid)
> {
> int i;
> - volatile BufferDesc *bufHdr;
> + BufferDesc *bufHdr;

These all are correct based on the premise that some heavy lock -
implying a barrier - prevents new blocks with relevant tags from being
loaded into s_b. And it's fine if some other backend concurrently writes
out the buffer before we do - we'll notice that in the re-check after

So, looks good to me.

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2015-11-16 01:24:36 Re: proposal: multiple psql option -c
Previous Message Tom Lane 2015-11-16 00:14:12 Re: check for interrupts in set_rtable_names