Re: WALInsertLock contention

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WALInsertLock contention
Date: 2011-06-08 12:44:53
Message-ID: BANLkTi=2WJMLrOn_sQsG4KcJ8WDytoB1UQ@mail.gmail.com

On Wed, Jun 8, 2011 at 1:59 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> There's probably an obvious explanation that I'm not seeing, ...

Yep. :-)

> but if
> you're not delegating the work of writing the buffers out to someone
> else, why do you need to lock the per backend buffer at all?  That is,
> why does it have to be in shared memory? Suppose that if the
> following are true:
> *) Writing qualifying data (non commit, non switch)
> *) There is room left in whatever you are copying to
> you could trylock WalInsertLock, and if failing to get it, just copy
> qualifying data into a private buffer and punt if the following are
> true...otherwise just do the current behavior.

And here it is: a dirty buffer can't be written out until WAL has been
written and fsync'd through that buffer's LSN. So if the WAL for some
buffers were sitting in a private buffer, inaccessible to other
backends, those buffers would effectively be pinned in shared memory
until the owning backend got around to flushing that WAL. That would
make things very difficult at buffer eviction time and at checkpoint
time.
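
Concretely, the constraint looks like this (a boiled-down sketch of
what the buffer manager's FlushBuffer path does, not the exact code;
reln, forkNum, blockNum, and bufBlock just stand in for the buffer
being flushed):

    XLogRecPtr  recptr = BufferGetLSN(buf); /* LSN of last WAL record touching this page */
    XLogFlush(recptr);                      /* write & fsync WAL through that point */
    smgrwrite(reln, forkNum, blockNum, bufBlock, false);  /* only now is the write safe */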

At any rate, even if it were possible to make it work, it'd be a
misplaced optimization. It isn't touching shared memory - or even
touching the LWLock - that's expensive; it's the LWLock contention
that kills you, either because stuff blocks, or just because the CPUs
burn a lot of cycles fighting over cache lines. An LWLock that is
typically taken by only one backend at a time is pretty cheap. I
suppose I couldn't afford to be so blasé if we were trying to scale to
2048-core systems where even inserting a memory barrier is expensive
enough to worry about, but we've got a ways to go before we need to
start worrying about that.
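
Schematically, the critical section every inserting backend funnels
through today is just this (heavily condensed from XLogInsert):

    LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);  /* all inserters serialize here */
    /* reserve space in the shared WAL buffers and copy the record in */
    LWLockRelease(WALInsertLock);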

[...snip...]
>> A further refinement would be to try to jigger things so that as a
>> backend fills up per-backend WAL buffers, it somehow throws them over
>> the fence to one of the background processes to write out.  For
>> short-running transactions, that won't really make any difference,
>> since the commit will force the per-backend buffers out to the main
>> buffers anyway.  But for long-running transactions it seems like it
>> could be quite useful; in essence, the task of assembling the final
>> WAL stream from the WAL output of individual backends becomes a
>> background activity, and ideally the background process doing the work
>> is the only one touching the cache lines being shuffled around.  Of
>> course, to make this work, backends would need a steady supply of
>> available per-backend WAL buffers.  Maybe shared buffers could be used
>> for this purpose, with the buffer header being marked in some special
>> way to indicate that this is what the buffer's being used for.
>
> That seems complicated -- plus I think the key is to distribute as
> much of the work as possible. Why would the forward lateral to the
> background processor not require a similar lock to WalInsertLock?

Well, that's the problem. It would. Now, in an ideal world, you
might still hope to get some benefit: only the background writer would
typically be writing to the real WAL stream, so that's not contended.
And the contention between the background writer and the individual
backends is only two-way. There's no single point where you have
every process on the system piling on to a single lock.
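
To put a shape on that: even a per-backend handoff still needs some
lock, though ideally one contended only by the owning backend and the
background writer. A purely hypothetical sketch - walHandoffLock and
the enqueue function are invented names, not anything in the tree:

    /* hypothetical: a per-backend lock guarding that backend's handoff
     * queue, contended only by this backend and the background writer */
    LWLockAcquire(MyProc->walHandoffLock, LW_EXCLUSIVE);
    enqueue_filled_wal_buffer(MyProc, my_filled_buffer);
    LWLockRelease(MyProc->walHandoffLock);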

But I'm not sure we can really make it work well enough to do more
than nibble around at the edges of the problem. Consider:

INSERT INTO foo VALUES (1,2,3);

This is going to generate XLOG_HEAP_INSERT followed by
XLOG_XACT_COMMIT, and then it needs to flush WAL. So you're pretty
much forced to have the backend perform the serialization operation
itself, and you're right back in the contention soup. Batching two
records together and inserting them in one operation is presumably
going to be more efficient than inserting them one at a time, but not
all that much more efficient; and there are bookkeeping and memory
bandwidth costs to get there. If we are dealing with long-running
transactions, or asynchronous commit, then this approach might have
legs -- but I suspect that in real life most transactions are small,
and the default configuration is synchronous_commit=on.
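
Spelled out in terms of the actual calls, that path boils down to
roughly this (simplified; rdata stands in for the record data chains
that heap_insert and RecordTransactionCommit assemble):

    /* the row itself, from within heap_insert() */
    XLogInsert(RM_HEAP_ID, XLOG_HEAP_INSERT, rdata);

    /* at commit, from RecordTransactionCommit(): write the commit
     * record, then wait for WAL to be written and fsync'd */
    recptr = XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT, rdata);
    XLogFlush(recptr);   /* with synchronous_commit=on, the backend waits here */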

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
