Re: WALWriteLock contention

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WALWriteLock contention
Date: 2015-05-16 01:15:27
Message-ID: CAMkU=1xCRefsBMvWYWVJwxJnrxWSggb0sQjDffmOsM8U-vBSmQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 15, 2015 at 9:06 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> WALWriteLock contention is measurable on some workloads. In studying
> the problem briefly, a couple of questions emerged:
>
> ...

>
> 2. I don't really understand why WALWriteLock is set up to prohibit
> two backends from flushing WAL at the same time. That seems
> unnecessary. Suppose we've got two backends that flush WAL one after
> the other. Assume (as is not unlikely) that the second one's flush
> position is ahead of the first one's flush position. So the first one
> grabs WALWriteLock and does the flush, and then the second one grabs
> WALWriteLock for its turn to flush and has to wait for an entire spin
> of the platter to complete before its fsync() can be satisfied. If
> we'd just let the second guy issue his fsync() right away, odds are
> good that the disk would have satisfied both in a single rotation.
> Now it's possible that the second request would've arrived too late
> for that to work out, but AFAICS in that case we're no worse off than
> we are now. And if it does work out we're better off. The only
> reasons I can see why we might NOT want to do this are (1) if we're
> trying to compensate for some OS-level bugginess, which is a
> horrifying thought, or (2) if we think the extra system calls will
> cost more than we save by piggybacking the flushes more efficiently.
>

I implemented this 2-3 years ago, just dropping the WALWriteLock
immediately before the fsync and then picking it up again immediately
after, and was surprised that I saw absolutely no improvement. Of course
it surely depends on the IO stack, but from what I saw it seemed that once
an fsync landed in the kernel, any future ones on that file were blocked
rather than consolidated. Alas, I can't find the patch anymore; I can make
more of an effort to dig it up if anyone cares, although it would probably
be easier to reimplement it than to find it and rebase it.

I vaguely recall thinking that the post-fsync bookkeeping could be moved to
a spin lock, with a fair bit of work, so that the WALWriteLock would not
need to be picked up again, but the whole avenue didn't seem promising
enough for me to worry about that part in detail.

My goal there was to further improve group commit. When running pgbench
-j10 -c10, it was common to see fsyncs that alternated between flushing 1
transaction and 9 transactions: the first one to the gate would go through
it and slam it on all the others, and it would take one fsync cycle for it
to reopen.

Cheers,

Jeff
