Re: subscription worker signalling wal writer too much

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: subscription worker signalling wal writer too much
Date: 2017-06-25 00:09:52
Message-ID: 20170625000952.552krzrphu74mzzf@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi,

On 2017-06-15 15:06:43 -0700, Jeff Janes wrote:
> > Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and
> > this afaics would allow wal writer to go into sleep mode with half a
> > page filled, and it'd not be woken up again until the page is filled.
> > No?
> >
>
> If it is taking the big sleep, then we always wake it up, since
> acd4c7d58baf09f.
>
> I didn't change that part. I only changed what we do if it not hibernating.

Right, I hadn't read enough of the context.

> It looks like only limited consolidation was happening, the number of kills
> invoked was more than half of the number of SetLatch. I think the reason
> is what when kill(owner_pid, SIGUSR1) is called, the kernel immediately
> suspends the calling process and gives the signalled process a chance to
> run in that time slice. The Wal Writer gets woken up, sees that it only
> has a partial page to write and decides not to do anything, but resets the
> latch. It does this fast enough that the subscription worker hasn't
> migrated CPUs yet.

I think part of the problem here is that latches signal the owning
process even if the owning process isn't actually sleeping - that's
obviously quite pointless. In cases where walwriter is busy, that
actually causes a lot of superflous interrupted syscalls, because it
actually never ends up waiting on the latch. There's also a lot of
superflous context switches. I think we can avoid doing so quite
easily, as e.g. with the attached patch. Could you check how much that
helps your benchmark?

> The first change made the WALWriter wake up and do a write and flush
> whenever an async commit occurred and there was a completed WAL page to be
> written. This way the hint bits could be set while the heap page was still
> hot, because they couldn't get set until the WAL covering the hinted-at
> transaction commit is flushed.
>
> The second change said we can set hint bits before the WAL covering the
> hinted-at transaction is flushed, as long the page LSN is newer than that
> (so the wal covering the hinted-at transaction commit must be flushed
> before the page containing the hint bit can be written).
>
> Then the third change makes the wal writer only write the full pages most
> of the times it is woken up, not flush them. It only flushes them once
> every wal_writer_delay or wal_writer_flush_after, regardless of how often
> it is woken up.
>
> It seems like the third change rendered the first one useless.

I don't think so. Isn't the walwriter writing out the contents of the
WAL is quite important because it makes room in wal_buffers for new WAL?

I suspect we actually should wake up wal-writer *more* aggressively, to
offload wal fsyncs from individual backends, even when they're not
committing. Right now we fsync whenever a segment is finished - we
really don't want to do that in backends that could do other useful
work. I suspect it'd be a good idea to remove that logic from
XLogWrite() and replace it with
if (ProcGlobal->walwriterLatch)
SetLatch(ProcGlobal->walwriterLatch);

> Wouldn't it
> better, and much simpler, just to have reverted the first change once the
> second one was done?

I think both can actually happen independently, no? It's pretty common
for the page lsn to be *older* than the corresponding commit. In fact
you'll always hit that case unless there's also concurrent writes also
touching said page.

> If that were done, would the third change still be
> needed? (It did seem to add some other features as well, so I'm going to
> assume we still want those...).

It's still useful, yes. It avoids flushing the WAL too often
(page-by-page when there's a lot of writes), which can eat up a lot of
IOPS on fast storage.

> But now the first change is even worse than useless, it is positively
> harmful. The only thing to stop it from waking the WALWriter for every
> async commit is this line:
>
> /* if we have already flushed that far, we're done */
> if (WriteRqstPtr <= LogwrtResult.Flush)
> return;
>
> Since LogwrtResult.Flush doesn't advance anymore, this condition doesn't
> becomes false due to us waking walwriter, it only becomes false once the
> timeout expires (which it would have done anyway with no help from us), or
> once wal_writer_flush_after is exceeded. So now every async commit is
> waking the walwriter. Most of those wake up do nothing (there is not a
> completely new patch to write), some write one completed page but don't
> flush anything, and very few do a flush, and that one would have been done
> anyway.

s/completely new patch/completely new page/?

In my opinion we actually *do* want to write (but not flush!) out such
pages, so I'm not sure I agree with that logic. Have to think about
this some more...

Greetings,

Andres Freund

Attachment Content-Type Size
0001-Don-t-signal-latch-owners-if-owner-not-waiting-on-la.patch text/x-patch 4.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2017-06-25 00:20:10 Re: shift_sjis_2004 related autority files are remaining
Previous Message Curtis Ruck 2017-06-24 22:29:28 Re: FIPS mode?