|From:||Andres Freund <andres(at)anarazel(dot)de>|
|To:||Jeff Janes <jeff(dot)janes(at)gmail(dot)com>|
|Subject:||Re: subscription worker signalling wal writer too much|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On 2017-06-15 15:06:43 -0700, Jeff Janes wrote:
> > Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and
> > this afaics would allow wal writer to go into sleep mode with half a
> > page filled, and it'd not be woken up again until the page is filled.
> > No?
> If it is taking the big sleep, then we always wake it up, since
> I didn't change that part. I only changed what we do if it not hibernating.
Right, I hadn't read enough of the context.
> It looks like only limited consolidation was happening, the number of kills
> invoked was more than half of the number of SetLatch. I think the reason
> is what when kill(owner_pid, SIGUSR1) is called, the kernel immediately
> suspends the calling process and gives the signalled process a chance to
> run in that time slice. The Wal Writer gets woken up, sees that it only
> has a partial page to write and decides not to do anything, but resets the
> latch. It does this fast enough that the subscription worker hasn't
> migrated CPUs yet.
I think part of the problem here is that latches signal the owning
process even if the owning process isn't actually sleeping - that's
obviously quite pointless. In cases where walwriter is busy, that
actually causes a lot of superflous interrupted syscalls, because it
actually never ends up waiting on the latch. There's also a lot of
superflous context switches. I think we can avoid doing so quite
easily, as e.g. with the attached patch. Could you check how much that
helps your benchmark?
> The first change made the WALWriter wake up and do a write and flush
> whenever an async commit occurred and there was a completed WAL page to be
> written. This way the hint bits could be set while the heap page was still
> hot, because they couldn't get set until the WAL covering the hinted-at
> transaction commit is flushed.
> The second change said we can set hint bits before the WAL covering the
> hinted-at transaction is flushed, as long the page LSN is newer than that
> (so the wal covering the hinted-at transaction commit must be flushed
> before the page containing the hint bit can be written).
> Then the third change makes the wal writer only write the full pages most
> of the times it is woken up, not flush them. It only flushes them once
> every wal_writer_delay or wal_writer_flush_after, regardless of how often
> it is woken up.
> It seems like the third change rendered the first one useless.
I don't think so. Isn't the walwriter writing out the contents of the
WAL is quite important because it makes room in wal_buffers for new WAL?
I suspect we actually should wake up wal-writer *more* aggressively, to
offload wal fsyncs from individual backends, even when they're not
committing. Right now we fsync whenever a segment is finished - we
really don't want to do that in backends that could do other useful
work. I suspect it'd be a good idea to remove that logic from
XLogWrite() and replace it with
> Wouldn't it
> better, and much simpler, just to have reverted the first change once the
> second one was done?
I think both can actually happen independently, no? It's pretty common
for the page lsn to be *older* than the corresponding commit. In fact
you'll always hit that case unless there's also concurrent writes also
touching said page.
> If that were done, would the third change still be
> needed? (It did seem to add some other features as well, so I'm going to
> assume we still want those...).
It's still useful, yes. It avoids flushing the WAL too often
(page-by-page when there's a lot of writes), which can eat up a lot of
IOPS on fast storage.
> But now the first change is even worse than useless, it is positively
> harmful. The only thing to stop it from waking the WALWriter for every
> async commit is this line:
> /* if we have already flushed that far, we're done */
> if (WriteRqstPtr <= LogwrtResult.Flush)
> Since LogwrtResult.Flush doesn't advance anymore, this condition doesn't
> becomes false due to us waking walwriter, it only becomes false once the
> timeout expires (which it would have done anyway with no help from us), or
> once wal_writer_flush_after is exceeded. So now every async commit is
> waking the walwriter. Most of those wake up do nothing (there is not a
> completely new patch to write), some write one completed page but don't
> flush anything, and very few do a flush, and that one would have been done
s/completely new patch/completely new page/?
In my opinion we actually *do* want to write (but not flush!) out such
pages, so I'm not sure I agree with that logic. Have to think about
this some more...
|Next Message||Tatsuo Ishii||2017-06-25 00:20:10||Re: shift_sjis_2004 related autority files are remaining|
|Previous Message||Curtis Ruck||2017-06-24 22:29:28||Re: FIPS mode?|