Re: subscription worker signalling wal writer too much

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: jeff(dot)janes(at)gmail(dot)com
Cc: andres(at)anarazel(dot)de, pgsql-hackers(at)postgresql(dot)org
Subject: Re: subscription worker signalling wal writer too much
Date: 2017-09-19 12:48:16
Message-ID: 20170919.214816.95787528.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Sat, 26 Aug 2017 14:45:20 -0700, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote in <CAMkU=1zP931=v-ypFZMDxHAGGQwuVNUPONZE5_ggV+onkXEdvQ(at)mail(dot)gmail(dot)com>
> On Mon, Jul 3, 2017 at 8:26 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> > On Sat, Jun 24, 2017 at 5:09 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> >> On 2017-06-15 15:06:43 -0700, Jeff Janes wrote:
> >>
> >>
> >>
> >
> >> > Wouldn't it be
> >> > better, and much simpler, just to have reverted the first change once
> >> > the second one was done?
> >>
> >> I think both can actually happen independently, no? It's pretty common
> >> for the page lsn to be *older* than the corresponding commit. In fact
> >> you'll always hit that case unless there are also concurrent writes
> >> touching said page.
> >>
> >
> > That is true, but I don't see any residual regression when going from
> > pre-db76b1efbbab2441428 to 7975c5e0a992ae9a4-with-my-patch applied. So I
> > think the hint bit issue only shows up when hitting the same page
> > aggressively, in which case the LSN does get advanced quickly enough that
> > db76b1efbba solves it. Perhaps a different test case could show that
> > issue. I also didn't see any improvement from the original 4de82f7d7c50a81e,
> > so maybe 8 CPUs just isn't enough to detect the problem with setting
> > hint bits on permanent tables with synchronous_commit=false.
> >
>
> I've refined my benchmarking, using a smaller scaling factor (8, same as my
> CPU count) and running the tests for longer. The smaller size means there
> are more intensive updates on individual pgbench_accounts pages, and the
> longer run times allow unset hint bits to build up for longer (usually
> I like short test runs with a large number of randomized repetitions, but
> that doesn't work well here). Since the scale factor is the same as the
> number of CPUs, I've bumped the thread count so that it is more likely
> someone will choose a non-contended value of pgbench_branches.bid to update
> at any given moment.
>
> pgbench -c16 -j16 -T1200 -M prepared -b tpcb-func --startup='set
> synchronous_commit=false'
>
> This testing did show the importance of waking up the wal writer so that
> hint bits can be set:
>
> commit                  TPS
> cfafd8beadcd6f          22200.48
> cfafd8beadcd6f_nowake   30766.83
> Median of 14 runs reported; p-val on difference of means is 2e-22.
>
> Where nowake is a hackish disabling of the wake-up introduced in
> 4de82f7d7c50a8, forward-ported to the 9.6 branch. (I still wake it if it
> is doing the big sleep; I just took out the part that wakes it up even
> from small sleeps.)
>
> So my revised benchmark is capable of detecting this effect, even with only
> 8 CPUs.
>
> When I move to the next commit, where we set hint bits as long as the page
> was re-dirtied afterwards, I get these numbers:
>
> db76b1efbbab24          30742.69
> db76b1efbbab24_nowake   31072.97
>
> The difference between these is not statistically significant, nor is the
> difference between them and cfafd8beadcd6f.
>
> So I think the logic you introduced in db76b1efbbab24 captures all the gain
> there is to be captured, and 4de82f7d7c50a8 can be reverted. The only
> reason to wake up the wal writer is if it is taking the big sleep.
>
> Of course, I can't prove a negative. There could always be some test case
> which demonstrates that 4de82f7d7c50a8 still matters in spite of
> db76b1efbbab24.
> So to be ultra-conservative, attached is a patch which should preserve all
> wake-ups other than the ones known to be useless.
>
> 7975c5e0a               27810.66
> 7975c5e0a_patched       30821.41
>
> So 7975c5 causes a significant regression (10% regression, p-val 7e-16 on
> difference of the means). It repeatedly wakes the wal-writer up to flush a
> page which the wal-writer simply refuses to flush. This can't serve any
> purpose. My patch still wakes it up to write the page if it has not been
> written yet, and also still wakes it to flush a range of pages if
> WalWriterFlushAfter is met. But it just doesn't wake it up to flush a page
> which has already been written and will not get flushed.
>
> I'd prefer the code-simplifying change of just not waking it up anymore
> except for when it is doing the big sleep, but I can't reliably detect a
> performance difference between that and the more conservative change posted
> here.
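
To restate the rule described above, the wake-up decision boils down to
something like the following. This is only a simplified sketch with
made-up names (should_wake_walwriter and its arguments are not real
xlog.c symbols), not the actual implementation:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t lsn_t;

static bool
should_wake_walwriter(bool walwriter_hibernating, /* doing the "big sleep"? */
                      lsn_t commit_lsn,           /* LSN of this async commit */
                      lsn_t written_upto,         /* WAL already written */
                      lsn_t flushed_upto,         /* WAL already flushed */
                      lsn_t flush_after)          /* wal_writer_flush_after */
{
    /* Always kick it out of hibernation, or async commits could stall. */
    if (walwriter_hibernating)
        return true;

    /* Wake it if our WAL has not even been written yet. */
    if (commit_lsn > written_upto)
        return true;

    /* Wake it if enough written-but-unflushed WAL has accumulated. */
    if (flush_after > 0 && written_upto - flushed_upto >= flush_after)
        return true;

    /*
     * Otherwise the page is already written and the wal writer would
     * refuse to flush it anyway, so SetLatch() would accomplish nothing.
     */
    return false;
}

int
main(void)
{
    /* already written, not enough unflushed WAL piled up: no wake-up */
    printf("%d\n", should_wake_walwriter(false, 1000, 2000, 500, 1024 * 1024));
    /* hibernating wal writer: always wake */
    printf("%d\n", should_wake_walwriter(true, 1000, 2000, 500, 1024 * 1024));
    return 0;
}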

FWIW I tried this patch, taking the following numbers using
pgbench --unlogged-tables -T 60 -j 8 -c 8. All numbers are from the
second of two successive runs.

TA: number of times XLogSetAsyncXactLSN was called while the wal writer was awake.
TB: number of TA calls in which XLogSetAsyncXactLSN skipped SetLatch.
WA: number of times WalWriterMain called XLogBackgroundFlush().
WB: number of WA calls in which XLogBackgroundFlush() skipped flushing.
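
To make those counters concrete, here is a rough stand-in for where each
one would be bumped. All names below (set_async_xact_lsn, walwriter_cycle,
and so on) are illustrative only, not the instrumentation I actually used:

#include <stdbool.h>
#include <stdio.h>

static unsigned long ta;  /* TA: calls with the wal writer awake          */
static unsigned long tb;  /* TB: of those, SetLatch skipped               */
static unsigned long wa;  /* WA: XLogBackgroundFlush calls from main loop */
static unsigned long wb;  /* WB: of those, nothing to flush               */

/* backend side: stand-in for XLogSetAsyncXactLSN */
static void
set_async_xact_lsn(bool walwriter_sleeping, bool wakeup_needed)
{
    if (!walwriter_sleeping)
    {
        ta++;                 /* wal writer already awake */
        if (!wakeup_needed)
        {
            tb++;             /* SetLatch avoided by the patch */
            return;
        }
    }
    /* SetLatch(wal writer latch) would go here */
}

/* wal writer side: stand-in for one WalWriterMain iteration */
static void
walwriter_cycle(bool flushed_something)
{
    wa++;                     /* XLogBackgroundFlush called */
    if (!flushed_something)
        wb++;                 /* the call did nothing */
}

int
main(void)
{
    /* toy sequence just to show how the ratios are formed */
    set_async_xact_lsn(false, false);   /* awake, skip  -> TA++, TB++ */
    set_async_xact_lsn(false, true);    /* awake, wake  -> TA++       */
    walwriter_cycle(true);              /* useful flush -> WA++       */
    walwriter_cycle(false);             /* no-op flush  -> WA++, WB++ */

    printf("TB/TA = %lu/%lu, WB/WA = %lu/%lu\n", tb, ta, wb, wa);
    return 0;
}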

A: without this patch: 1745 tps (wal_writer_flush_after = 1MB (default))

WB/WA = 9/141100     : 0.9% of all XLogBackgroundFlush() calls did nothing.
                       (woken by SetLatch 141091 times)
TB/TA = 27300/103000 : 26% of SetLatch calls were avoided. (called 12899 times)

B: with this patch: 2041 tps (wal_writer_flush_after = 1MB (default))

WB/WA = 46/1600      : 2.8% of all XLogBackgroundFlush() calls did nothing.
                       (woken by SetLatch 1553 times)
TB/TA = 115822/118000: 97% of SetLatch calls were avoided. (called 4178 times)

C: with this patch: 2033 tps (but wal_writer_flush_after = 0)

WB/WA = 117/1700     : 6.9% of all XLogBackgroundFlush() calls did nothing.
                       (woken by SetLatch 1582 times)
TB/TA = 110615/118000: 93% of SetLatch calls were avoided. (called 7785 times)

The condition for skipping SetLatch in the latest patch (v2) looks
reasonable, and the effect is obvious, at least in this test. The
"needless" SetLatch calls are very effectively reduced, and performance
actually improves. The increased number of bytes flushed at once seems
to be the main cause of the improvement.

Though I'm a bit uneasy about having similar code on both sides
(XLogBackgroundFlush and XLogSetAsyncXactLSN), +1 to this from me.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
