Re: subscription worker signalling wal writer too much

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: subscription worker signalling wal writer too much
Date: 2017-06-14 23:24:27
Message-ID: CAMkU=1xycMxkVV3ccwdxSF+HgJ1d7YwHf4Y52-A+iDJ5Cmg8Cg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 14, 2017 at 3:20 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> > On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> >
> > > If I publish a pgbench workload and subscribe to it, the subscription
> > > worker is signalling the wal writer thousands of times a second, once
> > > for every async commit. This has a noticeable performance cost.
> > >
> >
> > I've used a local variable to avoid waking up the wal writer more than
> > once for the same page boundary. This reduces the number of wake-ups by
> > about 7/8.
>
> Maybe I'm missing something here, but isn't that going to reduce our
> guarantees about when asynchronously committed xacts are flushed out?
> You can easily fit a number of commits into the same page... As this
> isn't specific to logical-rep, I don't think that's ok.
>

The guarantee is based on wal_writer_delay, not on SIGUSR1, so I don't think
this changes that. (Also, it isn't really a guarantee: the fsync can take
many seconds to complete once we do initiate it, and there is absolutely
nothing we can do about that, other than doing the fsync synchronously in
the first place.)

The reason for kicking the wal writer at page boundaries is so that hint
bits can get set earlier than they otherwise could. But I don't think
kicking it multiple times per page boundary can help in that effort.

>
> Have you chased down why there's that many wakeups? Normally I'd have
> expected that a number of the SetLatch() calls get consolidated
> together, but I guess walwriter is "too quick" in waking up and
> resetting the latch?
>

I'll have to dig into that some more. The 7/8 reduction I cited was just
in calls to SetLatch from that part of the code. I didn't measure whether
SetLatch actually called kill(owner_pid, SIGUSR1) or not when I determined
that reduction, so it wasn't truly wake-ups I measured. Actual wake-ups
were measured only indirectly, via the impact on performance. I'll need to
figure out how to instrument that without distorting the performance too
much in the process.
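
The gap between "calls to SetLatch" and "signals actually sent" can be
illustrated with a toy model. This is a sketch under stated assumptions, not
PostgreSQL's latch code: SketchLatch and the sketch_* functions are
hypothetical, a counter increment stands in for kill(owner_pid, SIGUSR1),
and the waiter is assumed to reset the latch after every reset_every sets.
It shows that set calls consolidate while the latch stays set, and stop
consolidating if the waiter resets it after every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy latch with two counters, so calls and signals can be compared. */
typedef struct SketchLatch
{
    bool is_set;
    long set_calls;     /* how often the setter was invoked */
    long signals_sent;  /* how often a signal would really be sent */
} SketchLatch;

static void
sketch_set_latch(SketchLatch *l)
{
    assert(l != NULL);
    l->set_calls++;
    if (l->is_set)
        return;             /* already set: no signal needed */
    l->is_set = true;
    l->signals_sent++;      /* stands in for kill(owner_pid, SIGUSR1) */
}

static void
sketch_reset_latch(SketchLatch *l)
{
    assert(l != NULL);
    l->is_set = false;
}

/*
 * Issue nsets set calls; the simulated waiter resets the latch after every
 * reset_every calls (0 means it never resets). Returns signals sent.
 */
static long
sketch_run(SketchLatch *l, int nsets, int reset_every)
{
    for (int i = 0; i < nsets; i++)
    {
        sketch_set_latch(l);
        if (reset_every > 0 && (i + 1) % reset_every == 0)
            sketch_reset_latch(l);
    }
    return l->signals_sent;
}
```

In this model a waiter that resets after every call (reset_every = 1) turns
every set call into a signal, which is the "too quick" case; a slower waiter
lets most calls collapse into one signal per reset.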

Cheers,

Jeff
