| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: subscription worker signalling wal writer too much |
| Date: | 2017-06-14 23:29:22 |
| Message-ID: | 20170614232922.igl2qhdeqdp77niq@alap3.anarazel.de |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 2017-06-14 16:24:27 -0700, Jeff Janes wrote:
> On Wed, Jun 14, 2017 at 3:20 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> > > On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
> > wrote:
> > >
> > > > If I publish a pgbench workload and subscribe to it, the subscription
> > > > worker is signalling the wal writer thousands of times a second, once
> > for
> > > > every async commit. This has a noticeable performance cost.
> > > >
> > >
> > > I've used a local variable to avoid waking up the wal writer more than
> > once
> > > for the same page boundary. This reduces the number of wake-ups by about
> > > 7/8.
> >
> > Maybe I'm missing something here, but isn't that going to reduce our
> > guarantees about when asynchronously committed xacts are flushed out?
> > You can easily fit a number of commits into the same page... As this
> > isn't specific to logical-rep, I don't think that's ok.
> >
>
> The guarantee is based on wal_writer_delay not on SIGUSR1, so I don't think
> this changes that. (Also, it isn't really a guarantee, the fsync can take
> many seconds to complete once we do initiate it, and there is absolutely
> nothing we can do about that, other than do the fsync synchronously in the
> first place).
Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and
this afaics would allow wal writer to go into sleep mode with half a
page filled, and it'd not be woken up again until the page is filled.
No?
> > Have you chased down why there's that many wakeups? Normally I'd have
> > expected that a number of the SetLatch() calls get consolidated
> > together, but I guess walwriter is "too quick" in waking up and
> > resetting the latch?
> I'll have to dig into that some more. The 7/8 reduction I cited was just
> in calls to SetLatch from that part of the code, I didn't measure whether
> the SetLatch actually called kill(owner_pid, SIGUSR1) or not when I
> determined that reduction, so it wasn't truly wake ups I measured. Actual
> wake ups were measured only indirectly via the impact on performance. I'll
> need to figure out how to instrument that without distorting the
> performance too much in the process..
I'd suspect that just measuring the number of kill() calls should be
doable, if measured via perf or something like hta.t
Greetings,
Andres Freund
| From | Date | Subject | |
|---|---|---|---|
| Next Message | David G. Johnston | 2017-06-14 23:47:15 | Re: logical replication: \dRp+ and "for all tables" |
| Previous Message | Jeff Janes | 2017-06-14 23:24:27 | Re: subscription worker signalling wal writer too much |