Re: subscription worker signalling wal writer too much

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: subscription worker signalling wal writer too much
Date:	2017-06-14 23:29:22
Message-ID:	20170614232922.igl2qhdeqdp77niq@alap3.anarazel.de
Lists:	pgsql-hackers

On 2017-06-14 16:24:27 -0700, Jeff Janes wrote:
> On Wed, Jun 14, 2017 at 3:20 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> > > On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > >
> > > > If I publish a pgbench workload and subscribe to it, the subscription
> > > > worker is signalling the wal writer thousands of times a second, once
> > > > for every async commit. This has a noticeable performance cost.
> > > >
> > >
> > > I've used a local variable to avoid waking up the wal writer more than
> > > once for the same page boundary. This reduces the number of wake-ups by
> > > about 7/8.
> >
> > Maybe I'm missing something here, but isn't that going to reduce our
> > guarantees about when asynchronously committed xacts are flushed out?
> > You can easily fit a number of commits into the same page... As this
> > isn't specific to logical-rep, I don't think that's ok.
> >
>
> The guarantee is based on wal_writer_delay not on SIGUSR1, so I don't think
> this changes that. (Also, it isn't really a guarantee, the fsync can take
> many seconds to complete once we do initiate it, and there is absolutely
> nothing we can do about that, other than do the fsync synchronously in the
> first place).

Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and
this afaics would allow wal writer to go into sleep mode with half a
page filled, and it'd not be woken up again until the page is filled.
No?

> > Have you chased down why there's that many wakeups? Normally I'd have
> > expected that a number of the SetLatch() calls get consolidated
> > together, but I guess walwriter is "too quick" in waking up and
> > resetting the latch?

> I'll have to dig into that some more. The 7/8 reduction I cited was just
> in calls to SetLatch from that part of the code, I didn't measure whether
> the SetLatch actually called kill(owner_pid, SIGUSR1) or not when I
> determined that reduction, so it wasn't truly wake ups I measured. Actual
> wake ups were measured only indirectly via the impact on performance. I'll
> need to figure out how to instrument that without distorting the
> performance too much in the process.

I'd suspect that just measuring the number of kill() calls should be
doable, if measured via perf or something like that.

Greetings,

Andres Freund

In response to

Re: subscription worker signalling wal writer too much at 2017-06-14 23:24:27 from Jeff Janes

Responses

Re: subscription worker signalling wal writer too much at 2017-06-15 22:06:43 from Jeff Janes
