Re: streaming replication master can fail to shut down

From: Andres Freund <andres(at)anarazel(dot)de>
To: Nick Cleaton <nick(at)cleaton(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: streaming replication master can fail to shut down
Date: 2016-04-29 18:33:32
Message-ID: 20160429183332.5tiaz2ccu36uqjee@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

I pushed a fix for this to 9.4, 9.5 and master yesterday. I'm not
convinced it's all that needs to be fixed, particularly for Magnus'
report.

On 2016-04-29 08:05:51 +0100, Nick Cleaton wrote:
> On 29 April 2016 at 04:38, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> >> > I guess you have a fair amount of WAL traffic, and the receiver was
> >> > behind a good bit?
> >>
> >> No, IIRC this was on the test cluster that I installed for the purpose
> >> of replicating the problem under 9.5; it was essentially idle.
> >
> > The reason I'm asking is that I can't really replicate the issue
> > so far. It's pretty clear that waiting_for_ping_response = true; is
> > needed, but I'm suspicious that that's not all.
> >
> > Was your standby on a separate machine?
>
> Yes, I've only seen it happen when the standby was on a machine with
> slower CPU cores than the primary. All my attempts to replicate it on
> a single machine by trying to slow down the wal receiver have failed.
> I'm fairly convinced it's some sort of race that depends on wal sender
> + network being faster than wal receiver.

Yes, that's kind of what I'm expecting. You'll only hit that branch if
there's outstanding data to be replicated, but the message has already
been handed to the OS (!pq_is_send_pending()). Locally that's just a
small data volume, but over an actual network on a longer-lived
connection that can be a lot more.
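
As a rough illustration of the condition described above, here is a
minimal, self-contained model in C. It is not PostgreSQL source: the
struct, the enum and both function names are hypothetical, and
send_pending merely stands in for pq_is_send_pending(). Setting
waiting_for_ping_response after requesting a reply is an assumption
about how the flag is meant to be used, based only on the remark
quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a WAL sender during shutdown. */
typedef struct {
    uint64_t sent_pos;        /* last WAL position handed to the OS */
    uint64_t replicated_pos;  /* last position the receiver confirmed */
    bool     send_pending;    /* stands in for pq_is_send_pending() */
    bool     waiting_for_ping_response;
    int      pings_sent;      /* reply requests issued, for illustration */
} SenderModel;

typedef enum { SENDER_EXIT, SENDER_PING_SENT, SENDER_WAITING } SenderAction;

/* True when everything has left the send buffer but the receiver has
 * not yet confirmed it: the case discussed in the message above. */
bool unconfirmed_after_handoff(const SenderModel *s)
{
    return !s->send_pending && s->sent_pos != s->replicated_pos;
}

/* One pass of the modelled shutdown check. */
SenderAction shutdown_step(SenderModel *s)
{
    /* Nothing buffered and the receiver has confirmed all of it. */
    if (!s->send_pending && s->sent_pos == s->replicated_pos)
        return SENDER_EXIT;

    /* Ask the receiver for a reply once, then remember that we asked
     * (assumed role of the flag in this model). */
    if (!s->waiting_for_ping_response) {
        s->pings_sent++;
        s->waiting_for_ping_response = true;
        return SENDER_PING_SENT;
    }
    return SENDER_WAITING;
}
```

In this model, a sender with sent_pos = 1000, replicated_pos = 400 and
an empty send buffer requests a reply on the first pass, waits on
later passes, and exits only once replicated_pos reaches 1000.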

Andres
