Re: max_standby_delay considered harmful

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Florian Pflug <fgp(at)phlo(dot)org>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: max_standby_delay considered harmful
Date: 2010-05-12 14:40:19
Message-ID: 1273675219.308.737.camel@ebony
Lists: pgsql-hackers

On Wed, 2010-05-12 at 16:03 +0200, Stefan Kaltenbrunner wrote:
> Simon Riggs wrote:
> > On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote:
> >> On Wed, May 12, 2010 at 7:26 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >>> On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:
> >>>
> >>>> I'm not sure what to make of this. Sometimes not shutting down
> >>>> doesn't sound like a feature to me.
> >>> It acts exactly the same in recovery as in normal running. It is not a
> >>> special feature of recovery at all, bug or otherwise.
> >> Simon, that doesn't make any sense. We are talking about a backend
> >> getting stuck forever on an exclusive lock that is held by the startup
> >> process and which will never be released (for example, because the
> >> master has shut down and no more WAL can be obtained for replay). The
> >> startup process does not hold locks in normal operation.
> >
> > When I test it, startup process holding a lock does not prevent shutdown
> > of a standby.
> >
> > I'd be happy to see your test case showing a bug exists and that the
> > behaviour differs from normal running.
>
> In my testing the postmaster simply does not shut down once in a while,
> even with no clients connected any more - most of the time it works
> just fine, but in about 1 out of 10 cases it gets stuck. My testcase (as
> detailed in the related thread) is simply doing an interval load on the
> master (pgbench -T 120 && sleep 30 && pgbench -T 120 - rinse and repeat
> as needed) and pgbench -S && pg_ctl restart && pgbench -S in a loop on
> the standby. Once in a while the standby will simply not shut down
> (forever - not only by exceeding the default timeout of pg_ctl, which seems
> to get triggered much more often on the standby than on the master -
> have not looked into that yet in detail)

If you could recreate that on a server in debug mode, we could see what's
happening. If you can attach to the server and get a backtrace, that
would help. I've not seen that behaviour at all during testing, and if
the issue is sporadic it's not likely to help much for me to try
recreating it myself.

This could be an issue with SR, or an issue with the shutdown code
itself.

--
Simon Riggs www.2ndQuadrant.com

