Re: max_standby_delay considered harmful

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: max_standby_delay considered harmful
Date: 2010-05-10 00:56:09
Message-ID: AANLkTinWKJ4tfP_IoZs1iR3Bt1vyORftczsFlhTGtbWw@mail.gmail.com
Lists: pgsql-hackers

On Sun, May 9, 2010 at 6:58 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On Monday 10 May 2010 00:25:44 Florian Pflug wrote:
>> On May 9, 2010, at 22:01 , Robert Haas wrote:
>> > On Sun, May 9, 2010 at 3:09 PM, Dimitri Fontaine <dfontaine(at)hi-media(dot)com> wrote:
>> >> Florian Pflug <fgp(at)phlo(dot)org> writes:
>> >>> The only remaining option is to continue applying WAL until you reach
>> >>> a point where no locks are held, then pause. But from a user's POV
>> >>> that is nearly indistinguishable from simply setting
>> >>> hot_standby_conflict_winner to in the first place I think.
>> >>
>> >> Not really, the use case would be using the slave as a reporting server,
>> >> you know you have say 4 hours of reporting queries during which you will
>> >> pause the recovery. So it's ok for the pause command to take time.
>> >
>> > Seems like it could take FOREVER on a busy system.  Surely that's not
>> > OK.  The fact that Hot Standby has to take exclusive locks that can't
>> > be released until WAL replay has progressed to a certain point seems
>> > like a fairly serious wart.
>>
>> If this is a serious wart then it's not one of hot standby, but one of
>> postgres proper. AccessExclusiveLocks (SELECT-blocking locks that is, as
>> opposed to UPDATE/DELETE-blocking locks) are never necessary from a
>> correctness POV, they're only there for implementation reasons.
>>
>> Getting rid of them doesn't seem completely insurmountable either - just as
>> multiple row versions remove the need to block SELECTs due to concurrent
>> UPDATEs, multiple datafile versions could remove the need to block SELECTs
>> due to concurrent ALTERs. But people seem to live with them quite well,
>> judged from the amount of work put into getting rid of them (zero). I
>> therefore fail to see why they should pose a significant problem in HS
>> setups.
> The difference is that in HS you have to wait for a moment where *no exclusive
> lock at all* exists, possibly without contending for any of them, while on the
> master you might not even be blocked by the existence of any of those locks.
>
> If you have two sessions which in overlapping transactions lock different
> tables exclusively you have no problem shutting the master down, but you will
> never reach a point where no exclusive lock is taken on the slave.

A possible solution to this in the shutdown case is to kill anyone
waiting on a lock held by the startup process at the same time we kill
the startup process, and to kill anyone who subsequently waits for
such a lock as soon as they attempt to take it. I'm not sure if this
would also make sense in the pause case.

Another possible solution would be to try to figure out if there's a
way to delay application of WAL that requires the taking of AELs to
the point where we could apply it all at once. That might not be
feasible, though, or might be feasible only in some cases, and it's
certainly 9.1 material (at least) in any case.
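
To make Andres's point about overlapping exclusive locks concrete, here
is a rough Python sketch. It is only an illustration, not anything in the
tree: it treats each AccessExclusiveLock as a half-open interval of
replay positions and reports the gaps where no such lock is held. The
function name and the integer positions are made up for the example.

```python
from typing import List, Tuple


def lock_free_gaps(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (start, end) gaps between lock intervals where no lock is held.

    Each interval is a hypothetical (acquire_pos, release_pos) pair, with
    the lock held for acquire_pos <= pos < release_pos.
    """
    gaps = []
    covered_until = None  # highest release position seen so far
    for start, end in sorted(intervals):
        if covered_until is not None and start > covered_until:
            # Nothing was held between the previous release and this acquire.
            gaps.append((covered_until, start))
        covered_until = end if covered_until is None else max(covered_until, end)
    return gaps


# Two sessions whose locks never overlap leave a gap to pause in.
separate = lock_free_gaps([(0, 10), (12, 20)])

# Overlapping transactions on different tables leave no gap at all.
chained = lock_free_gaps([(0, 10), (5, 15), (12, 20)])
```

If the overlap pattern continues indefinitely on a busy master, the list
of gaps stays empty, which is the "could take FOREVER" case.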

Anyway, this is all a little off-topic. We need to get back to
arguing about how best to cut the legs out from under a feature that's
been in the tree for six months but Tom didn't get around to looking
at until last week. I'll restate my position: now that I understand
what the issues are (I think), the feature as currently implemented
seems pretty wonky, but cutting it down to a boolean seems like an
exercise in excessive pessimism about our ability to predict future
development directions, as well as possibly quite inconvenient for
people attempting to use Hot Standby. Therefore I think we should
adopt Tom's original proposal (with +1 also from Stephen Frost), but
that doesn't seem likely to fly because, on the one hand, we have Tom
himself arguing (along with Bruce and possibly Heikki) that we should
whack it down all the way to a boolean; and on the other hand Simon
and Greg Smith and I think also Andres Freund and Kevin Grittner
arguing that the original feature is OK as-is.

Other people who weighed in include Stefan Kaltenbrunner (who opined
that Tom had a legitimate complaint about the current design but
didn't vote for a specific resolution), Greg Sabino Mullane (who
pointed out that SOME of the issues that Tom raised could be solved
with proper time synchronization), Josh Drake (who thought requiring
NTP to be working was a bad idea, and therefore presumably favors
changing something), Josh Berkus (who changed his vote at least once
and whose priority seems to have more to do with releasing before the turn
of the century than with the actual technical option we select,
apologies if I'm misreading his emails), Greg Stark (who seems to
think that a boolean will be bad news but didn't specifically vote for
another option), Dimitri Fontaine (who wants a boolean plus
pause/resume functions, or maybe a plugin facility of some kind), Rob
Wultsch (who doesn't ever want to kill queries and therefore would be
happy with a boolean), Yeb Havinga (who never wants to stall recovery
and therefore would also be happy with a boolean), and Florian Pflug
(who points out that pause/resume is actually a nontrivial feature).
Apologies if I've left anyone out or misrepresented their position.

Overall I would say opinion is about evenly split between:

- leave it as-is
- make it a Boolean
- change it in some way but to something more expressive than a Boolean

I can't presume to extract a consensus from that; I don't think there
is one. You could say "the majority of people want to change
something" and that would be true; you could also say "the majority of
people don't want a Boolean" and that would also be true.

IF we adopt "leave it as-is", then we need to document that you will
need to both run ntp and run some sort of heartbeat process on the
master to make sure that at least a small amount of WAL keeps getting
generated; or else you'll have massive query cancellations. IF we
decide to make it a Boolean, then we need to document that you have to
choose between the possibility of recovery falling arbitrarily behind
as a result of even one query holding an exclusive lock, or
alternatively instantaneously canceling queries that conflict, however
briefly, with replay. IF we adopt Tom's original proposal, then we'll
need to document that the timeout given is per-lock-wait, and
therefore if the lock timeout is not zero and there are many lock
waits the standby may fall far behind and have difficulty catching up.
IF we decide to do something else, then I don't know.
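
As a back-of-the-envelope illustration of the Boolean and per-lock-wait
cases, here is a small Python sketch. It is a toy model under stated
assumptions, not how the startup process actually works: conflicts are
assumed to arrive one after another, each conflicting query is assumed
to hold replay up for a known number of seconds, and the function and
parameter names are invented for the example. A timeout of zero stands
in for "always cancel", and an infinite timeout for "always wait".

```python
import math
from typing import Iterable, Tuple


def replay_delay(conflict_durations_s: Iterable[float],
                 per_wait_timeout_s: float) -> Tuple[float, int]:
    """Return (total replay delay in seconds, number of cancelled queries).

    Assumes each conflict is waited on separately, for at most
    per_wait_timeout_s, after which the conflicting query is cancelled.
    """
    total_delay = 0.0
    cancelled = 0
    for duration in conflict_durations_s:
        if duration > per_wait_timeout_s:
            # Replay waits out the full timeout, then cancels the query.
            total_delay += per_wait_timeout_s
            cancelled += 1
        else:
            # The query finishes on its own before the timeout expires.
            total_delay += duration
    return total_delay, cancelled


conflicts = [5, 40, 40, 2]  # hypothetical seconds each query blocks replay

per_wait = replay_delay(conflicts, 30)            # per-lock-wait timeout of 30s
always_cancel = replay_delay(conflicts, 0)        # Boolean: never stall replay
always_wait = replay_delay(conflicts, math.inf)   # Boolean: never cancel
```

In this model the delay under a per-lock-wait timeout grows with the
number of lock waits, so the total is bounded only by (number of waits)
x (timeout), not by the timeout itself.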

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
