Re: max_standby_delay considered harmful

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: max_standby_delay considered harmful
Date: 2010-05-10 08:01:57
Message-ID: 87mxw85gne.fsf@hi-media-techno.com
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, May 9, 2010 at 6:58 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> The difference is that in HS you have to wait for a moment where *no exclusive
>> lock at all* exists, possibly without contending for any of them, while on the
>> master you might not even be blocked by the existence of any of those locks.
>>
>> If you have two sessions which in overlapping transactions lock different
>> tables exclusively you have no problem shutting the master down, but you will
>> never reach a point where no exclusive lock is taken on the slave.
>
> A possible solution to this in the shutdown case is to kill anyone
> waiting on a lock held by the startup process at the same time we kill
> the startup process, and to kill anyone who subsequently waits for
> such a lock as soon as they attempt to take it. I'm not sure if this
> would also make sense in the pause case.

Well, wait, I'm getting lost here. It seems to me that no query on the
slave is allowed to take an AEL, no matter what. The only case is a query
waiting for the replay to release its locks.

The only consequence of pause not waiting for any lock to get released
from the replay is that those backends will be, well, paused. But that
applies the same to any backend started after we pause.

Waiting for replay to release all its locks before pausing would mean
that there's a possibility that the activity on the master is such that
you never reach a pause in the WAL stream. Let's assume we want any new
code we throw in at this stage to be a magic wand making every user happy
at once.

So we'd need a pause function taking either 1 or 2 arguments: the first
says whether we pause now, even if we know the replay is holding some
locks that might pause the reporting queries too, or wait until those
locks are not held anymore; the second is a timeout for that wait
(default 1min?).
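
To make that concrete, here is a rough sketch of the two calling modes,
written as a standalone Python simulation rather than backend code. All
the names (pause_replay, replay_holds_locks, wait_for_locks, timeout_s,
poll_s) are made up for illustration, and what happens when the timeout
expires (pause anyway and report it) is my assumption; nothing above
settles that point.

```python
import time
from typing import Callable


def pause_replay(replay_holds_locks: Callable[[], bool],
                 wait_for_locks: bool = False,
                 timeout_s: float = 60.0,
                 poll_s: float = 0.1) -> bool:
    """Simulate an explicit replay pause (hypothetical API).

    Returns True if replay held no locks at the moment of pausing,
    False if we paused while replay still held locks, in which case
    queries waiting on those locks stay blocked until replay resumes.
    """
    if not wait_for_locks:
        # Pause right now, whatever locks the replay holds.
        return not replay_holds_locks()

    # Wait for the replay to release its locks, but not forever: the
    # master's activity might never leave a lock-free moment.
    deadline = time.monotonic() + timeout_s
    while replay_holds_locks():
        if time.monotonic() >= deadline:
            # Assumption: on timeout, pause anyway and report it.
            return False
        time.sleep(poll_s)
    return True
```

Calling pause_replay(probe) is the "pause now" form, while
pause_replay(probe, wait_for_locks=True) waits up to the one minute
default, where probe stands for whatever reports that the replay still
holds locks.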

Ok, that's designing the API we're missing, and we should not be in the
process of doing any design at this stage. But we are.

> [good summary of current positions]
> I can't presume to extract a consensus from that; I don't think there
> is one.

All we know for sure is that Tom does not want to release as-is, and he
rightfully insists on several objectives as far as the editing is
concerned:
- no addition of code we might want to throw away later
- avoid having to deprecate released behavior, it's too hard
- minimal change set, possibly with no new features.

One more thing: pausing the replay is *already* in the code base, it's exactly
what happens under the hood if you favor queries rather than replay, to
the point I don't understand why the pause design needs to happen
now. We're only talking about having an *explicit* version of it.

Regards,
--
dim

I too am growing tired of insisting this much. I only continue because I
really can't get to understand why-o-why considering a new API over an
existing feature is not possible at this stage. I'm hitting my head on
the wal, so to say…
