On Mon, Dec 26, 2011 at 18:01, Alexander Björnhagen wrote:
> I suppose this conversation would lend itself better to a whiteboard
> or maybe over a few beers instead of via e-mail ...
mmm. beer... :-)
>>>>> Well, I think this is one of those cases where you could argue either
>>>>> way. Someone caring more about high availability of the system will
>>>>> want to let the master continue and just raise an alert to the
>>>>> operators. Someone looking for an absolute guarantee of data
>>>>> replication will say otherwise.
>>>>If you don't care about the absolute guarantee of data, why not just
>>>>use async replication? It's still going to replicate the data over to
>>>>the client as quickly as it can - which in the end is the same level
>>>>of guarantee that you get with this switch set, isn't it?
>>> This setup does still guarantee that if the master fails, then you can
>>> still fail over to the standby without any possible data loss because
>>> all data is synchronously replicated.
>>Only if you didn't have a network hitch, or if your slave was down.
>>Which basically means it doesn't *guarantee* it.
> True. In my two-node system, I’m willing to take that risk when my
> only standby has failed.
> Most likely (compared to any other scenario), we can regain
> redundancy before another failure occurs.
> Say each one of your nodes can fail once a year. Most people have a much
> better track record than that with their production machines/network/etc,
> but just as an example. Then on any given day there is a 0.27% chance
> that a given node will fail (1/365 ≈ 0.27%), right?
> Then the probability of both failing on the same day is (0.27%)^2 ≈
> 0.00075%, or about 1 in 133,000. And given that it would take only a
> few hours tops to restore redundancy, the chance is even smaller than
> that because you would not be exposed for the entire day.
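Redoing that back-of-envelope arithmetic as a quick sketch (this assumes the two failures really are independent, which is exactly the assumption being questioned below):

```python
# Back-of-envelope: each node fails about once a year, independently.
p_daily = 1 / 365          # chance a given node fails on a given day
p_both = p_daily ** 2      # chance both nodes fail on the same day

print(f"{p_daily:.2%}")                      # 0.27%
print(f"{p_both:.6%}")                       # 0.000751%
print(f"about 1 in {round(1 / p_both):,}")   # about 1 in 133,225
```

And, as noted in the reply, correlated failures (shared switch, shared power, shared admin mistake) make the real odds much worse than this.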
That is assuming the failures are actually independent. In my
experience, they're usually not.
But that's diverging into math, which really isn't my strong side here :D
> So, to be a bit blunt about it, and I hope I don't come off as rude
> here, this dual-failure or creeping-doom type scenario on a two-node
> system is probably not relevant, but more an academic question.
Given how many times I've seen it, I'm going to respectfully disagree
with that ;)
That said, I agree it's not necessarily reasonable to try to defend
against that in a two node cluster. You can always make it three-node
if you need to do that. I'm worried that the interface seems a bit
fragile and that it's hard to "be sure". Predictable interfaces are important.
>>> I want to replicate data with synchronous guarantee to a disaster site
>>> *when possible*. If there is any chance that commits can be
>>> replicated, then I’d like to wait for that.
>>There's always a chance, it's just about how long you're willing to wait ;)
> Yes, exactly. When I can estimate it I’m willing to wait.
>>Another thought could be to have something like a "sync_wait_timeout",
>>saying "I'm willing to wait <n> seconds for the sync standby to be caught
>>up. If nobody is caught up within that time, then I can back down to
>>async/"standalone" mode". That way, data availability wouldn't
>>be affected by short-lived network glitches.
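As a sketch, such a knob might look like this in postgresql.conf. The name "sync_wait_timeout" is purely hypothetical here; no such GUC exists:

```ini
# Hypothetical setting (not a real PostgreSQL GUC): wait at most this
# long for a sync standby to confirm before degrading to async commits.
sync_wait_timeout = 10s
```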
> This was also mentioned in the previous thread I linked to, as
> "replication_timeout":
Hmm. That link was gone from the thread when I read it - I missed it
completely. Sorry about that.
So reading that thread, it really only takes care of one of the cases
- the replication_timeout only fires if the slave "goes dead". It
could be useful if a similar timeout would apply if I did a "pg_ctl
restart" on the slave - making the master wait <n> seconds before
going into standalone mode. The way I read the proposal now, the
master would immediately go into standalone mode if the standby
actually *closes* the connection instead of timing it out?
> In a HA environment you have redundant networking and bonded
> interfaces on each node. The only “glitch” would really be if a switch
> failed over and that’s a pretty big “if” right there.
Switches fail a lot. And there are a lot more things in between that
can fail. I don't think it's such a big if - network issues are by far
the most common cause of HA environment failures I've seen lately.
>>> If however the disaster node/site/link just plain fails and
>>> replication goes down for an *indefinite* amount of time, then I want
>>> the primary node to continue operating, raise an alert and deal with
>>> that. Rather than have the whole system grind to a halt just because a
>>> standby node failed.
>>If the standby node failed and can be determined to actually be failed
>>(by say a cluster manager), you can always have your cluster software
>>(or DBA, of course) turn it off by editing the config setting and
>>reloading. Doing it that way you can actually *verify* that the site
>>is gone for an indefinite amount of time.
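Concretely, this is the existing escape hatch: on the master, clear the list of synchronous standbys and reload. These are real PostgreSQL settings and commands:

```ini
# postgresql.conf on the master: an empty standby list turns
# synchronous replication off; a reload (no restart) picks it up.
synchronous_standby_names = ''
```

Then run `pg_ctl reload` (or `SELECT pg_reload_conf();` from a superuser session) and confirm with `SHOW synchronous_standby_names;`.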
> The system might as well do this for me when the standby gets
> disconnected instead of halting the master.
I guess two ways of seeing it - the flip of that coin is "the system
can already do this for you"...
>>> If we were just talking about network glitches then I would be fine
>>> with the current behavior because I do not believe they are
>>> long-lasting anyway and they are also *quantifiable* which is a huge
>>But the proposed switches don't actually make it possible to
>>differentiate between these "non-long-lasting" issues and long-lasting
>>ones, do they? We might want an interface that actually does...
> "replication_timeout", where the primary disconnects the WAL sender
> after a timeout, together with "synchronous_standalone_master", which
> tells the primary it can continue anyway when that happens, allows
> exactly that. This would then be a first part towards that, but I wanted
> to start out small and I personally think it is sufficient to draw the
> line at TCP disconnect of the standby.
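Put together, the configuration being discussed would look roughly like this. Note that "replication_timeout" is an existing GUC in this era of PostgreSQL, while "synchronous_standalone_master" is the name from the proposed patch and is not part of any released version:

```ini
# Sketch of the proposed combination (synchronous_standalone_master
# is from the patch under discussion, not released PostgreSQL):
replication_timeout = 30s             # drop a WAL sender that stops responding
synchronous_standalone_master = on    # proposed: keep committing once no
                                      # sync standby remains connected
```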
Maybe it is. It still seems fragile to me.
>>>>But wouldn't an async standby still be a lot better than no standby at all?
>>> As soon as the standby comes back online, I want to wait for it to sync.
>>I guess I just find this very inconsistent. You're willing to wait,
>>but only sometimes. You're not willing to wait when it goes down, but
>>you are willing to wait when it comes back. I don't see why this
>>should be different, and I don't see how you can reliably
>>differentiate between these two.
> When the wait is quantifiable, I want to wait (like a connected
> standby that is in the process of catching up). When it is not (like
> when the remote node disappeared and the master has no way of knowing
> for how long), I do not want to wait.
> In both cases I want to send off alerts, get people involved and fix
> the problem causing this, it is not something that should happen
>>What about the slave rebooting, for example? That'll usually be pretty
>>quick too - so you'd be ok waiting for that. But your patch doesn't
>>let you wait for that - it will switch to standalone mode right away?
>>But if it takes 30 seconds to reboot, and then 30 seconds to catch up,
>>you are *not* willing to wait for the first 30 seconds, but you *are*
>>willing to wait for the second? Just seems strange to me, I guess...
> That’s exactly right. While the standby is booting, the master has no
> way of knowing what is going on with that standby so then I don’t want
> to wait.
> When the standby has managed to boot, connect and started to sync up
> the data that it was lagging behind, then I do want to wait because I
> know that it will not take too long before it has caught up.
Yeah, that does make sense, when you look at it like that.
Subject: Re: Standalone synchronous master (pgsql-hackers)