Re: Sync Rep: First Thoughts on Code

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sync Rep: First Thoughts on Code
Date: 2008-12-23 16:24:36
Message-ID: 495110C4.4070109@mark.mielke.cc
Lists: pgsql-hackers

Simon Riggs wrote:
> You scare me that you see failover as sufficiently frequent that you are
> worried that being without one of the servers for an extra 60 seconds
> during a failover is a problem. And then say you're not going to add the
> feature after all. I really don't understand. If its important, add the
> feature, the whole feature that is. If not, don't.
>
> My expectation is that most failovers are serious ones, that the primary
> system is down and not coming back very fast. Your worries seem to come
> from a scenario where the primary system is still up but Postgres
> bounces/crashes, we can diagnose the cause of the crash, decide the
> crashed server is safe and then wish to recommence operations on it
> again as quickly as possible, where seconds count it doing so.
>
> Are failovers going to be common? Why?
>

Hi Simon:

I agree with most of your criticism of the "fail over only" approach,
but I don't agree that failover frequency should drive expectations for
how quickly the failed system returns to service. I see "soft" failures
(*not* serious ones) as potentially common: somewhere on the network,
something went down or a packet was lost, and the system took a few
seconds too long to respond. My expectation is that the system can
quickly detect that the node is out of service and remove it from the
pool, and then, once the situation is resolved (often automatically, and
outside of my control), have the node automatically "catch up" and be
put back into the pool. Having to run some other process such as rsync
seems unreliable when we already have a mechanism for streaming the
data. All that is missing is streaming from an earlier point in time to
catch up efficiently and reliably.
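
To make the flow concrete, here is a toy sketch in Python of what I have in mind. It is only an illustration under stated assumptions, not the design of the patch: the names Primary, Standby, stream and rejoin are hypothetical, and an in-memory list stands in for the stream of changes.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    # Hypothetical standby node; "applied" counts the records it already has.
    name: str
    applied: int = 0
    reachable: bool = True
    data: list = field(default_factory=list)


class Primary:
    # Hypothetical primary; the list "log" stands in for the change stream.
    def __init__(self):
        self.log = []
        self.pool = []  # standbys currently in service

    def write(self, record):
        self.log.append(record)
        for node in list(self.pool):
            if node.reachable:
                self.stream(node)
            else:
                # Soft failure: drop the node from the pool and carry on.
                self.pool.remove(node)

    def stream(self, node):
        # Send every record after the node's last applied position.
        node.data.extend(self.log[node.applied:])
        node.applied = len(self.log)

    def rejoin(self, node):
        # Catch up from the earlier point, then put the node back in the pool.
        self.stream(node)
        if node not in self.pool:
            self.pool.append(node)


primary = Primary()
standby = Standby("standby1")
primary.rejoin(standby)

primary.write("a")
standby.reachable = False   # e.g. a short network outage
primary.write("b")          # the standby is removed from the pool here
primary.write("c")
standby.reachable = True    # the outage resolves on its own
primary.rejoin(standby)     # only "b" and "c" are streamed
```

The point of the sketch is that rejoin reuses the same streaming path as normal operation, just starting from an earlier position, so no separate copy step such as rsync is involved.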

I think I'm talking more about the complete solution, though, which is
in line with what you are saying? :-)

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>
