Re: Synchronous Standalone Master Redoux

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: sthomas(at)optionshouse(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous Standalone Master Redoux
Date: 2012-07-10 23:02:43
Message-ID: CAAZKuFZTZ004Fc=4rkAtZP94bdnw3UfriV6VEj4u5YwaV++mMw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 10, 2012 at 2:42 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> What you explain you want reads to me "Async replication + Archiving".

Notable caveat: one can't very easily measure or bound the amount of
transaction loss in any graceful way as-is. We only have "unlimited
lag" and "2-safe or bust".

Presumably the DRBD setup run by the original poster can do this:

* run without a partner in a degraded mode (to use common RAID terminology)

* asynchronous rebuild and catch-up of a new remote RAID partner

* switch to synchronous RAID-1, which attenuates the source of block
device changes to get 2-safe reliability (i.e. blocking on
confirmations from two block devices)
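
To make the three modes above concrete, here is a minimal sketch of them
as a state machine. The mode and event names are my own invention for
illustration; they are not DRBD's or Postgres's terminology or API.

```python
from enum import Enum


class Mode(Enum):
    DEGRADED = "degraded"        # no partner; writes land on one device only
    RESYNCING = "resyncing"      # partner attached, catching up asynchronously
    SYNCHRONOUS = "synchronous"  # writes block on both devices (2-safe)


# Hypothetical event names; the real triggers depend on the replication system.
TRANSITIONS = {
    (Mode.DEGRADED, "partner_attached"): Mode.RESYNCING,
    (Mode.RESYNCING, "caught_up"): Mode.SYNCHRONOUS,
    (Mode.RESYNCING, "partner_lost"): Mode.DEGRADED,
    (Mode.SYNCHRONOUS, "partner_lost"): Mode.DEGRADED,
}


def next_mode(mode, event):
    """Return the mode after `event`; unknown events leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


def is_two_safe(mode):
    """Only the synchronous mode waits for confirmation from both devices."""
    return mode is Mode.SYNCHRONOUS
```

Note that nothing in this table says when "partner_lost" fires, and that
is exactly the open question.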

However, the tricky part is DRBD's heuristic: when the network or block
device suffers degraded but non-zero performance, at what point does it
drop attempts to replicate to its partner? Postgres's interpretation
is "halt, because 2-safe is currently impossible." DRBD's seems to be
"continue" (but hopefully record a statistic, because otherwise who
knows how often you are actually 2-safe).

For example, what if DRBD can only complete one page per second for
some reason? Does it simply have the primary wait at this glacial
pace, or drop synchronous replication and go degraded? Or does it do
something more clever than just a timeout?

These may seem like theoretical concerns, but 'slow, but non-zero'
progress has been an actual thorn in my side many times.
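
As a rough illustration of why a plain timeout does not catch this case,
here is a small sketch comparing a per-write timeout with a throughput
floor. Both policies, their names, and the thresholds are hypothetical,
and the model assumes serialized writes with one page per acknowledgement;
I am not claiming either one is what DRBD does.

```python
def timeout_policy(ack_latencies_s, timeout_s):
    """Index of the first write whose ack exceeds the timeout, else None."""
    for i, latency in enumerate(ack_latencies_s):
        if latency > timeout_s:
            return i
    return None


def throughput_policy(ack_latencies_s, min_pages_per_s, window):
    """Index of the first write at which the average rate over the last
    `window` writes falls below the floor, else None."""
    for i in range(window - 1, len(ack_latencies_s)):
        elapsed = sum(ack_latencies_s[i - window + 1 : i + 1])
        if elapsed > 0 and window / elapsed < min_pages_per_s:
            return i
    return None


# One page per second: slow, but every individual write does complete.
glacial = [1.0] * 10

# A 5 second per-write timeout is never tripped by this workload.
never_trips = timeout_policy(glacial, timeout_s=5.0)

# A floor of 100 pages/s over a 4-write window notices on the 4th write.
trips_at = throughput_policy(glacial, min_pages_per_s=100.0, window=4)
```

With these made-up numbers the timeout never fires, while the throughput
floor fires as soon as its window fills.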

Regardless of what DRBD does, I think the problem with the async/sync
duality as it stands is that there is no nice way to manage exposure to
transaction loss under various situations and requirements. I'm not
really sure what a solution might look like; I was going to do
something grotesque and conjure carefully orchestrated standby status
packets to accomplish this.
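
To show what "bounded exposure" might mean, here is a toy model of a
commit gate that refuses new work once the unacknowledged backlog would
exceed a byte budget. This is only a sketch of the idea: the class, its
fields, and the byte-position bookkeeping are all hypothetical, and it is
neither an existing Postgres feature nor the standby-status trick
mentioned above.

```python
from dataclasses import dataclass


@dataclass
class LagGate:
    """Toy model: positions are byte offsets in a hypothetical log stream."""
    max_lag_bytes: int
    primary_pos: int = 0
    standby_pos: int = 0

    def lag(self):
        """Bytes written on the primary but not yet confirmed by the standby."""
        return self.primary_pos - self.standby_pos

    def on_standby_ack(self, pos):
        """Record a standby confirmation; never move backwards or past the primary."""
        self.standby_pos = max(self.standby_pos, min(pos, self.primary_pos))

    def try_commit(self, nbytes):
        """Admit the commit only if the resulting lag stays within the budget."""
        if self.lag() + nbytes > self.max_lag_bytes:
            return False
        self.primary_pos += nbytes
        return True
```

In this model the amount of work that can be lost if the primary dies is
never more than max_lag_bytes, which is the measurable bound that neither
"unlimited lag" nor "2-safe or bust" gives you.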

--
fdr
