Re: Sync Rep at Oct 5

From: Steve Singer <ssinger(at)ca(dot)afilias(dot)info>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sync Rep at Oct 5
Date: 2010-10-05 15:30:30
Message-ID: 4CAB4496.4080408@ca.afilias.info
Lists: pgsql-hackers

On 10-10-05 04:32 AM, Simon Riggs wrote:
>
> This is an attempt to compile everybody's stated viewpoints and come to
> an understanding about where we are and where we want to go. The idea
> from here is that we discuss what we are trying to achieve
> (requirements) and then later come back to how (design).

Great start on summarizing the discussions. Getting a summary of the
requirements in one place will help people who haven't been diligent in
following all the sync-rep email threads stay involved.

<snip>

> == Failover Configuration Minimisation ==
>
> An important aspect of robustness is the ability to specify a
> configuration that will remain in place even though 1 or more servers
> have gone down.
>
> It is desirable to specify sync rep requirements such that we do not
> refer to individual servers, if possible. Each such rule necessarily
> requires an "else" condition, possibly multiple else conditions.
>
> It is desirable to avoid both of these
> * the need to have different configuration files on each node
> * the need to have configurations that only become active in case of
> failure. These are known to be hard to test and very likely to be
> misconfigured in the event of failover [I know a bank that was down for
> a whole week when standby server's config was wrong and had never been
> fully tested. The error was simple and obvious, but the fault showed
> itself as a sporadic error that was difficult to trace]
>

Also on the topic of failover: how do we want to deal with the master
failing? Say we have M->{S1,S2}, M fails, and we promote S1 to M1. Can
M1 then feed S2 (M1->S2)? What if S2 was further along in processing
than S1 when M failed? I don't think we want to take on this complexity
for 9.1, but this means that after M fails you won't have a synchronous
replica until you rebuild or somehow reset S2.
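
To make the S1/S2 question concrete, here is a minimal sketch, not a
proposed design. It assumes each standby can report how far it got as a
WAL location in the usual "X/Y" hexadecimal text form, and that a
standby which is ahead of the promoted node cannot follow it without a
rebuild or reset. The function names and the sample locations are
hypothetical.

```python
from typing import Dict, List


def parse_wal_location(location: str) -> int:
    """Convert a WAL location such as '0/3000060' to a comparable integer.

    The text form is two hexadecimal numbers: the high 32 bits, a slash,
    then the low 32 bits.
    """
    high, low = location.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def standbys_needing_reset(promoted: str, positions: Dict[str, str]) -> List[str]:
    """Return the standbys that were further along than the promoted node.

    Assumption (not settled in this thread): such standbys cannot simply
    start following the new master and must be rebuilt or reset.
    """
    promoted_pos = parse_wal_location(positions[promoted])
    return sorted(
        name
        for name, location in positions.items()
        if name != promoted and parse_wal_location(location) > promoted_pos
    )
```

With S1 at 0/3000000 and S2 at 0/3000060, promoting S1 reports S2 as
needing a reset, which is exactly the case where M1 is left without a
synchronous replica.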

> == Sync Rep Performance ==
>
> Sync Rep is a potential performance hit, and that hit is known to
> increase as geographical distance increases.
>
> We want to be able to specify the performance of some nodes so that we
> have 4 levels of robustness:
> async - doesn't wait for sync
> recv - syncs when messages received by standby
> fsync - syncs when messages written to disk by standby
> apply - sync when messages applied to standby

Will read-only queries running on a slave hold up transactions from
being applied on that slave? I suspect that for most people running
with 'apply' they would want the answer to be 'no'. Are we going to
revisit the standby query cancellation discussion?
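
To show why this matters for 'apply', here is a rough sketch of how the
four levels could map onto what the master waits for. Everything in it
(the type names, the three per-standby positions, plain integers for WAL
positions) is an assumption for illustration, not anything that exists
in the tree today.

```python
from dataclasses import dataclass
from enum import Enum


class SyncLevel(Enum):
    ASYNC = "async"   # do not wait for the standby
    RECV = "recv"     # wait until the standby has received the WAL
    FSYNC = "fsync"   # wait until the standby has written it to disk
    APPLY = "apply"   # wait until the standby has applied it


@dataclass
class StandbyProgress:
    """Hypothetical progress report from one standby (WAL positions)."""
    received: int
    fsynced: int
    applied: int


def commit_may_return(level: SyncLevel, commit_pos: int,
                      standby: StandbyProgress) -> bool:
    """Decide whether a commit at commit_pos may be acknowledged."""
    if level is SyncLevel.ASYNC:
        return True
    if level is SyncLevel.RECV:
        return standby.received >= commit_pos
    if level is SyncLevel.FSYNC:
        return standby.fsynced >= commit_pos
    return standby.applied >= commit_pos
```

If a read-only query on the standby holds up apply, `applied` stops
advancing while `received` and `fsynced` keep moving, so under this
model only commits waiting at 'apply' would stall on the master.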

> == Path Minimization ==
>
> We want to be able to minimize and control the path of data transfer,
> * so that the current master doesn't have to initiate transfer to all
> dependent nodes, thereby reducing overhead on master
> * so that if the path from current master to descendent is expensive we
> would minimize network costs.
>
> This requirement is commonly known as "relaying".
>
> In its most simply stated form, we want one standby to be able to get
> WAL data from another standby. e.g. M -> S -> S. Stating the problem in
> that way misses out on the actual requirement, since people would like
> the arrangement to be robust in case of failures of M or any S. If we
> specify the exact arrangement of paths then we need to respecify the
> arrangement of paths if a server goes down.

Are we going to allow these paths to be reconfigured on a live cluster?
If we have M->S1->S2 and we want to reconfigure S2 to read from M, then
S2 needs to get the data that has already been committed on S1 but has
not yet reached S2 from somewhere (either S1 or M). This has solutions,
but it adds to the complexity. Maybe not for 9.1.
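
As a sketch of what such a reconfiguration has to work out, assuming the
relay paths are kept as a simple node-to-upstream map and WAL positions
are plain integers (both are illustrative assumptions, as are the
function names):

```python
from typing import Dict, Optional, Tuple


def reparent(upstream: Dict[str, Optional[str]], node: str,
             new_parent: str) -> Dict[str, Optional[str]]:
    """Return a new topology in which node reads from new_parent.

    The master maps to None. Raises ValueError if the change would make
    a node its own (indirect) upstream.
    """
    current: Optional[str] = new_parent
    while current is not None:
        if current == node:
            raise ValueError(f"{node} cannot read from its own descendant")
        current = upstream[current]
    updated = dict(upstream)
    updated[node] = new_parent
    return updated


def wal_to_fetch(positions: Dict[str, int], node: str,
                 new_parent: str) -> Tuple[int, int]:
    """Return the (start, end) WAL range node still needs from new_parent."""
    start, end = positions[node], positions[new_parent]
    if end < start:
        raise ValueError(f"{new_parent} is behind {node}")
    return start, end
```

For M->S1->S2 with M at 500, S1 at 480 and S2 at 450, moving S2 under M
means S2 has to fetch 450 through 500 from M. The sketch ignores whether
M still has that WAL available, which is part of the added complexity.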
