Re: Quorum commit for multiple synchronous replication.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: michael(dot)paquier(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, noah(at)leadboat(dot)com, amit(dot)kapila16(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, petr(at)2ndquadrant(dot)com, vik(at)2ndquadrant(dot)fr, simon(at)2ndquadrant(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Quorum commit for multiple synchronous replication.
Date: 2017-04-14 07:32:45
Message-ID: 20170414.163245.189591075.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Fri, 14 Apr 2017 10:47:46 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoD7Scnjrn5m+_eaDEsZnyXpbwGYw7x1sXeipAK=iqBKUQ(at)mail(dot)gmail(dot)com>
> On Fri, Apr 14, 2017 at 9:38 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > On Fri, Apr 14, 2017 at 2:47 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >> I'm thinking that it's less confusing to report always 0 as the priority of
> >> async standby whatever the setting of synchronous_standby_names is.
> >> Thought?
> >
> > Or we could have priority being reported to NULL for async standbys as
> > well, the priority number has no meaning for them anyway...
>
> I agree to set the same thing (priority or NULL) to all sync standby
> in a quorum set. As Fujii-san mentioned, I also think that it means
> all standbys in a quorum set can be chosen equally. But to less
> confusion for current user I'd not like to change current behavior of
> the priority of async standby.
>
> >
> >> If we adopt this idea, in a quorum-based sync replication, I think that
> >> the priorities of all the standbys listed in synchronous_standby_names
> >> should be 1 instead of NULL. That is, those standbys have the same
> >> (highest) priority, and which means that any of them can be chosen as
> >> sync standby. Thought?
> >
> > Mainly my fault here to suggest that standbys in a quorum set should
> > have a priority set to NULL. My 2c on the matter is that I would be
> > fine with either having the async standbys having a priority of NULL
> > or using a priority of 1 for standbys in a quorum set. Though,
> > honestly, I find that showing a priority number for something where
> > this has no real meaning is even more confusing..
>
> This is just a thought but we can merge sync_priority and sync_state
> into one column. The sync priority can have meaning only when the
> standby is considered as a sync standby or a potential standby in
> priority-based sync replication. For example, we can show something
> like 'sync:N' as states of the sync standby and 'potential:N' as
> states of the potential standby in priority-based sync replication,
> where N means the priority. In quorum-based sync replication it is
> just 'quorum'. It breaks backward compatibility, though.

I'm not sure how sync_priority is used, but I know sync_state is
used to detect the state or soundness of a replication set.
Introducing a variable part into it wouldn't be welcomed by the
people who rely on it.

The current shape of pg_stat_replication is as follows.

application_name | sync_priority | sync_state
-----------------+---------------+------------
sby1             |             1 | sync
sby3             |             2 | potential
sby3             |             2 | potential
sby2             |             3 | potential

For this case, the following query will work.

SELECT count(*) > 0 FROM pg_stat_replication WHERE sync_state ='sync'

It may be a bit confusing, but we could use the sync_priority
field to show how many hosts are required to form the quorum.
For example, take the case with
s_s_names = 'ANY 3 (sby1,sby2,sby3,sby4)'.

application_name | sync_priority | sync_state
-----------------+---------------+------------
sby1             |             3 | quorum
sby4             |             3 | quorum
sby2             |             3 | quorum
sby3             |             3 | quorum
sby3             |             3 | quorum
sby5             |             0 | async

In this case, we can detect satisfaction of the quorum setup by
something like this.

SELECT count(*) >= sync_priority FROM pg_stat_replication WHERE
sync_state='quorum' GROUP BY sync_priority;
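
To make the check above easy to try out, here is a minimal sketch
that runs the same query against an in-memory SQLite table. The
table is only a hypothetical stand-in for pg_stat_replication,
filled with the rows of the example above; the helper name
quorum_satisfied is made up for illustration, and SQLite returns
1/0 where PostgreSQL would return a boolean.

```python
import sqlite3

# Hypothetical stand-in for pg_stat_replication (not a real server view).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pg_stat_replication ("
    " application_name TEXT, sync_priority INTEGER, sync_state TEXT)"
)
conn.executemany(
    "INSERT INTO pg_stat_replication VALUES (?, ?, ?)",
    [
        ("sby1", 3, "quorum"),
        ("sby4", 3, "quorum"),
        ("sby2", 3, "quorum"),
        ("sby3", 3, "quorum"),
        ("sby3", 3, "quorum"),
        ("sby5", 0, "async"),
    ],
)

# The query from the mail: sync_priority carries the required quorum size.
QUORUM_CHECK = """
SELECT count(*) >= sync_priority
  FROM pg_stat_replication
 WHERE sync_state = 'quorum'
 GROUP BY sync_priority
"""


def quorum_satisfied(connection):
    """Return True if enough 'quorum' standbys are connected."""
    row = connection.execute(QUORUM_CHECK).fetchone()
    # No row at all means no quorum standby is connected.
    return bool(row and row[0])
```

Note that with no quorum standby connected the query returns no
row at all rather than false, so a monitoring script would have
to treat the empty result as a failure, as the helper does.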

But maybe we should provide a means to detect the standbys that
are really in sync with the master. The output above doesn't
give such information.

We could show top N standbys as priority-1 and others as
priority-2. (Of course this requires some additional
computation.)

application_name | flush_location | sync_priority | sync_state
-----------------+----------------+---------------+-----------
sby1             | 0/700140       |             1 | quorum
sby4             | 0/700100       |             1 | quorum
sby2             | 0/700080       |             1 | quorum
sby3             | 0/6FFF3e       |             2 | quorum
sby3             | 0/50e345       |             2 | quorum
sby5             | 0/700140       |             0 | async

In this case, the soundness of the quorum set is checked by the
following query.

SELECT count(*) > 0 FROM pg_stat_replication WHERE sync_priority > 0;

We will find the standbys 'in sync' by the following query.

SELECT application_name FROM pg_stat_replication WHERE sync_priority = 1;
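
The "additional computation" for the top-N idea could look
roughly like the following. This is only a sketch of the ranking
logic, not how the walsender code would do it: it assumes ranking
by flush_location, and the names parse_lsn and assign_priorities
as well as the tuple layout are made up for illustration.

```python
def parse_lsn(text):
    """Convert an LSN such as '0/700140' into a comparable integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def assign_priorities(standbys, num_sync):
    """Rank quorum members by flush location.

    standbys: list of (application_name, flush_location, in_quorum_set).
    Returns rows of (application_name, flush_location, sync_priority,
    sync_state): the num_sync most advanced quorum members get
    priority 1, the remaining members 2, and async standbys 0.
    """
    members = [i for i, s in enumerate(standbys) if s[2]]
    members.sort(key=lambda i: parse_lsn(standbys[i][1]), reverse=True)
    top = set(members[:num_sync])

    result = []
    for i, (name, lsn, in_set) in enumerate(standbys):
        if not in_set:
            result.append((name, lsn, 0, "async"))
        elif i in top:
            result.append((name, lsn, 1, "quorum"))
        else:
            result.append((name, lsn, 2, "quorum"))
    return result


# The standbys from the example, for 'ANY 3 (...)'.
STANDBYS = [
    ("sby1", "0/700140", True),
    ("sby4", "0/700100", True),
    ("sby2", "0/700080", True),
    ("sby3", "0/6FFF3e", True),
    ("sby3", "0/50e345", True),
    ("sby5", "0/700140", False),
]
ranked = assign_priorities(STANDBYS, 3)
```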

If the master doesn't have enough standbys, we could perhaps
show the state as follows.

application_name | flush_location | sync_priority | sync_state
-----------------+----------------+---------------+-----------
sby1             | 0/700140       |             0 | quorum
sby4             | 0/700100       |             0 | quorum
sby5             | 0/700140       |             0 | async

Or we can use 'quorum-potential' instead of the 'quorum' above.

Or, we might be able to keep backward compatibility in a sense.

application_name | flush_location | sync_priority | sync_state
-----------------+----------------+---------------+-----------
sby1             | 0/700140       |             1 | sync
sby4             | 0/700100       |             1 | sync
sby2             | 0/700080       |             1 | sync
sby3             | 0/6FFF3e       |             2 | potential
sby3             | 0/50e345       |             2 | potential
sby5             | 0/700140       |             0 | async

In the above discussion, I didn't consider possible future
extensions of this feature.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
