Re: Support for N synchronous standby servers - take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-04-04 13:00:24
Message-ID: CAD21AoDoq1ubY4KkKhrA9jzaVXekwAT7gV5pQJbS+wj98b9-3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 4, 2016 at 6:03 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello, thank you for testing.
>
> At Sat, 2 Apr 2016 14:20:55 +1300, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote in <CAEepm=2sdDL2hs3XbWb5FORegNHBObLJ-8C2=aaeG-riZTd2Rw(at)mail(dot)gmail(dot)com>
>> >>> Attached latest patch incorporate some review comments so far, and is
>> >>> rebased against current HEAD.
>> >>>
>> >>
>> >> Sorry I attached wrong patch.
>> >> Attached patch is correct patch.
>> >>
>> >> [mulit_sync_replication_v21.patch]
>> >
>> > Here are some TPS numbers from some quick tests I ran on a set of
>> > Amazon EC2 m3.large instances ("2 vCPU" virtual machines) configured
>> > as primary + 3 standbys, to try out different combinations of
>> > synchronous_commit levels and synchronous_standby_names numbers. They
>> > were run for a short time only and these are of course systems with
>> > limited and perhaps uneven IO and CPU, but they still give some idea
>> > of the trends. And reassuringly, the trends are travelling in the
>> > expected directions.
>> >
>> > All default settings except shared_buffers = 1GB, and the GUCs
>> > required for replication.
>> >
>> > pgbench postgres -j2 -c2 -N bench2 -T 600
>> >
>> > 1(*) 2(*) 3(*)
>> > ==== ==== ====
>> > off = 4056 4096 4092
>> > local = 1323 1299 1312
>> > remote_write = 1130 1046 958
>> > on = 860 744 701
>> > remote_apply = 785 725 604
>> >
>> > pgbench postgres -j16 -c16 -N bench2 -T 600
>> >
>> > 1(*) 2(*) 3(*)
>> > ==== ==== ====
>> > off = 3952 3943 3933
>> > local = 2964 2984 3026
>> > remote_write = 2790 2724 2675
>> > on = 2731 2627 2523
>> > remote_apply = 2627 2501 2432
>> >
>> > One thing I noticed is that there are LOG messages telling me when a
>> > standby becomes a synchronous standby, but nothing to tell me if a
>> > standby stops being a standby (ie because a higher priority one has
>> > taken its place in the quorum). Would that be interesting?
>
> A walsender exits by proc_exit() for any operational
> termination so wrapping proc_exit() should work. (Attached file 1)
>
> For the setting "2(Sby1, Sby2, Sby3)", the master says that all
> of the standbys are sync-standbys and no message is emited on
> failure of Sby1, which should cause a promotion of Sby3.
>
>> standby "Sby3" is now the synchronous standby with priority 3
>> standby "Sby2" is now the synchronous standby with priority 2
>> standby "Sby1" is now the synchronous standby with priority 1
> ..<Sby 1 failure>
>> standby "Sby3" is now the synchronous standby with priority 3
>
> Sby3 becomes sync standby twice:p
>
> This was a behavior taken over from the single-sync-rep era but
> it should be confusing for the new sync-rep selection mechanism.
> The second attached diff makes this as the following.
>
>
>> 17:48:21.969 LOG: standby "Sby3" is now a synchronous standby with priority 3
>> 17:48:23.087 LOG: standby "Sby2" is now a synchronous standby with priority 2
>> 17:48:25.617 LOG: standby "Sby1" is now a synchronous standby with priority 1
>> 17:48:31.990 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>> 17:48:43.905 LOG: standby "Sby3" is now a synchronous standby with priority 3
>> 17:49:10.262 LOG: standby "Sby1" is now a synchronous standby with priority 1
>> 17:49:13.865 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>
> Since this status check is taken place for every reply from
> stanbys, the message of downgrading to "potential" may be
> diferred or even fail to occur but it should be no problem.
>
> Applying the both of the above patches, the message would be like
> the following.
>
>> 17:54:08.367 LOG: standby "Sby3" is now a synchronous standby with priority 3
>> 17:54:08.564 LOG: standby "Sby1" is now a synchronous standby with priority 1
>> 17:54:08.565 LOG: standby "Sby2" is now a synchronous standby with priority 2
>> 17:54:18.387 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>> 17:54:28.887 LOG: synchronous standby "Sby1" with priority 1 exited
>> 17:54:31.359 LOG: standby "Sby3" is now a synchronous standby with priority 3
>> 17:54:39.008 LOG: standby "Sby1" is now a synchronous standby with priority 1
>> 17:54:41.382 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>
> Does this make sense?
>
> By the way, Sawada-san, you have changed the parentheses for the
> priority method from '[]' to '()'. And I mistankenly defined
> s_s_names as '2[Sby1, Sby2, Sby3]' and got wrong behavior, that
> is, only Sby2 is registed as mandatory synchronous standby.
>
> For this case, the tree members of SyncRepConfig are '2[Sby1,',
> 'Sby2', "Sby3]'. This syntax is valid for the current
> specification but will surely get different meaning by the future
> changes. We should refuse this known-to-be-wrong-in-future syntax
> from now.
>

I have no objection about current version patch.
But one optimise idea I came up with is to return false before
calculation of lowest LSN from sync standby if MyWalSnd is not listed
in sync_standby.
For example in SyncRepGetOldestSyncRecPtr(),

==
sync_standby = SyncRepGetSyncStandbys();

if (list_length(sync_standbys) <SyncRepConfig->num_sync()
{
(snip)
}

/* Here if MyWalSnd is not listed in sync_standby, quick exit. */
if (list_member_int(sync_standbys, MyWalSnd->slotno))
return false;

foreach(cell, sync_standbys)
{
(snip)
}
==

> For this case, the tree members of SyncRepConfig are '2[Sby1,',
> 'Sby2', "Sby3]'. This syntax is valid for the current
> specification but will surely get different meaning by the future
> changes. We should refuse this known-to-be-wrong-in-future syntax
> from now.

I couldn't get your point but why will the above syntax meaning be
different from current meaning by future change?
I thought that another method uses another kind of parentheses.

Regards,

--
Masahiko Sawada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2016-04-04 13:03:55 Re: Support for N synchronous standby servers - take 2
Previous Message Aleksander Alekseev 2016-04-04 12:56:56 Tiny patch: sigmask.diff