Re: Support for N synchronous standby servers - take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2015-12-24 19:50:16
Message-ID: CAD21AoB_RYHpg5h_W1H9P9u+DRicDxhRGOAdybdKhf_hBuGMzw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 23, 2015 at 8:45 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Wed, Dec 23, 2015 at 3:50 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Fri, Dec 18, 2015 at 7:38 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> [000-_multi_sync_replication_v3.patch]
>>
>> Hi Masahiko,
>>
>> I haven't tested this version of the patch but I have some comments on the code.
>>
>> +/* Is this wal sender considerable one? */
>> +bool
>> +SyncRepActiveListedWalSender(int num)
>>
>> Maybe "Is this wal sender managing a standby that is streaming and
>> listed as a synchronous standby?"

Fixed.

>> +/*
>> + * Obtain three palloc'd arrays containing position of standbys currently
>> + * considered as synchronous, and its length.
>> + */
>> +int
>> +SyncRepGetSyncStandbys(int *sync_standbys)
>>
>> This comment seems to be out of date. I would say "Populate a
>> caller-supplied array which much have enough space for ... Returns
>> ...".

Fixed.

>> +/*
>> + * Obtain standby currently considered as synchronous using
>> + * '1-priority' method.
>> + */
>> +int
>> +SyncRepGetSyncStandbysOnePriority(int *sync_standbys)
>> + ... code ...
>>
>> Why do we need a separate function and code path for this case? If
>> you used SyncRepGetSyncStandbysPriority with a size of 1, should it
>> not produce the same result in the same time complexity?

I was thinking that we could add new function like
SyncRepGetSyncStandbysXXXXX function (XXXXX is replication method
name) if we want to expand the kind of repliaction method.
So I include replication method name into function name.
But it's enough to add one function for 2 replication method;
priority, 1-priority

>> +/*
>> + * Obtain standby currently considered as synchronous using
>> + * 'priority' method.
>> + */
>> +int
>> +SyncRepGetSyncStandbysPriority(int *sync_standbys)
>>
>> I would say something more descriptive, maybe like this: "Populates a
>> caller-supplied buffer with the walsnds indexes of the highest
>> priority active synchronous standbys, up to the a limit of
>> 'synchronous_standby_num'. The order of the results is undefined.
>> Returns the number of results actually written."

Fixed.

>> If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
>> above, then this function could be renamed to SyncRepGetSyncStandbys.
>> I think it would be a tiny bit nicer if it also took a Size n argument
>> along with the output buffer pointer.

Sorry, I could not get your point. SyncRepGetSyncStandbysPriority()
function uses synchronous_standby_num which is global variable.
But you mean that the number of synchronous standbys is given as
function argument?

>> As for the body of that function (which I won't paste here), it
>> contains an algorithm to find the top K elements in an array of N
>> elements. It does that with a linear search through the top K seen so
>> far for each value in the input array, so its worst case is O(KN)
>> comparisons. Some of the sorting gurus on this list might have
>> something to say about that but my take is that it seems fine for the
>> tiny values of K and N that we're dealing with here, and it's nice
>> that it doesn't need any space other than the output buffer, unlike
>> some other top-K algorithms which would win for larger inputs.

Yeah, it's improvement point.
But I'm assumed that the number of synchronous replication is not
large, so I use this algorithm as first version.
And I think that its worst case is O(K(N-K)). Am I missing something?

>> + /* Found sync standby */
>>
>> This comment would be clearer as "Found lowest priority standby, so replace it".

Fixed.

>> + if (walsndloc->sync_standby_priority == priority &&
>> + walsnd->sync_standby_priority < priority)
>> + sync_standbys[j] = i;
>>
>> In this case, couldn't you also update 'priority' directly, and break
>> out of the loop immediately?
>
> Oops, I didn't think that though: you can't break from the loop, you
> still need to find the new lowest priority, so I retract that bit.
>
>> Wouldn't "lowest_priority" be a better
>> variable name than "priority"? It might be good to say "lowest"
>> rather than "highest" in the nearby comments, to be consistent with
>> other parts of the code including the function name (lower priority
>> number means higher priority!).
>>
>> +/*
>> + * Obtain currently synced LSN: write and flush,
>> + * using '1-prioirty' method.
>>
>> s/prioirty/priority/
>>
>> + */
>> +bool
>> +SyncRepGetSyncLsnsOnePriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>>
>> Similar to the earlier case, why have a special case for 1-priority?
>> Wouldn't SyncRepGetSyncLsnsPriority produce the same result when is
>> synchronous_standby_num == 1?
>>
>> +/*
>> + * Obtain currently synced LSN: write and flush,
>> + * using 'prioirty' method.
>>
>> s/prioirty/priority/
>>
>> +SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>> +{
>> + int *sync_standbys = NULL;
>> + int num_sync;
>> + int i;
>> + XLogRecPtr synced_write = InvalidXLogRecPtr;
>> + XLogRecPtr synced_flush = InvalidXLogRecPtr;
>> +
>> + sync_standbys = (int *) palloc(sizeof(int) * synchronous_standby_num);
>>
>> Would a fixed size buffer on the stack (of compile time constant size)
>> be better than palloc/free in here and elsewhere?
>>
>> + /*
>> + for (i = 0; i < num_sync; i++)
>> + {
>> + volatile WalSnd *walsndloc = &WalSndCtl->walsnds[sync_standbys[i]];
>> + if (walsndloc == MyWalSnd)
>> + {
>> + found = true;
>> + break;
>> + }
>> + }
>> + */
>>
>> Dead code.
>>
>> + if (synchronous_replication_method == SYNC_REP_METHOD_1_PRIORITY)
>> + synchronous_standby_num = 1;
>> + else
>> + synchronous_standby_num = pg_atoi(lfirst(list_head(elemlist)),
>> sizeof(int), 0);

Fixed.

>> Should we detect if synchronous_standby_num > the number of listed
>> servers, which would be a nonsensical configuration? Should we also
>> impose some other kind of constant limits, like must be >= 0 (I
>> haven't tried but I wonder if -1 leads to very large palloc) and must
>> be <= MAX_XXX (smallish sanity check number like 256, rather than the
>> INT_MAX limit imposed by pg_atoi), so that we could use that constant
>> to size stack buffers in the places where you currently palloc?

Yeah, I add validation check for s_s_num.

>> Could 1-priority mode be inferred from the use of a non-number in the
>> leading position, and if so, does the mode concept even need to exist,
>> especially if SyncRepGetSyncLsnsOnePriority and
>> SyncRepGetSyncStandbysOnePriority aren't really needed either way? Is
>> there any difference in behaviour between the following
>> configurations? (Sorry if that particular question has already been
>> duked out in the long thread about GUCs.)
>>
>> synchronous_replication_method = 1-priority
>> synchronous_standby_names = foo, bar
>>
>> synchronous_replication_method = priority
>> synchronous_standby_names = 1, foo, bar

The behaviour under the both configuration are the same.
I added '1-priority' method for backward compatibility. The default
value of s_r_method is '1-priority', so user who is using sync
replicatoin can continues to use after upgrading smoothly.

>> (Apologies for the missing leading whitespace in patch fragments
>> pasted above, it seems that my mail client has eaten it).

No problem. Thank you for reviewing!

> I have moved this entry to next CF as review is quite recent.
Thanks!

Attached latest version patch.
Please review it.

Regards,

--
Masahiko Sawada

Attachment Content-Type Size
000_multi_sync_replication_v4.patch application/octet-stream 17.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2015-12-24 20:11:00 ALTER ROLE SET/RESET for multiple options
Previous Message Alexander Korotkov 2015-12-24 18:42:10 Re: Commit fest status for 2015-11