Re: Standalone synchronous master

From: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Rajeev rastogi" <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-13 07:04:50
Message-ID: BF2827DCCE55594C8D7A8F7FFD3AB7713DDB8FB0@SZXEML508-MBX.china.huawei.com
Lists: pgsql-hackers


On 13th January 2013, Josh Berkus wrote:

> I'm leading this off with a review of the features offered by the
> actual patch submitted. My general discussion of the issues of Sync
> Degrade, which justifies my specific suggestions below, follows that.
> Rajeev, please be aware that other hackers may have different opinions
> than me on what needs to change about the patch, so you should collect
> all opinions before changing code.

Thanks for reviewing and providing the first round of comments. We will
certainly collect all the feedback before improving this patch.

>
> > Add a new parameter :
>
> > synchronous_standalone_master = on | off
>
> I think this is a TERRIBLE name for any such parameter. What does
> "synchronous standalone" even mean? A better name for the parameter
> would be "auto_degrade_sync_replication" or "synchronous_timeout_action
> = error | degrade", or something similar. It would be even better for
> this to be a mode of synchronous_commit, except that synchronous_commit
> is heavily overloaded already.

Yes, we can change this parameter name. The suggestions so far for how to degrade the mode are:
1. Auto-degrade using some sort of configuration parameter, as done in the current patch.
2. Expose the configuration variable through a new SQL-callable function, as suggested by Heikki.
3. Use ALTER SYSTEM SET, as suggested by others.

> Some issues raised by this log script:
>
> LOG: standby "tx0113" is now the synchronous standby with priority 1
> LOG: waiting for standby synchronization
> <-- standby wal receiver on the standby is killed (SIGKILL)
> LOG: unexpected EOF on standby connection
> LOG: not waiting for standby synchronization
> <-- restart standby so that it connects again
> LOG: standby "tx0113" is now the synchronous standby with priority 1
> LOG: waiting for standby synchronization
> <-- standby wal receiver is first stopped (SIGSTOP) to make sure
>
> The "not waiting for standby synchronization" message should be marked
> something stronger than LOG. I'd like ERROR.

Yes, we can change this to ERROR.

> Second, you have the master resuming sync rep when the standby
> reconnects. How do you determine when it's safe to do that? You're
> making the assumption that you have a failing sync standby instead of
> one which simply can't keep up with the master, or a flakey network
> connection (see discussion below).

Yes, this can be further improved: the master should be upgraded back to
synchronous mode, by one of the methods discussed above, only once we are sure
that the synchronous standby has caught up with the master node (this may
require a better design).
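
As a rough illustration of the "caught up" check (this is not part of the
patch), one option is to compare the master's current WAL position with the
position the standby reports as flushed. The sketch below relies only on the
textual LSN format, two hexadecimal numbers separated by a slash. The function
names and the max_lag_bytes threshold are hypothetical, and how the two
positions are obtained is left open.

```python
def parse_lsn(text):
    """Convert a textual LSN such as '0/3000060' to an integer byte position."""
    hi, lo = text.split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def standby_caught_up(master_lsn, standby_flush_lsn, max_lag_bytes=0):
    """Return True when the standby has flushed WAL to within max_lag_bytes
    of the master's position (hypothetical policy, not the patch's logic)."""
    lag = parse_lsn(master_lsn) - parse_lsn(standby_flush_lsn)
    return lag <= max_lag_bytes
```

With max_lag_bytes left at 0, the master would return to synchronous mode only
when the standby has flushed everything the master has written so far.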

> > a. Master_to_standalone_cmd: To be executed before master
> > switches to standalone mode.
> >
> > b. Master_to_sync_cmd: To be executed before master switches
> > from sync mode to standalone mode.
>
> I'm not at all clear what the difference between these two commands is.
> When would one be executed, and when would the other be executed? Also,
> renaming ...

There is a typo in the explanation above. The meaning of the two commands is:
a. Master_to_standalone_cmd: To be executed during degradation of sync mode.

b. Master_to_sync_cmd: To be executed before upgrading to, or restoring, sync mode.

These two commands follow the TODO item about informing the DBA.

But as per Heikki's suggestion, we should not use this mechanism to inform the DBA;
rather, we should have some sort of generic trap system instead of adding this one
particular extra config option specifically for this feature.
This looks like a better idea, so we can discuss it further to come up with a proper
design.

> Missing features:
>
> a) we should at least send committing clients a WARNING if they have
> committed a synchronous transaction and we are in degraded mode.

Yes, it is a great idea.

> One of the reasons that there's so much disagreement about this feature
> is that most of the folks strongly in favor of auto-degrade are thinking
> *only* of the case that the standby is completely down. There are many
> other reasons for a sync transaction to hang, and the walsender has
> absolutely no way of knowing which is the case. For example:
>
> * Transient network issues
> * Standby can't keep up with master
> * Postgres bug
> * Storage/IO issues (think EBS)
> * Standby is restarting
>
> You don't want to handle all of those issues the same way as far as
> sync rep is concerned. For example, if the standby is restarting, you
> probably want to wait instead of degrading.

I think that if we support an external SQL-callable function to degrade, as Heikki
suggested, instead of auto-degrade, then users can handle at least some of the
above scenarios, if not all, based on their experience and observation.
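
To make this concrete, an external monitor could apply a policy like the one
sketched below before it asks the master to degrade. Everything here is
hypothetical: the inputs, the 60-second grace period and the choice to keep
waiting for a slow but connected standby are assumptions for illustration only.
The call that would actually degrade the master is omitted, since no such
SQL-callable function exists yet.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    DEGRADE = "degrade"


def choose_action(standby_connected, seconds_unreachable, restart_grace_seconds=60):
    """Decide whether an external monitor should keep waiting or degrade.

    Hypothetical policy: a connected standby (even a slow one) is never
    degraded, and a disconnected one gets a grace period to cover restarts
    and transient network issues before the monitor degrades sync replication.
    """
    if standby_connected:
        # Connected but possibly lagging: keep waiting rather than lose sync guarantees.
        return Action.WAIT
    if seconds_unreachable < restart_grace_seconds:
        # Might just be restarting or hitting a short network blip.
        return Action.WAIT
    # Unreachable for longer than the grace period: treat the standby as down.
    return Action.DEGRADE
```

A DBA could tune the grace period, or add further inputs, according to which of
the scenarios listed above they actually see in their environment.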

Thanks and Regards,
Kumar Rajeev Rastogi
