Re: Support for N synchronous standby servers - take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-01-18 16:40:04
Message-ID: CAD21AoBFJa74VS+fJ_f4Mp+JzGBWwLSWZc26poMKtbUnyayKTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 18, 2016 at 1:20 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Sun, Jan 17, 2016 at 11:09 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Wed, Jan 13, 2016 at 1:54 AM, Alvaro Herrera wrote:
>> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
>> * Confirm that the data is synchronously replicated to multiple
>> standbys in same cases.
>> * case 1 : The standby which is not listed in s_s_name, is down
>> * case 2 : The standby which is listed in s_s_names but potential
>> standby, is down
>> * case 3 : The standby which is considered as sync standby, is down.
>> * Standby promotion
>>
>> In order to confirm that the commit isn't done in case #3 forever
>> unless new sync standby is up, I think we need the framework that
>> cancels executing query.
>> That is, what I'm planning is,
>> 1. Set up master server (s_s_name = '2, standby1, standby2)
>> 2. Set up two standby servers
>> 3. Standby1 is down
>> 4. Create some contents on master (But transaction is not committed)
>> 5. Cancel the #4 query. (Also confirm that the flush location of only
>> standby2 makes progress)
>
> This will need some thinking and is not as easy as it sounds. There is
> no way to hold on a connection after executing a query in the current
> TAP infrastructure. You are just mentioning case 3, but actually cases
> 1 and 2 are falling into the same need: if there is a failure we want
> to be able to not be stuck in the test forever and have a way to
> cancel a query execution at will. TAP uses psql -c to execute any sql
> queries, but we would need something that is far lower-level, and that
> would be basically using the perl driver for Postgres or an equivalent
> here.
>
> Honestly for those tests I just thought that we could get to something
> reliable by just looking at how each sync replication setup reflects
> in pg_stat_replication as the flow is really getting complicated,
> giving to the user a clear representation at SQL level of what is
> actually occurring in the server depending on the configuration used
> being important here.

I see.
We could check the transition of sync_state in pg_stat_replication.
I think it means that it tests for each replication method (switching
state) rather than synchronization of replication.

What I'm planning to have are,
* Confirm value of pg_stat_replication.sync_state (sync, async or potential)
* Standby promotion
* Standby catching up master
And each replication method has above tests.

Are these enough?

Regards,

--
Masahiko Sawada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-01-18 16:50:42 Re: jsonb - jsonb operators
Previous Message Andres Freund 2016-01-18 16:39:08 Re: checkpointer continuous flushing