Re: Support for N synchronous standby servers - take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for N synchronous standby servers - take 2
Date: 2016-01-20 06:05:01
Message-ID: CAD21AoCtpS6Pz4BqjnTp_hhp6ioVCzp0_7Db9HVH=JKtRAYDMw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 19, 2016 at 2:55 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Tue, Jan 19, 2016 at 1:40 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Mon, Jan 18, 2016 at 1:20 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> On Sun, Jan 17, 2016 at 11:09 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> On Wed, Jan 13, 2016 at 1:54 AM, Alvaro Herrera wrote:
>>>> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
>>>> * Confirm that the data is synchronously replicated to multiple
>>>> standbys in the following cases.
>>>> * case 1 : A standby that is not listed in s_s_names is down
>>>> * case 2 : A standby that is listed in s_s_names but is only a
>>>> potential standby is down
>>>> * case 3 : A standby that is considered a sync standby is down
>>>> * Standby promotion
>>>>
>>>> In order to confirm that, in case #3, the commit never completes
>>>> unless a new sync standby comes up, I think we need a framework that
>>>> can cancel an executing query.
>>>> That is, what I'm planning is:
>>>> 1. Set up master server (s_s_names = '2, standby1, standby2')
>>>> 2. Set up two standby servers
>>>> 3. Standby1 goes down
>>>> 4. Create some contents on master (but the transaction is not committed)
>>>> 5. Cancel the #4 query. (Also confirm that the flush location of only
>>>> standby2 makes progress)
>>>
>>> This will need some thinking and is not as easy as it sounds. There is
>>> no way to hold on to a connection after executing a query in the current
>>> TAP infrastructure. You are just mentioning case 3, but actually cases
>>> 1 and 2 fall into the same need: if there is a failure we want
>>> to avoid being stuck in the test forever and have a way to
>>> cancel a query execution at will. TAP uses psql -c to execute any sql
>>> queries, but we would need something that is far lower-level, and that
>>> would be basically using the perl driver for Postgres or an equivalent
>>> here.
>>>
>>> Honestly, for those tests I just thought that we could get to something
>>> reliable by just looking at how each sync replication setup is reflected
>>> in pg_stat_replication, as the flow is really getting complicated.
>>> Giving the user a clear representation at SQL level of what is
>>> actually occurring in the server, depending on the configuration used,
>>> is important here.
>>
>> I see.
>> We could check the transition of sync_state in pg_stat_replication.
>> I think that means testing how each replication method switches
>> state, rather than testing the synchronization of replication itself.
>>
>> What I'm planning to have is:
>> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
>> * Standby promotion
>> * Standby catching up with master
>> Each replication method would have the above tests.
>>
>> Are these enough?
>
> Does promoting the standby and checking that it caught up really have
> value in the context of this patch? What we just want to know is, on a
> master, which nodes need to be waited for when s_s_names or any other
> method is used, no?

Yeah, these 2 tests are outside the context of this patch.
If the test framework had a facility that allowed us to execute a
query (psql) as another process, we could call pg_cancel_backend()
on the waiting process while the master server is waiting for standbys.
In order to check whether the master server waits for the standby
or not, we need the test framework to have such a facility, I think.
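
A rough sketch of the kind of facility meant above, written in Python
purely for illustration (the TAP tests themselves are Perl). It assumes
psql is on PATH and that cancelling the backend ends its wait for
synchronous replication. psql_cmd, cancel_sql and
run_and_cancel_if_blocked are hypothetical helper names, not part of any
existing test framework; pg_cancel_backend() and pg_stat_activity are
the only server-side pieces used.

```python
import subprocess
import sys


def psql_cmd(port, sql):
    """Build a psql command line that runs one SQL string (assumes psql on PATH)."""
    return ["psql", "-X", "-q", "-p", str(port), "-d", "postgres", "-c", sql]


def cancel_sql(query_pattern):
    """SQL that cancels other backends whose current query matches a LIKE pattern.

    The pattern is pasted in unescaped, which is acceptable only in a test.
    """
    return (
        "SELECT pg_cancel_backend(pid) FROM pg_stat_activity "
        "WHERE pid <> pg_backend_pid() "
        f"AND query LIKE '{query_pattern}'"
    )


def run_and_cancel_if_blocked(query_cmd, cancel_cmd, grace_seconds):
    """Run query_cmd in its own process and report whether it blocked.

    Returns False if it finished within grace_seconds. Otherwise runs
    cancel_cmd, waits once more, kills the process as a last resort,
    and returns True.
    """
    proc = subprocess.Popen(query_cmd)
    try:
        proc.wait(timeout=grace_seconds)
        return False  # the query did not have to wait
    except subprocess.TimeoutExpired:
        pass  # still running: treat it as waiting for a standby
    subprocess.run(cancel_cmd, check=True)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # the cancel did not release it; do not hang the test
        proc.wait()
    return True


if __name__ == "__main__":
    # Stand-ins so the sketch runs without a server: a sleeping process
    # plays the blocked query and a no-op plays the cancel command.
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    noop = [sys.executable, "-c", "pass"]
    print(run_and_cancel_if_blocked(sleeper, noop, grace_seconds=0.5))
```

Against a real master, the two commands would be something like
psql_cmd(port, "INSERT INTO t VALUES (1)") and
psql_cmd(port, cancel_sql("INSERT INTO t%")), where the table t and the
port are placeholders. A True result would then mean the master was
waiting for a standby.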

Regards,

--
Masahiko Sawada
