Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Fujii Masao <fujii(at)postgresql(dot)org>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add regression tests for multiple synchronous standbys.
Date: 2016-04-14 04:22:38
Message-ID: CAB7nPqT6GGLwPzyPOR7BYVjYUkD=S2bA2QT6XhWB1WixWy+7BA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Thu, Apr 14, 2016 at 11:29 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Apr 13, 2016 at 4:54 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Fri, Apr 8, 2016 at 4:49 PM, Fujii Masao <fujii(at)postgresql(dot)org> wrote:
>>> Add regression tests for multiple synchronous standbys.
>>>
>>> Authors: Suraj Kharage, Michael Paquier, Masahiko Sawada, refactored by me
>>> Reviewed-By: Kyotaro Horiguchi
>>
>> Well, we are not quite there yet:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2016-04-12%2016%3A00%3A06
>>
>> # Running: pg_ctl -D
>> /home/buildfarm/data/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/data_master_Qmuz/pgdata
>> reload
>> server signaled
>> not ok 2 - asterisk in synchronous_standby_names
>>
>> # Failed test 'asterisk in synchronous_standby_names'
>> # at t/007_sync_rep.pl line 26.
>> # got: 'standby1|1|sync
>> # standby2|1|potential
>> # standby3|0|async'
>> # expected: 'standby1|1|sync
>> # standby2|1|potential
>> # standby3|1|potential'
>
> This seems to be a timing issue.
>
> There can be small window after SIGHUP is sent before walsender updates
> its priority based on new s_s_names. If pg_stat_replication is checked
> before that update, it displays unexpected output. Probably we need to
> sleep a few second after pg_ctl reload before pg_stat_replication.

Yes. I'd prefer avoid a hardcoded sleep and have something that relies
on lookups of pg_stat_replication though, because there is no way to
be sure that a sleep will ever be stable on heavily loaded machines,
like the machines I am specialized in maintaining :)
It kills a bit the purpose on having checks on pg_stat_replication as
the validation tests are based on that, still I think that we had
better base those checks on a loop that has a timeout, something like
that in a subroutine:
test_passed = 0;
while ($timeout < 30)
{
$result = $node->psql('SELECT blah FROM pg_stat_replication');
if ($result eq $expected)
{
test_passed = 1;
break;
}
sleep 1;
$timeout++;
}
ok($test_passed, $test_name);

What do you think?
--
Michael

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message David Rowley 2016-04-14 04:48:45 Re: pgsql: Provide errno-translation wrappers around bind() and listen() on
Previous Message Tom Lane 2016-04-14 03:33:43 pgsql: Fix broken dependency-mongering for index operator classes/famil

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-04-14 04:24:34 Re: Support for N synchronous standby servers - take 2
Previous Message Kyotaro HORIGUCHI 2016-04-14 04:11:10 Re: Support for N synchronous standby servers - take 2