Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Amir Rohan <amir(dot)rohan(at)zoho(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Greg Smith <gsmith(at)gregsmith(dot)com>
Subject: Re: Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
Date: 2015-10-10 11:43:33
Message-ID: CAB7nPqQN6RK=qjzHZ2na0Zd7q6q4YQ9mEdfveRXRKv8B+Ms_Ww@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 9, 2015 at 8:53 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Fri, Oct 9, 2015 at 8:47 PM, Amir Rohan wrote:
>> Ok, I've put myself down as reviewer in cfapp. I don't think I can
>> provide any more useful feedback that would actually result in changes
>> at this point, but I'll read through the entire discussion once last
>> time and write down final comments/notes. After that I have no problem
>> marking this for a committer to look at.
>
> OK. If you have any comments or remarks, please do not hesitate at all!

So, to let everybody know about the issue: Amir reported to me
off-list a bug in one of the tests that is easier to reproduce on a
slow machine:

> Amir wrote:
> Before posting the summary, I ran the latest v8 patch on today's git
> master (9c42727) and got some errors:
> t/004_timeline_switch.pl ...
> 1..1
> # ERROR: invalid input syntax for type pg_lsn: ""
> # LINE 1: SELECT ''::pg_lsn <= pg_last_xlog_replay_location()
> # ^
> # No tests run!

And here is my reply:
This is a timing issue. It can happen when standby1, the promoted
standby that standby2 reconnects to in order to check that recovery
works across a timeline jump, is still in recovery after being
restarted. There is a small window where this is possible, and it gets
easier to reproduce on slow machines (I did so on a VM). While a node
is in recovery, pg_current_xlog_location() raises an error, so the test
got back an empty string and built the ''::pg_lsn comparison shown
above. So the issue was in test 004. I have updated the script to check
pg_is_in_recovery() to be sure that the node has exited recovery before
querying it with pg_current_xlog_location().
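
For illustration, the kind of wait described above can be sketched as
follows. This is only a minimal sketch and not the code from the
attached patch: poll_until and the stand-in closure are hypothetical
names, and the stub simulates a node rather than contacting one. A real
test would presumably run "SELECT pg_is_in_recovery()" on standby1
through psql, which prints 't' or 'f' with -A -t.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Hypothetical helper (not part of TestLib.pm): call $run_query->() until
# it returns $expected, giving up after $max_attempts tries.
# Returns 1 on success and 0 on timeout.
sub poll_until
{
	my ($run_query, $expected, $max_attempts, $delay) = @_;
	for my $attempt (1 .. $max_attempts)
	{
		my $result = $run_query->();
		return 1 if defined $result && $result eq $expected;
		sleep($delay);
	}
	return 0;
}

# Stand-in for the promoted standby: reports 't' (still in recovery) for
# the first three calls, then 'f'. This is an assumption made so that the
# sketch runs without a PostgreSQL instance.
my $calls = 0;
my $fake_is_in_recovery = sub { return ++$calls <= 3 ? 't' : 'f'; };

# Only once this returns true is it safe to ask the node for its current
# WAL location.
my $ready = poll_until($fake_is_in_recovery, 'f', 30, 0.01);
print $ready ? "node left recovery after $calls polls\n" : "timed out\n";
```

Bounding the number of attempts keeps a node that never leaves recovery
from hanging the whole test run; the test fails with a timeout instead.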

It is worth noting that the following change has saved me a lot of pain:
--- a/src/test/perl/TestLib.pm
+++ b/src/test/perl/TestLib.pm
@@ -259,6 +259,7 @@ sub psql
     my ($stdout, $stderr);
     print("# Running SQL command: $sql\n");
     run [ 'psql', '-X', '-A', '-t', '-q', '-d', $dbname, '-f',
 '-'], '<', \$sql, '>', \$stdout, '2>', \$stderr or die;
+    print "# Error output: $stderr\n" if $stderr ne "";
Perhaps we should consider backpatching it; it helped me track down
the issue I faced.

Attached is an updated patch fixing 004.
Regards,
--
Michael

Attachment Content-Type Size
20151010_recovery_regressions_v9.patch application/x-patch 30.2 KB
