Inadequate failure reporting from poll_query_until

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Inadequate failure reporting from poll_query_until
Date: 2018-10-15 20:19:16
Message-ID: 17913.1539634756@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I just spent some time puzzling over why we've seen so many recoveryCheck
failures in the buildfarm recently --- excluding failures with known
causes, such as the pg_verify_checksums business, that's basically all
we've seen for the last week. I am suspicious that there's some
timing-dependent problems in WAL streaming startup and shutdown, but
can't prove anything yet. However, one case catches my eye:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2018-10-15%2016%3A01%3A59

This is the sum total of the visible logged information:

#
not ok 2 - asterisk in synchronous_standby_names

# Failed test 'asterisk in synchronous_standby_names'
# at t/007_sync_rep.pl line 26.

The failing test is evidently this one:

test_sync_state(
$node_master, qq(standby1|1|sync
standby2|1|potential
standby3|1|potential),
'asterisk in synchronous_standby_names',
'*');

and when you drill down, you find that PostgresNode.pm's poll_query_until
must have timed out, and that blank line there is as much as it felt like
telling us about it. Evidently what we were getting back from the server
did not match the expected output, but good luck guessing what it was.

I think we ought to fix poll_query_until so that upon failure, you get
a much more complete and intelligible report, along the lines of

poll_query_until timed out executing this query:
$query
expecting this output:
$expected
last actual query output:
$stdout
with stderr:
$stderr

Objections? Thoughts about better formatting?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2018-10-15 20:23:29 Re: Inadequate failure reporting from poll_query_until
Previous Message Andres Freund 2018-10-15 20:17:56 Re: Large writable variables