Re: pgsql: Add regression test for recovery pause.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <fujii(at)postgresql(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Add regression test for recovery pause.
Date: 2021-06-02 21:26:45
Message-ID: 191471.1622669205@sss.pgh.pa.us
Lists: pgsql-committers pgsql-hackers

Fujii Masao <fujii(at)postgresql(dot)org> writes:
> Add regression test for recovery pause.

Buildfarm member jacana doesn't like this patch:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2021-06-02%2012%3A00%3A44

the symptom being

Jun 02 09:05:17 t/005_replay_delay..................# poll_query_until timed out executing this query:
Jun 02 09:05:17 # SELECT '0/3002A20'::pg_lsn < pg_last_wal_receive_lsn()
Jun 02 09:05:17 # expecting this output:
Jun 02 09:05:17 # t
Jun 02 09:05:17 # last actual query output:
Jun 02 09:05:17 #
Jun 02 09:05:17 # with stderr:
Jun 02 09:05:17 # ERROR: syntax error at or near "pg_lsn"
Jun 02 09:05:17 # LINE 1: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn()
Jun 02 09:05:17 # ^

Checking the postmaster log confirms that what the backend is getting is

2021-06-02 08:58:01.073 EDT [60b78059.f84:4] 005_replay_delay.pl ERROR: syntax error at or near "pg_lsn" at character 20
2021-06-02 08:58:01.073 EDT [60b78059.f84:5] 005_replay_delay.pl STATEMENT: SELECT '0\\3002A20';pg_lsn < pg_last_wal_receive_lsn()

It sort of looks like something has decided that the pg_lsn constant
is a search path and made a lame attempt to convert it to Windows
style. I doubt our own code is doing that, so I'm inclined to blame
IPC::Run thinking it can mangle the command string it's given.
I wonder whether jacana has got a freshly-installed version of IPC::Run.
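
To make the symptom concrete, here is a minimal Python sketch that mimics the
substitution visible in the log: "/" turned into a backslash and "::" turned
into ";". The helper name mangle_like_log is hypothetical and exists only for
illustration; this is not IPC::Run code and it does not establish what
actually performs the rewrite. It also assumes that the doubled backslash in
the log is an escaped rendering of a single backslash. Under that assumption
the offset of "pg_lsn" in the rewritten string can be compared with the
"at character 20" position in the postmaster log.

```python
# Illustration only: reproduce the substitution seen in the jacana log.
# This is NOT IPC::Run code; what actually does the rewrite is unconfirmed.
def mangle_like_log(sql: str) -> str:
    """Hypothetical helper: apply the two substitutions visible in the log."""
    # Assumption: the log's doubled backslash is one escaped backslash.
    return sql.replace("/", "\\").replace("::", ";")


original = "SELECT '0/3002A20'::pg_lsn < pg_last_wal_receive_lsn()"
mangled = mangle_like_log(original)

# 1-based offset of "pg_lsn", for comparison with "at character N" in the log.
position = mangled.index("pg_lsn") + 1

print(mangled)
print(position)
```

With a ";" in that spot the string splits into two statements, and the second
one begins with "pg_lsn", which is consistent with the reported syntax error.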

Another interesting question is how come we managed to get this far
in the tests. There is a nearly, but not quite, identical delay
query in 002_archiving.pl, which already ran successfully:

    # Wait until necessary replay has been done on standby
    my $caughtup_query =
      "SELECT '$current_lsn'::pg_lsn <= pg_last_wal_replay_lsn()";
    $node_standby->poll_query_until('postgres', $caughtup_query)
      or die "Timed out while waiting for standby to catch up";

I wonder whether the fact that 002 uses '<=' not '<' could be
at all related. (I also wonder which one is correct as a means
of waiting for replay; they are not both correct.)
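
The two operators differ only at the boundary, when the reported position
equals the target LSN exactly. The Python sketch below shows that, assuming
the usual textual form of pg_lsn (two hexadecimal numbers separated by a
slash, holding the high and low 32 bits). The helper name parse_lsn and the
neighbouring sample LSNs are made up for illustration, and the sketch does not
settle which comparison the tests should use.

```python
def parse_lsn(text: str) -> int:
    """Hypothetical helper: convert 'hi/lo' hex text to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


target = parse_lsn("0/3002A20")

# Sample positions just below, at, and just above the target (made-up values).
for reported in ("0/3002A1F", "0/3002A20", "0/3002A21"):
    pos = parse_lsn(reported)
    # Columns: reported position, result of '<=', result of '<'
    print(reported, target <= pos, target < pos)
```

Only the middle row differs between the two columns, so a poll using '<'
keeps waiting until the position moves strictly past the target.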

In any case, letting IPC::Run munge SQL commands seems completely
unacceptable. We can't plan on working around that every time.

regards, tom lane
