Re: Unstable tests for recovery conflict handling

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Unstable tests for recovery conflict handling
Date:	2022-07-26 17:57:53
Message-ID:	2454340.1658858273@sss.pgh.pa.us
Lists:	pgsql-committers pgsql-hackers

I wrote:
>> It's been kind of hidden by other buildfarm noise, but
>> 031_recovery_conflict.pl is not as stable as it should be [1][2][3][4].

> After digging around in the code, I think this is almost certainly
> some manifestation of the previously-complained-of problem [1] that
> RecoveryConflictInterrupt is not safe to call in a signal handler,
> leading the conflicting backend to sometimes decide that it's not
> the problem.

I happened to notice that while skink continues to fail off-and-on
in 031_recovery_conflict.pl, the symptoms have changed! What
we're getting now typically looks like [1]:

[10:45:11.475](0.023s) ok 14 - startup deadlock: lock acquisition is waiting
Waiting for replication conn standby's replay_lsn to pass 0/33FB8B0 on primary
done
timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 367.

where absolutely nothing happens in the standby log, until we time out:

2022-07-24 10:45:11.452 UTC [1468367][client backend][2/4:0] LOG: statement: SELECT * FROM test_recovery_conflict_table2;
2022-07-24 10:45:11.472 UTC [1468547][client backend][3/2:0] LOG: statement: SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
2022-07-24 10:48:15.860 UTC [1468362][walreceiver][:0] FATAL: could not receive data from WAL stream: server closed the connection unexpectedly

So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
the startup process hasn't detected the buffer conflict in the first
place. Don't know what to make of that, but I vaguely suspect a test
timing problem. gull has shown this once as well, although at a different
step in the script [2].
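
As an illustration only (this is not part of the test suite or the buildfarm tooling), the observation above can be checked mechanically with a small sketch: scan the standby log excerpt for the message the test waits for, and measure the longest silent gap between consecutive log lines. The helper names `parse_timestamps` and `longest_gap_seconds` are hypothetical, the log text is just the three lines quoted above, and the Perl pattern from the test output is assumed to translate directly to the Python pattern shown.

```python
import re
from datetime import datetime

# The three standby log lines quoted above, verbatim.
LOG = """\
2022-07-24 10:45:11.452 UTC [1468367][client backend][2/4:0] LOG: statement: SELECT * FROM test_recovery_conflict_table2;
2022-07-24 10:45:11.472 UTC [1468547][client backend][3/2:0] LOG: statement: SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
2022-07-24 10:48:15.860 UTC [1468362][walreceiver][:0] FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
"""

# Message the test waits for (inner part of the Perl regex in the test output).
EXPECTED = re.compile(r"User transaction caused buffer deadlock with recovery.")

# Leading timestamp of each log line, e.g. "2022-07-24 10:45:11.452 UTC".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC")


def parse_timestamps(text):
    """Return the timestamps of all log lines that start with one."""
    stamps = []
    for line in text.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    return stamps


def longest_gap_seconds(stamps):
    """Return the longest interval between consecutive timestamps, in seconds."""
    gaps = ((later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:]))
    return max(gaps, default=0.0)


conflict_logged = EXPECTED.search(LOG) is not None
gap = longest_gap_seconds(parse_timestamps(LOG))

print(f"expected conflict message logged: {conflict_logged}")
print(f"longest silent gap: {gap:.3f} s")
```

On this excerpt the message is never logged and the log is silent for roughly three minutes before the walreceiver failure, which matches the reading that the conflict was never detected rather than detected and mishandled.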

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-07-24%2007%3A00%3A29
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gull&dt=2022-07-23%2009%3A34%3A54

In response to

Re: Unstable tests for recovery conflict handling at 2022-04-27 18:08:45 from Tom Lane

Responses

Re: Unstable tests for recovery conflict handling at 2022-07-26 18:16:11 from Andres Freund
