Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Date: 2022-07-01 20:14:23
Message-ID: 20220701201423.sic5b77ei3atjbes@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-06-21 19:33:01 -0700, Andres Freund wrote:
> On 2022-06-21 17:22:05 +1200, Thomas Munro wrote:
> > Problem: I saw 031_recovery_conflict.pl time out while waiting for a
> > buffer pin conflict, but so far once only, on CI:
> >
> > https://cirrus-ci.com/task/5956804860444672
> >
> > timed out waiting for match: (?^:User was holding shared buffer pin
> > for too long) at t/031_recovery_conflict.pl line 367.
> >
> > Hrmph. Still trying to reproduce that, which may be a bug in this
> > patch, a bug in the test or a pre-existing problem. Note that
> > recovery didn't say something like:
> >
> > 2022-06-21 17:05:40.931 NZST [57674] LOG: recovery still waiting
> > after 11.197 ms: recovery conflict on buffer pin
> >
> > (That's what I'd expect to see in
> > https://api.cirrus-ci.com/v1/artifact/task/5956804860444672/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log
> > if the startup process had decided to send the signal).
> >
> > ... so it seems like the problem in that run is upstream of the interrupt stuff.
>
> Odd. The only theory I have so far is that the manual vacuum on the primary
> somehow decided to skip the page, and thus didn't trigger a conflict. Because
> clearly replay progressed past the records of the VACUUM. Perhaps we should
> use VACUUM VERBOSE? In contrast to pg_regress tests that should be
> unproblematic?

I saw a couple of failures of 031 on CI for the meson patch - which obviously
doesn't change anything around this. However, it adds a lot more distributions
to CI, and the added ones run in Docker containers on a shared host, which
presumably adds a lot of noise. I saw the failures more frequently when I
accidentally had the number of concurrent test runs set to the number of CPUs
in the host, rather than the CPUs allotted to the container.

That made me look into these issues more closely. I played with adding a
pg_usleep() to RecoveryConflictInterrupt() to simulate slow signal delivery.
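
Concretely, something like this at the start of RecoveryConflictInterrupt()
(test hack only; the exact sleep value here is made up, I played with
different values):

	/* XXX test hack: simulate slow signal delivery / handling */
	pg_usleep(100 * 1000);		/* e.g. 100ms */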

Found a couple things:

- the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
completely swamp the target(s) on a busy system (see the simplified excerpt
below). This surely is exacerbated by the usleep I added to
RecoveryConflictInterrupt(), but a 5ms signalling pace does seem like a bad
idea.

- we process the same recovery conflict (not a single signal, but a single
"real conflict") multiple times in the target of a conflict, presumably
while handling the error. That includes handling the same interrupt once as
an ERROR and once as FATAL.
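
On the first point, for reference, the signalling loop in
ResolveRecoveryConflictWithVirtualXIDs() looks roughly like the following
(paraphrased from memory, details elided):

	while (!VirtualXactLock(*waitlist, false))
	{
		/* Is it time to kill it? */
		if (WaitExceedsMaxStandbyDelay(wait_event_info))
		{
			pid_t		pid;

			pid = CancelVirtualTransaction(*waitlist, reason);

			/*
			 * Wait a little bit for it to die, to avoid flooding an
			 * unresponsive backend when the system is heavily loaded.
			 */
			if (pid != 0)
				pg_usleep(5000L);	/* <- the 5ms pace */
		}
	}

Once the standby delay has been exceeded, WaitExceedsMaxStandbyDelay() returns
true immediately, so we end up re-signalling the same backend every ~5ms until
it's gone.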

E.g., for the second point:

2022-07-01 12:19:14.428 PDT [2000572] LOG: recovery still waiting after 10.032 ms: recovery conflict on buffer pin
2022-07-01 12:19:14.428 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl ERROR: canceling statement due to conflict with recovery at character 15
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl STATEMENT: SELECT * FROM test_recovery_conflict_table2;
2022-07-01 12:19:54.778 PDT [2000572] LOG: recovery finished waiting after 40359.937 ms: recovery conflict on buffer pin
2022-07-01 12:19:54.778 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-07-01 12:19:54.837 PDT [2001389] 031_recovery_conflict.pl LOG: statement: SELECT 1;

note that the startup process considers the conflict resolved by the time
the backend handles the interrupt.

I also see cases where a FATAL is repeated:

2022-07-01 12:43:18.190 PDT [2054721] LOG: recovery still waiting after 15.410 ms: recovery conflict on snapshot
2022-07-01 12:43:18.190 PDT [2054721] DETAIL: Conflicting process: 2054753.
2022-07-01 12:43:18.190 PDT [2054721] CONTEXT: WAL redo at 0/344AB90 for Heap2/PRUNE: latestRemovedXid 732 nredirected 18 ndead 0; blkref #0: rel 1663/16385/>
2054753: reporting recovery conflict 9
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
...
2054753: reporting recovery conflict 9
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.

The repeated FATAL seems like it might, at least partially, be due to
RecoveryConflictPending not being reset in at least some of the FATAL
recovery-conflict paths.
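
I.e. presumably the FATAL recovery-conflict branch(es) in ProcessInterrupts()
need to clear the flag before throwing, something along these lines (untested
sketch, just to illustrate the direction; the branch shown is paraphrased from
master):

		else if (RecoveryConflictPending && RecoveryConflictRetryable)
		{
			RecoveryConflictPending = false;	/* currently not reset here */
			pgstat_report_recovery_conflict(RecoveryConflictReason);
			ereport(FATAL,
					(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
					 errmsg("terminating connection due to conflict with recovery"),
					 errdetail_recovery_conflict()));
		}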

It seems pretty obvious that the proc_exit_inprogress check in
RecoveryConflictInterrupt() is misplaced and needs to be where the errors are
thrown. But that alone won't help, because it turns out we don't necessarily
have proc_exit_inprogress set yet at that point.
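
(For context, in master the check wraps essentially the whole body of
RecoveryConflictInterrupt(), roughly from memory:

	/*
	 * Don't joggle the elbow of proc_exit
	 */
	if (!proc_exit_inprogress)
	{
		RecoveryConflictReason = reason;
		switch (reason)
		{
				/* ... sets RecoveryConflictPending and
				 * QueryCancelPending / ProcDiePending ... */
		}
	}

i.e. it guards setting the flags in the handler, not the place where we
actually throw.)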

Look at this stack from an assertion in ProcessInterrupts() ensuring that
the same FATAL isn't raised twice:

#0 __GI_raise (sig=sig(at)entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1 0x00007fd47897b546 in __GI_abort () at abort.c:79
#2 0x00005594c150b27a in ExceptionalCondition (conditionName=0x5594c16fe746 "!in_fatal", errorType=0x5594c16fd8f6 "FailedAssertion",
fileName=0x5594c16fdac0 "/home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=3259)
at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
#3 0x00005594c134f6d2 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3259
#4 0x00005594c150c671 in errfinish (filename=0x5594c16b8f2e "pqcomm.c", lineno=1393, funcname=0x5594c16b95e0 <__func__.8> "internal_flush")
at /home/andres/src/postgresql/src/backend/utils/error/elog.c:683
#5 0x00005594c115e059 in internal_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1393
#6 0x00005594c115df49 in socket_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1340
#7 0x00005594c15121af in send_message_to_frontend (edata=0x5594c18a5740 <errordata>) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:3283
#8 0x00005594c150f00e in EmitErrorReport () at /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541
#9 0x00005594c150c42e in errfinish (filename=0x5594c16fdaed "postgres.c", lineno=3266, funcname=0x5594c16ff5b0 <__func__.9> "ProcessInterrupts")
at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592
#10 0x00005594c134f770 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3266
#11 0x00005594c134b995 in ProcessClientReadInterrupt (blocked=true) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:497
#12 0x00005594c1153417 in secure_read (port=0x5594c2e7d620, ptr=0x5594c189ba60 <PqRecvBuffer>, len=8192)

Reporting a FATAL error while in the process of reporting a FATAL error. Yeah.
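
(The assertion is just a quick debugging hack along these lines, placed right
before the FATAL ereport, not something to commit:

	/* catch ProcessInterrupts() raising the same FATAL a second time */
	static bool in_fatal = false;

	Assert(!in_fatal);
	in_fatal = true;
)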

I assume this could lead to sending out the same message quite a few times.

This is quite the mess.

Greetings,

Andres Freund
