Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Date: 2022-07-01 23:18:33
Message-ID: 20220701231833.vh76xkf3udani3np@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-07-01 13:14:23 -0700, Andres Freund wrote:
> I saw a couple failures of 031 on CI for the meson patch - which obviously
> doesn't change anything around this. However it adds a lot more distributions,
> and the added ones run in docker containers on a shared host, presumably
> adding a lot of noise. I saw this more frequently when I accidentally had the
> test runs at the number of CPUs in the host, rather than the allotted CPUs in
> the container.
>
> That made me look more into these issues. I played with adding a pg_usleep()
> to RecoveryConflictInterrupt() to simulate slow signal delivery.
>
> Found a couple things:
>
> - the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
> completely swamp the target(s) on a busy system. This surely is exacerbated
> by the usleep I added to RecoveryConflictInterrupt(), but a 5ms signalling pace
> does seem like a bad idea.

This one is also implicated in
https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql
and related issues.

Besides being very short, it also seems like a bad idea to wait when we might
not need to at all. Shouldn't we only sleep if we subsequently fail to get the
lock?

Misleadingly, WaitExceedsMaxStandbyDelay() also contains a usleep, which I at
least wouldn't expect given its name.

A minimal fix would be to increase the wait time, similar to how it is done
with standbyWait_us?

Medium term it seems we ought to set the startup process's latch when handling
a conflict, and use a latch wait. But avoiding races probably isn't quite
trivial.

> - we process the same recovery conflict (not a single signal, but a single
> "real conflict") multiple times in the target of a conflict, presumably
> while handling the error. That includes handling the same interrupt once as
> an ERROR and once as FATAL.
>
> E.g.
>
> 2022-07-01 12:19:14.428 PDT [2000572] LOG: recovery still waiting after 10.032 ms: recovery conflict on buffer pin
> 2022-07-01 12:19:14.428 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl ERROR: canceling statement due to conflict with recovery at character 15
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl STATEMENT: SELECT * FROM test_recovery_conflict_table2;
> 2022-07-01 12:19:54.778 PDT [2000572] LOG: recovery finished waiting after 40359.937 ms: recovery conflict on buffer pin
> 2022-07-01 12:19:54.778 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
> 2022-07-01 12:19:54.837 PDT [2001389] 031_recovery_conflict.pl LOG: statement: SELECT 1;
>
> note that the startup process considers the conflict resolved by the time
> the backend handles the interrupt.

I guess the reason we first get an ERROR and then a FATAL is that the second
iteration hits the if (RecoveryConflictPending && DoingCommandRead) bit,
because we end up there after handling the first error? And that's a FATAL.

I suspect that Thomas' fix will address at least part of this, as the check
whether we're still waiting for a lock will be made just before the error is
thrown.

> I also see cases where a FATAL is repeated:
>
> 2022-07-01 12:43:18.190 PDT [2054721] LOG: recovery still waiting after 15.410 ms: recovery conflict on snapshot
> 2022-07-01 12:43:18.190 PDT [2054721] DETAIL: Conflicting process: 2054753.
> 2022-07-01 12:43:18.190 PDT [2054721] CONTEXT: WAL redo at 0/344AB90 for Heap2/PRUNE: latestRemovedXid 732 nredirected 18 ndead 0; blkref #0: rel 1663/16385/>
> 2054753: reporting recovery conflict 9
> 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
> 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
> 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
> ...
> 2054753: reporting recovery conflict 9
> 2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
> 2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
> 2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
> the FATAL one seems like it might at least partially be due to
> RecoveryConflictPending not being reset in at least some of the FATAL
> recovery conflict paths.
>
> It seems pretty obvious that the proc_exit_inprogress check in
> RecoveryConflictInterrupt() is misplaced, and needs to be where the errors
> are thrown. But that won't help, because it turns out, we don't yet set that
> necessarily.
>
> Look at this stack from an assertion in ProcessInterrupts() ensuring that
> the same FATAL isn't raised twice:
>
> #0 __GI_raise (sig=sig(at)entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
> #1 0x00007fd47897b546 in __GI_abort () at abort.c:79
> #2 0x00005594c150b27a in ExceptionalCondition (conditionName=0x5594c16fe746 "!in_fatal", errorType=0x5594c16fd8f6 "FailedAssertion",
> fileName=0x5594c16fdac0 "/home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=3259)
> at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
> #3 0x00005594c134f6d2 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3259
> #4 0x00005594c150c671 in errfinish (filename=0x5594c16b8f2e "pqcomm.c", lineno=1393, funcname=0x5594c16b95e0 <__func__.8> "internal_flush")
> at /home/andres/src/postgresql/src/backend/utils/error/elog.c:683
> #5 0x00005594c115e059 in internal_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1393
> #6 0x00005594c115df49 in socket_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1340
> #7 0x00005594c15121af in send_message_to_frontend (edata=0x5594c18a5740 <errordata>) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:3283
> #8 0x00005594c150f00e in EmitErrorReport () at /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541
> #9 0x00005594c150c42e in errfinish (filename=0x5594c16fdaed "postgres.c", lineno=3266, funcname=0x5594c16ff5b0 <__func__.9> "ProcessInterrupts")
> at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592
> #10 0x00005594c134f770 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3266
> #11 0x00005594c134b995 in ProcessClientReadInterrupt (blocked=true) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:497
> #12 0x00005594c1153417 in secure_read (port=0x5594c2e7d620, ptr=0x5594c189ba60 <PqRecvBuffer>, len=8192)
>
> reporting a FATAL error in the process of reporting a FATAL error. Yeah.
>
> I assume this could lead to sending out the same message quite a few
> times.

This seems like it needs to be fixed in elog.c. ISTM that at the very least we
ought to HOLD_INTERRUPTS() before the EmitErrorReport() for FATAL.

Greetings,

Andres Freund
