| From: | Robert Pang <robertpang(at)google(dot)com> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | [PATCH] Fix premature timeout in pg_promote() caused by signal interruptions |
| Date: | 2026-03-11 16:44:07 |
| Message-ID: | CAJhEC07OK8J7tLUbyiccnuOXRE7UKxBNqD2-pLfeFXa=tBoWtw@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi all,
We have observed an issue where pg_promote() returns false and issues
a timeout warning prematurely, even if the standby server is
successfully promoted later within the specified timeout period.
Problem Description
The current implementation of pg_promote() calculates a fixed number
of loop iterations based on the timeout value, assuming each loop
waits exactly 100 ms for the backend latch. However, if the backend
receives an unrelated signal (e.g., from the timer armed by
client_connection_check_interval), the wait ends early. These repeated,
unrelated wakeups cause the loop counter to deplete much faster than
intended, leading to a premature timeout.
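To make the arithmetic concrete, here is a minimal, self-contained simulation of a count-based wait loop. It is only a sketch and not the PostgreSQL source: the names WAITS_PER_SECOND, WAIT_SLICE_MS, simulated_wait_ms and count_based_giveup_ms are hypothetical, the clock is simulated, and the 10-waits-per-second figure is inferred from the 100 ms wait described above.

```c
#include <assert.h>

/* Assumed constants, inferred from the 100 ms wait described above. */
#define WAITS_PER_SECOND 10
#define WAIT_SLICE_MS (1000 / WAITS_PER_SECOND)

/*
 * Hypothetical stand-in for a latch wait. Returns the simulated time (ms)
 * spent waiting: the full slice, or less if a wakeup arrives first.
 */
static long simulated_wait_ms(long slice_ms, long wakeup_interval_ms)
{
    return wakeup_interval_ms < slice_ms ? wakeup_interval_ms : slice_ms;
}

/*
 * Count-based loop: gives up after a fixed number of iterations, no matter
 * how long each wait really lasted. Returns the simulated time (ms) that
 * passed before the loop gave up.
 */
long count_based_giveup_ms(int timeout_s, long wakeup_interval_ms)
{
    long elapsed_ms = 0;

    assert(timeout_s >= 0 && wakeup_interval_ms > 0);

    for (int i = 0; i < WAITS_PER_SECOND * timeout_s; i++)
        elapsed_ms += simulated_wait_ms(WAIT_SLICE_MS, wakeup_interval_ms);

    return elapsed_ms;
}
```

Under these assumptions, count_based_giveup_ms(10, 1) evaluates to 100: a 10-second timeout allows 100 iterations, and with a wakeup every 1 ms each iteration consumes only about 1 ms. That is in line with the premature timeout after roughly 100 ms reported here.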
Reproduction
Set up a standby server with pg_promote() modified so that it does not
write the promote file, which blocks the promotion. Then set
client_connection_check_interval = 1 to consistently trigger a
premature timeout. In the example below, a 10-second timeout expires
after roughly 107 ms:
postgres=# set client_connection_check_interval=1;
SET
postgres=# \timing
Timing is on.
postgres=# select pg_promote(true, 10);
WARNING: server did not promote within 10 seconds
┌────────────┐
│ pg_promote │
├────────────┤
│ f          │
└────────────┘
(1 row)
Time: 107.783 ms
Proposed Fix
The attached patch modifies the logic to loop based on the actual
elapsed time rather than a fixed number of iterations. This ensures
that pg_promote() respects the specified timeout regardless of how
many times the backend latch is signaled.
After applying the patch, the timeout behaves as expected:
postgres=# set client_connection_check_interval=1;
SET
postgres=# \timing
Timing is on.
postgres=# select pg_promote(true, 10);
WARNING: server did not promote within 10 seconds
┌────────────┐
│ pg_promote │
├────────────┤
│ f          │
└────────────┘
(1 row)
Time: 10000.865 ms (00:10.001)
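For readers who want to see the idea in isolation, the elapsed-time approach described under Proposed Fix can be sketched as a small simulation. This is not the code from the attached patch: deadline_based_giveup_ms and MAX_WAIT_SLICE_MS are hypothetical names, the clock is simulated, and the check for a completed promotion is left out so that only the timeout accounting is shown.

```c
#include <assert.h>

/* Assumed upper bound for a single wait, matching the 100 ms wait above. */
#define MAX_WAIT_SLICE_MS 100

/*
 * Deadline-based loop: keeps waiting until the simulated clock reaches the
 * deadline, however many early wakeups occur. Returns the simulated time
 * (ms) at which the loop gave up.
 */
long deadline_based_giveup_ms(int timeout_s, long wakeup_interval_ms)
{
    long now_ms = 0;                              /* simulated clock */
    long deadline_ms = now_ms + 1000L * timeout_s;

    assert(timeout_s >= 0 && wakeup_interval_ms > 0);

    while (now_ms < deadline_ms)
    {
        /* Never wait past the deadline. */
        long remaining_ms = deadline_ms - now_ms;
        long slice_ms = remaining_ms < MAX_WAIT_SLICE_MS
            ? remaining_ms : MAX_WAIT_SLICE_MS;

        /* The wait ends at the slice expiry or the next wakeup,
         * whichever comes first; only that much time has passed. */
        now_ms += wakeup_interval_ms < slice_ms
            ? wakeup_interval_ms : slice_ms;
    }

    return now_ms;
}
```

In this sketch, deadline_based_giveup_ms(10, 1) evaluates to 10000: early wakeups only add loop iterations, they no longer shorten the overall timeout.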
We would like to submit this patch for the community's consideration.
Best regards
Robert Pang
Google
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Fix-premature-timeout-in-pg_promote-caused-by-signal.patch | application/x-patch | 1.9 KB |