SyncRepGetSyncStandbysPriority() vs. SIGHUP

From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: SyncRepGetSyncStandbysPriority() vs. SIGHUP
Date: 2020-02-06 07:45:52
Message-ID: 20200206074552.GB3326097@rfd.leadboat.com
Lists: pgsql-hackers

Buildfarm runs have triggered the assertion at the end of
SyncRepGetSyncStandbysPriority():

 sysname  │ snapshot            │ branch        │ bfurl
──────────┼─────────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────────────────
 hoverfly │ 2019-11-22 12:15:08 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2019-11-22%2012%3A15%3A08
 hoverfly │ 2019-11-07 17:19:12 │ REL9_6_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2019-11-07%2017%3A19%3A12
 nightjar │ 2019-08-13 23:04:41 │ REL_10_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2019-08-13%2023%3A04%3A41
 skink    │ 2018-11-28 21:03:35 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-11-28%2021%3A03%3A35

On my development system, this delay injection reproduces the failure:

--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -399,6 +399,8 @@ SyncRepInitConfig(void)
 {
     int priority;
 
+    pg_usleep(100 * 1000);

SyncRepInitConfig() is the function responsible for updating, after SIGHUP,
the sync_standby_priority values that SyncRepGetSyncStandbysPriority()
consults. The assertion holds if each walsender's sync_standby_priority (in
shared memory) accounts for the latest synchronous_standby_names GUC value.
That ceases to hold for brief moments after a SIGHUP that changes the
synchronous_standby_names GUC value.
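
As a rough illustration of that window, here is a toy model in C. It is not PostgreSQL code: ToySender, toy_priority(), toy_consistent() and toy_count_stale_views() are hypothetical names, and "priority" is assumed to be simply the 1-based position of a standby's name in the configured list. Each sender recomputes its own value at a different moment, and the model counts how often a reader that already uses the new list would find shared values that do not match it.

```c
#include <assert.h>
#include <string.h>

#define TOY_NSENDERS 3

/* Hypothetical stand-in for one walsender's shared-memory slot. */
typedef struct
{
    const char *name;       /* standby name this sender serves */
    int         priority;   /* 1-based position in the list; 0 = not listed */
} ToySender;

/* 1-based position of name in the configured list, or 0 if absent. */
static int
toy_priority(const char *name, const char *const *names, int n)
{
    for (int i = 0; i < n; i++)
        if (strcmp(name, names[i]) == 0)
            return i + 1;
    return 0;
}

/* Nonzero if every slot's priority reflects the given list. */
static int
toy_consistent(const ToySender *s, const char *const *names, int n)
{
    for (int i = 0; i < TOY_NSENDERS; i++)
        if (s[i].priority != toy_priority(s[i].name, names, n))
            return 0;
    return 1;
}

/*
 * Each sender reacts to the reload on its own schedule.  Count how many
 * times a reader arriving just before an update would see stale priorities.
 */
int
toy_count_stale_views(void)
{
    const char *const old_names[] = {"a", "b", "c"};
    const char *const new_names[] = {"c", "b", "a"};
    ToySender   s[TOY_NSENDERS] = {{"a", 0}, {"b", 0}, {"c", 0}};
    int         stale = 0;

    for (int i = 0; i < TOY_NSENDERS; i++)
        s[i].priority = toy_priority(s[i].name, old_names, 3);
    assert(toy_consistent(s, old_names, 3));

    for (int i = 0; i < TOY_NSENDERS; i++)
    {
        /* A reader running here already uses the new list. */
        if (!toy_consistent(s, new_names, 3))
            stale++;
        s[i].priority = toy_priority(s[i].name, new_names, 3);
    }
    assert(toy_consistent(s, new_names, 3));
    return stale;
}
```

In this model the shared values only agree with the new list once the last sender has caught up, so every earlier read is a stale one.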

I think the way to fix this is to nominate one process to update all
sync_standby_priority values after SIGHUP. That process should acquire
SyncRepLock once per ProcessConfigFile(), not once per walsender. If
walsender startup occurs at roughly the same time as a SIGHUP, the new
walsender should avoid computing sync_standby_priority based on a GUC value
different from the one used for the older walsenders.
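
A minimal sketch of the single-updater idea, again as a toy model rather than a patch: ToyShared, toy_lock(), toy_unlock(), toy_reload_all() and toy_demo_single_updater() are hypothetical, the integer flag only stands in for SyncRepLock, and priority is again assumed to be the 1-based list position. One nominated process recomputes every slot under a single lock acquisition, so a reader that takes the same lock sees either all old values or all new ones. Walsender startup racing with the reload is not modelled here.

```c
#include <assert.h>
#include <string.h>

#define TOY_NSENDERS 3

/* Hypothetical stand-in for one walsender's shared-memory slot. */
typedef struct
{
    const char *name;
    int         priority;
} ToySender;

/* Hypothetical shared state; lock_held stands in for SyncRepLock. */
typedef struct
{
    int         lock_held;
    int         lock_acquisitions;
    ToySender   senders[TOY_NSENDERS];
} ToyShared;

static void
toy_lock(ToyShared *sh)
{
    assert(!sh->lock_held);
    sh->lock_held = 1;
    sh->lock_acquisitions++;
}

static void
toy_unlock(ToyShared *sh)
{
    assert(sh->lock_held);
    sh->lock_held = 0;
}

/* 1-based position of name in the configured list, or 0 if absent. */
static int
toy_priority(const char *name, const char *const *names, int n)
{
    for (int i = 0; i < n; i++)
        if (strcmp(name, names[i]) == 0)
            return i + 1;
    return 0;
}

/* The nominated process recomputes every slot under one lock acquisition. */
static void
toy_reload_all(ToyShared *sh, const char *const *names, int n)
{
    toy_lock(sh);
    for (int i = 0; i < TOY_NSENDERS; i++)
        sh->senders[i].priority =
            toy_priority(sh->senders[i].name, names, n);
    toy_unlock(sh);
}

/* Runs one reload and returns how many times the lock was taken. */
int
toy_demo_single_updater(void)
{
    const char *const new_names[] = {"c", "b", "a"};
    ToyShared   sh = {0, 0, {{"a", 1}, {"b", 2}, {"c", 3}}};

    toy_reload_all(&sh, new_names, 3);

    assert(sh.senders[0].priority == 3);
    assert(sh.senders[1].priority == 2);
    assert(sh.senders[2].priority == 1);
    return sh.lock_acquisitions;
}
```

The lock is taken once per reload regardless of how many senders exist, which is the "once per ProcessConfigFile(), not once per walsender" property described above.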

Would anyone like to fix this? I could add it to my queue, but it would wait
a year or more.

Thanks,
nm
