Re: SyncRepGetSyncStandbysPriority() vs. SIGHUP

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: noah(at)leadboat(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SyncRepGetSyncStandbysPriority() vs. SIGHUP
Date: 2020-02-07 03:52:51
Message-ID: 20200207.125251.146972241588695685.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 5 Feb 2020 23:45:52 -0800, Noah Misch <noah(at)leadboat(dot)com> wrote in
> Buildfarm runs have triggered the assertion at the end of
> SyncRepGetSyncStandbysPriority():
..
> On my development system, this delay injection reproduces the failure:
>
> --- a/src/backend/replication/syncrep.c
> +++ b/src/backend/replication/syncrep.c
> @@ -399,6 +399,8 @@ SyncRepInitConfig(void)
> {
> int priority;
>
> + pg_usleep(100 * 1000);

Though I haven't been able to observe it myself, that can actually happen.

That happens if:

- every potentially-sync standby advertises a priority value larger
  than the number of syncrep list members (that is, a stale value
  computed from an older, longer list);

- the number of sync standbys found so far falls short of the required
  count;

- there are still enough potentially-sync standbys pending.

If all of the above hold, the while (priority <= lowest_priority) loop
never iterates, and execution falls through to the assertion.
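
To make the fall-through concrete, here is a minimal, self-contained
simulation of that scenario (plain C with hypothetical numbers, not
PostgreSQL code): two standbys still advertise priorities 2 and 3
computed from an old three-member list, while lowest_priority already
reflects a new one-member list.

/*
 * Standalone simulation of the fall-through; illustrative only, the
 * priorities and list sizes are made up.
 * Build and run: cc -o syncrep-sim syncrep-sim.c && ./syncrep-sim
 */
#include <assert.h>
#include <stdio.h>

int
main(void)
{
	/* Priorities published under the old 3-member list. */
	int		stale_priorities[] = {2, 3};
	int		nsenders = 2;

	/* Derived from the *new* 1-member synchronous_standby_names. */
	int		lowest_priority = 1;
	int		next_highest_priority = lowest_priority + 1;
	int		priority;
	int		i;

	/*
	 * First pass: pick priority-1 standbys and remember the smallest
	 * remaining candidate priority.  No standby has priority 1 here.
	 */
	for (i = 0; i < nsenders; i++)
	{
		if (stale_priorities[i] == 1)
			continue;		/* would be chosen as sync here */
		if (stale_priorities[i] < next_highest_priority)
			next_highest_priority = stale_priorities[i];
	}

	/*
	 * Second pass: every stale priority exceeds lowest_priority, so
	 * the loop body never runs and the pending standbys are never
	 * drained.
	 */
	priority = next_highest_priority;
	while (priority <= lowest_priority)
		priority++;			/* never reached in this scenario */

	printf("fell through with pending standbys undrained\n");
	assert(0);				/* mirrors the Assert at the function's end */
	return 0;
}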

> SyncRepInitConfig() is the function responsible for updating, after SIGHUP,
> the sync_standby_priority values that SyncRepGetSyncStandbysPriority()
> consults. The assertion holds if each walsender's sync_standby_priority (in
> shared memory) accounts for the latest synchronous_standby_names GUC value.
> That ceases to hold for brief moments after a SIGHUP that changes the
> synchronous_standby_names GUC value.

Agreed.
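
For illustration, the window exists because each walsender recomputes
and publishes only its own priority after SIGHUP. A simplified
paraphrase of SyncRepInitConfig() (not the verbatim source):

void
SyncRepInitConfig(void)
{
	int		priority;

	/* Recompute this walsender's priority from the current GUC. */
	priority = SyncRepGetStandbyPriority();

	/*
	 * Publish it in shared memory.  Each walsender runs this on its
	 * own schedule, so for a moment some walsenders can advertise
	 * values computed from the old synchronous_standby_names while
	 * others already use the new one.
	 */
	if (priority != MyWalSnd->sync_standby_priority)
	{
		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
		MyWalSnd->sync_standby_priority = priority;
		LWLockRelease(SyncRepLock);
	}
}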

> I think the way to fix this is to nominate one process to update all
> sync_standby_priority values after SIGHUP. That process should acquire
> SyncRepLock once per ProcessConfigFile(), not once per walsender. If
> walsender startup occurs at roughly the same time as a SIGHUP, the new
> walsender should avoid computing sync_standby_priority based on a GUC value
> different from the one used for the older walsenders.

If we update the priorities of all walsenders at once, the other
walsenders may calculate the required LSN using the old configuration
combined with the new priority. I'm not sure how likely that is to
happen, but it would cause a similar situation.

The priority calculation happens for every standby reply, so if some
standbys have a wrong priority, they will catch up on receiving the
next standby reply. Thus simply treating walsenders with an
out-of-range priority as non-sync standbys can "fix" it, I believe.
pg_stat_get_wal_senders() reveals such an inconsistent state, but I
don't think that is worth addressing.
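
Roughly, that amounts to a guard like the following (a hypothetical
sketch only; the attached patch is the actual change):

#include <stdbool.h>

/*
 * Hypothetical helper sketching the idea; not the attached patch's
 * code.  A priority outside [1, nmembers] cannot have been computed
 * from the current synchronous_standby_names, so such a walsender is
 * treated as non-sync until it republishes its priority.
 */
static bool
sync_priority_is_current(int sync_standby_priority, int nmembers)
{
	return sync_standby_priority >= 1 &&
		sync_standby_priority <= nmembers;
}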

> Would anyone like to fix this? I could add it to my queue, but it would wait
> a year or more.

The attached patch does that. Isn't that enough?

# The more significant problem is that I haven't succeeded in
# reproducing the problem..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Fix-handling-of-tentative-state-after-chainging-sync.patch text/x-patch 1.5 KB
