Re: Race condition in SyncRepGetSyncStandbysPriority

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: masahiko(dot)sawada(at)2ndquadrant(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Race condition in SyncRepGetSyncStandbysPriority
Date: 2020-04-14 13:52:42
Message-ID: 15616.1586872362@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so
> sync_standby_priority of any walsender can be changed while the
> function is scanning welsenders. The issue is we already have
> inconsistent walsender information before we enter the function. Thus
> how many times we scan on the array doesn't make any difference.

*Yes it does*. The existing code can deliver entirely broken results
if some walsender exits between where we examine the priorities and
where we fetch the WAL pointers. While that doesn't seem to be the
exact issue we're seeing in the buildfarm, it's still another obvious
bug in this code. I will not accept a "fix" that doesn't fix that.

> I think we need to do one of the followings.

> A) prevent SyncRepGetSyncStandbysPriority from being entered while
> walsender priority is inconsistent.
> B) make SyncRepGetSyncStandbysPriority be tolerant of priority
> inconsistency.
> C) protect walsender priority array from beinig inconsistent.

(B) seems like the only practical solution from here. We could
probably arrange for synchronous update of the priorities when
they change in response to a GUC change, but it doesn't seem to
me to be practical to do that in response to walsender exit.
You'd end up finding that an unexpected walsender exit results
in panic'ing the system, which is no better than where we are now.

It doesn't seem to me to be that hard to implement the desired
semantics for synchronous_standby_names with inconsistent info.
In FIRST mode you basically just need to take the N smallest
priorities you see in the array, but without assuming there are no
duplicates or holes. It might be a good idea to include ties at the
end, that is if you see 1,2,2,4 or 1,3,3,4 and you want 2 sync
standbys, include the first three of them in the calculation until
the inconsistency is resolved. In ANY mode I don't see that
inconsistent priorities matter at all.

> If we accept to share variable-length information among processes,
> sharing sync_standby_names or parsed SyncRepConfigData among processes
> would work.

Not sure that we really need more than what's being shared now,
ie each process's last-known index in the sync_standby_names list.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2020-04-14 14:16:07 Re: Poll: are people okay with function/operator table redesign?
Previous Message tushar 2020-04-14 13:41:26 While restoring -getting error if dump contain sql statements generated from generated.sql file