Re: Race condition in SyncRepGetSyncStandbysPriority

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masahiko(dot)sawada(at)2ndquadrant(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Race condition in SyncRepGetSyncStandbysPriority
Date: 2020-04-15 07:33:30
Message-ID: 20200415.163330.44743999060754730.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 15 Apr 2020 13:01:02 +0900, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote in
> On Tue, 14 Apr 2020 at 18:35, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Tue, 14 Apr 2020 13:06:14 +0900, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote in
> > > On Tue, 14 Apr 2020 at 10:34, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > >
> > > > Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> > > > > At Sat, 11 Apr 2020 18:30:30 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> > > > >> What I think we should do about this is, essentially, to get rid of
> > > > >> SyncRepGetSyncStandbys. Instead, let's have each walsender advertise
> > > > >> whether *it* believes that it is a sync standby, based on its last
> > > > >> evaluation of the relevant GUCs. This would be a bool that it'd
> > > > >> compute and set alongside sync_standby_priority. (Hm, maybe we'd not
> > > >
> > > > > Mmm.. SyncRepGetStandbyPriority returns the "priority" that a
> > > > > walsender thinks it is at, among synchronous_standby_names. Then to
> > > > > decide "I am a sync standby" we need to know how many walsenders with
> > > > > higher priority are alive now. SyncRepGetSyncStandbyPriority does the
> > > > > judgment now and suffers from the inconsistency of priority values.
> > > >
> > > > Yeah. After looking a bit closer, I think that the current definition
> > > > of sync_standby_priority (that is, as the result of local evaluation
> > > > of SyncRepGetStandbyPriority()) is OK. The problem is what we're doing
> > > > with it. I suggest that what we should do in SyncRepGetSyncRecPtr()
> > > > is make one sweep across the WalSnd array, collecting PID,
> > > > sync_standby_priority, *and* the WAL pointers from each valid entry.
> > > > Then examine that data and decide which WAL value we need, without assuming
> > > > that the sync_standby_priority values are necessarily totally consistent.
> > > > But in any case we must examine each entry just once while holding its
> > > > mutex, not go back to it later expecting it to still be the same.
> >
> > SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so
> > sync_standby_priority of any walsender can be changed while the
> > function is scanning welsenders. The issue is we already have
> > inconsistent walsender information before we enter the function. Thus
> > how many times we scan on the array doesn't make any difference.
> >
> > I think we need to do one of the followings.
> >
> > A) prevent SyncRepGetSyncStandbysPriority from being entered while
> > walsender priority is inconsistent.
> >
> > B) make SyncRepGetSyncStandbysPriority be tolerant of priority
> > inconsistency.
> >
> > C) protect walsender priority array from beinig inconsistent.
> >
> > The (B) is the band aids. To achieve A we need to central controller
> > of priority config handling. C is:
> >
> > > Can we have a similar approach of sync_standby_defined for
> > > sync_standby_priority? That is, checkpionter is responsible for
> > > changing sync_standby_priority of all walsenders when SIGHUP. That
> > > way, all walsenders can see a consistent view of
> > > sync_standby_priority. And when a walsender starts, it sets
> > > sync_standby_priority by itself. The logic to decide who's a sync
> > > standby doesn't change. SyncRepGetSyncRecPtr() gets all walsenders
> > > having higher priority along with their WAL positions.
> >
> > Yeah, it works if we do , but the problem of that way is that to
> > determin priority of walsenders, we need to know what walsenders are
> > running. That is, when new walsender comes the process needs to aware
> > of the arrival (or leaving) right away and reassign the priority of
> > every wal senders again.
>
> I think we don't need to reassign the priority when new walsender
> comes or leaves. IIUC The priority is calculated based on only
> synchronous_standby_names. Coming or leaving a walsender doesn't
> affect other's priorities.

Sorry, the "priority" in this area is a bit confusing. The "priority"
defined by synchronous_standby_names is determined in isolation from
the presence of walsenders. The "priority" in
walsnd->sync_standby_priority needs walsender presence to determine.
I thought of the latter in the discussion.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ants Aasma 2020-04-15 08:45:31 Re: Parallel copy
Previous Message Kyotaro Horiguchi 2020-04-15 07:26:50 Re: Race condition in SyncRepGetSyncStandbysPriority