Re: Race condition in SyncRepGetSyncStandbysPriority

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Race condition in SyncRepGetSyncStandbysPriority
Date: 2020-03-30 07:53:10
Message-ID: 20200330.165310.2107395336195268864.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Fri, 27 Mar 2020 13:54:25 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2020/03/27 10:26, Tom Lane wrote:
> > Twice in the past month [1][2], buildfarm member hoverfly has managed
> > to reach the "unreachable" Assert(false) at the end of
> > SyncRepGetSyncStandbysPriority.
>
> When I search the past discussions, I found that Noah Misch reported
> the same issue.
> https://www.postgresql.org/message-id/20200206074552.GB3326097@rfd.leadboat.com
>
> > What seems likely to me, after quickly eyeballing the code, is that
> > hoverfly is hitting the blatantly-obvious race condition in that
> > function.
> > Namely, that the second loop supposes that the state of the walsender
> > array hasn't changed since the first loop.
> > The minimum fix for this, I suppose, would have the first loop capture
> > the sync_standby_priority value for each walsender along with what
> > it's
> > already capturing. But I wonder if the whole function shouldn't be
> > rewritten from scratch, because it seems like the algorithm is both
> > expensively brute-force and unintelligible, which is a sad
> > combination.
> > It's likely that the number of walsenders would never be high enough
> > that efficiency could matter, but then couldn't we use an algorithm
> > that is less complicated and more obviously correct?
>
> +1 to rewrite the function with better algorithm.
>
> > (Because the
> > alternative conclusion, if you reject the theory that a race is
> > happening,
> > is that the algorithm is just flat out buggy; something that's not too
> > easy to disprove either.)
> > Another fairly dubious thing here is that whether or not *am_sync
> > gets set depends not only on whether MyWalSnd is claiming to be
> > synchronous but on how many lower-numbered walsenders are too.
> > Is that really the right thing?
> > But worse than any of that is that the return value seems to be
> > a list of walsender array indexes, meaning that the callers cannot
> > use it without making even stronger assumptions about the array
> > contents not having changed since the start of this function.
> > It sort of looks like the design is based on the assumption that
> > the array contents can't change while SyncRepLock is held ... but
> > if that's the plan then why bother with the per-walsender spinlocks?
> > In any case this assumption seems to be failing, suggesting either
> > that there's a caller that's not holding SyncRepLock when it calls
> > this function, or that somebody is failing to take that lock while
> > modifying the array.
>
> As far as I read the code, that assumption seems still valid. But the
> problem
> is that each walsender updates MyWalSnd->sync_standby_priority at each
> convenient timing, when SIGHUP is signaled. That is, at a certain
> moment,
> some walsenders (also their WalSnd entries in shmem) work based on
> the latest configuration but the others (also their WalSnd entries)
> work based
> on the old one.
>
> lowest_priority = SyncRepConfig->nmembers;
> next_highest_priority = lowest_priority + 1;
>
> SyncRepGetSyncStandbysPriority() calculates the lowest priority among
> all running walsenders as the above, by using the configuration info
> that
> this walsender is based on. But this calculated lowest priority would
> be
> invalid if other walsender is based on different (e.g., old)
> configuraiton.
> This can cause the (other) walsender to have lower priority than
> the calculated lowest priority and the second loop in
> SyncRepGetSyncStandbysPriority() to unexpectedly end.
>
> Therefore, the band-aid fix seems to be to set the lowest priority to
> very large number at the beginning of
> SyncRepGetSyncStandbysPriority().

Or just ignore impossible priorities as non-sync standby. Anyway the
confused state is fixed after all walsenders have loaded the new
configuration.

I remember that I posted a bandaid for maybe the same issue.

https://www.postgresql.org/message-id/20200207.125251.146972241588695685.horikyota.ntt@gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-03-30 07:57:50 Re: Can we get rid of GetLocaleInfoEx() yet?
Previous Message Pavel Stehule 2020-03-30 07:43:04 Re: proposal - psql output file write mode