Re: Race condition in SyncRepGetSyncStandbysPriority

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, masahiko(dot)sawada(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Race condition in SyncRepGetSyncStandbysPriority
Date: 2020-04-17 15:31:50
Message-ID: 3020.1587137510@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> At Fri, 17 Apr 2020 16:03:34 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>> I agree that it might be worth considering the removal of am_sync for
>> the master branch or v14. But I think that it should not be
>> back-patched.

> Ah! Agreed.

Yeah, that's not necessary to fix the bug. I'd be inclined to leave
it for v14 at this point.

I don't much like the patch Fujii-san posted, though. An important part
of the problem, IMO, is that SyncRepGetSyncStandbysPriority is too
complicated and it's unclear what dependencies it has on the set of
priorities in shared memory being consistent. His patch does not improve
that situation; if anything it makes it worse.

If we're concerned about not breaking ABI in the back branches, what
I propose we do about that is just leave SyncRepGetSyncStandbys in
place but not used by the core code, and remove it only in HEAD.
We can do an absolutely minimal fix for the assertion failure, in
case anybody is calling that code, by just dropping the Assert and
letting SyncRepGetSyncStandbys return NIL if it falls out. (Or we
could let it return the incomplete list, which'd be the behavior
you get today in a non-assert build.)

Also, I realized while re-reading my patch that Kyotaro-san is onto
something about the is_sync_standby flag not being necessary: instead
we can just have the new function SyncRepGetCandidateStandbys return
a reduced count. I'd initially believed that it was necessary for
that function to return the rejected candidate walsenders along with
the accepted ones, but that was a misunderstanding. I still don't
want its API spec to say anything about ordering of the result array,
but we don't need to.

So that leads me to the attached. I propose applying this to the
back branches except for the rearrangement of WALSnd field order.
In HEAD, I'd remove SyncRepGetSyncStandbys and subroutines altogether.

regards, tom lane

Attachment Content-Type Size
syncrep-fixes-6.patch text/x-diff 16.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Glukhov 2020-04-17 15:50:53 matchingsel() and NULL-returning operators
Previous Message Fujii Masao 2020-04-17 15:29:58 Re: 001_rep_changes.pl stalls