Re: Race condition in SyncRepGetSyncStandbysPriority

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, masahiko(dot)sawada(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Race condition in SyncRepGetSyncStandbysPriority
Date: 2020-04-17 07:31:36
Message-ID: c1e3537f-c9a0-d0b4-8ab1-835e1192f128@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/04/17 3:00, Tom Lane wrote:
> Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> writes:
>> On 2020/04/14 22:52, Tom Lane wrote:
>>> *Yes it does*. The existing code can deliver entirely broken results
>>> if some walsender exits between where we examine the priorities and
>>> where we fetch the WAL pointers.
>
>> IMO that the broken results can be delivered when walsender marked
>> as sync exits *and* new walsender comes at that moment. If this new
>> walsender uses the WalSnd slot that the exited walsender used,
>> SyncRepGetOldestSyncRecPtr() wronly calculates the oldest lsn based
>> on this new walsender (i.e., different walsender from one marked as sync).
>
> Right, exactly, sorry that I was not more specific.
>
>> BTW, since the patch changes the API of SyncRepGetSyncStandbys(),
>> it should not be back-patched to avoid ABI break. Right?
>
> Anything that is using that is just as broken as the core code is, for the
> same reasons, so I don't have a problem with changing its API. Maybe we
> should rename it while we're at it, just to make it clear that we are
> breaking any external callers. (If there are any, which seems somewhat
> unlikely.)

I agree to change the API if that's the only way to fix the bug. But ISTM that
we can fix the bug without changing the API, like the attached patch does.

Your patch changes the logic to pick up sync standbys, e.g., use qsort(),
in addition to the bug fix. This might be an improvement and I agree that
it's worth considering that idea for the master branch or v14. But I'm not
fan of adding such changes into the back branches if they are not
necessary for the bug fix. I like to basically keep the current logic as it is,
at least for the back branch, like the attached patch does.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
syncrep_bugfix_avoid_api_break_v1.patch text/plain 10.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-04-17 08:00:15 Re: 001_rep_changes.pl stalls
Previous Message Kashif Zeeshan 2020-04-17 07:08:14 Re: WIP/PoC for parallel backup