Re: Race condition in SyncRepGetSyncStandbysPriority

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: masahiko(dot)sawada(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Race condition in SyncRepGetSyncStandbysPriority
Date: 2020-04-16 17:20:04
Message-ID: 43ecec68-abff-6cdb-8412-204451d650ac@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/04/14 22:52, Tom Lane wrote:
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
>> SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so
>> sync_standby_priority of any walsender can be changed while the
>> function is scanning welsenders. The issue is we already have
>> inconsistent walsender information before we enter the function. Thus
>> how many times we scan on the array doesn't make any difference.
>
> *Yes it does*. The existing code can deliver entirely broken results
> if some walsender exits between where we examine the priorities and
> where we fetch the WAL pointers.

So, in this case, the oldest lsn that SyncRepGetOldestSyncRecPtr()
calculates may be based on also the lsn of already-exited walsender.
This is what you say "broken results"? If yes, ISTM that this issue still
remains even after applying your patch. No? The walsender marked
as sync may still exit just before SyncRepGetOldestSyncRecPtr()
calculates the oldest lsn.

IMO that the broken results can be delivered when walsender marked
as sync exits *and* new walsender comes at that moment. If this new
walsender uses the WalSnd slot that the exited walsender used,
SyncRepGetOldestSyncRecPtr() wronly calculates the oldest lsn based
on this new walsender (i.e., different walsender from one marked as sync).
If this is actually what you tried to say "broken results", your patch
seems fine and fixes the issue.

BTW, since the patch changes the API of SyncRepGetSyncStandbys(),
it should not be back-patched to avoid ABI break. Right?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-04-16 17:34:39 Re: fixing old_snapshot_threshold's time->xid mapping
Previous Message Andres Freund 2020-04-16 17:14:41 Re: fixing old_snapshot_threshold's time->xid mapping