Re: Race condition in SyncRepGetSyncStandbysPriority

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race condition in SyncRepGetSyncStandbysPriority
Date: 2020-04-16 07:48:28
Message-ID: CA+fd4k7_wcpZSwK0Axq5ag1Hhb7cMgp2b_CbdZegRTSzZtQAFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 16 Apr 2020 at 16:22, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Wed, 15 Apr 2020 11:31:49 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> > Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> > > At Tue, 14 Apr 2020 16:32:40 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> > > + stby->is_sync_standby = true; /* might change below */
> >
> > > I'm uneasy with that. In quorum mode all running standbys are marked
> > > as "sync" and that's bogus.
> >
> > I don't follow that? The existing coding of SyncRepGetSyncStandbysQuorum
> > returns all the candidates in its list, so this is isomorphic to that.
>
> The existing code actully does that. On the other hand
> SyncRepGetSyncStandbysPriority returns standbys that *are known to be*
> synchronous, but *Quorum returns standbys that *can be* synchronous.
> What the two functions return are different from each other. So it
> should be is_sync_standby for -Priority and is_sync_candidate for
> -Quorum.
>
> > Possibly a different name for the flag would be more suited?
> >
> > > On the other hand sync_standbys is already sorted in priority order so I think we can get rid of the member by setting *am_sync as the follows.
> >
> > > SyncRepGetSyncRecPtr:
> > > if (sync_standbys[i].is_me)
> > > {
> > > *am_sync = (i < SyncRepConfig->num_sync);
> > > break;
> > > }
> >
> > I disagree with this, it will change the behavior in the quorum case.
>
> Oops, you're right. I find the whole thing there (and me) is a bit
> confusing. syncrep_method affects how some values (specifically
> am_sync and sync_standbys) are translated at several calling depths.
> And the *am_sync informs nothing in quorum mode.
>
> > In any case, a change like this will cause callers to know way more than
> > they ought to about the ordering of the array. In my mind, the fact that
> > SyncRepGetSyncStandbysPriority is sorting the array is an internal
> > implementation detail; I do not want it to be part of the API.
>
> Anyway the am_sync and is_sync_standby is utterly useless in quorum
> mode. That discussion is pretty persuasive if not, but actually the
> upper layers (SyncRepReleaseWaiters and SyncRepGetSyncRecPtr) referes
> to syncrep_method to differentiate the interpretation of the am_sync
> flag and sync_standbys list. So anyway the difference is actually a
> part of API.
>
> After thinking some more, I concluded that some of the variables are
> wrongly named or considered, and redundant. The fucntion of am_sync is
> covered by got_recptr in SyncRepReleaseWaiters, so it's enough that
> SyncRepGetSyncRecPtr just reports to the caller whether the caller may
> release some of the waiter processes. This simplifies the related
> functions and make it (to me) clearer.
>
> Please find the attached.
>
>
> > (Apropos to that, I realized from working on this patch that there's
> > another, completely undocumented assumption in the existing code, that
> > the integer list will be sorted by walsender index for equal priorities.
> > I don't like that either, and not just because it's undocumented.)
>
> That seems accidentally. Sorting by priority is the disigned behavior
> and documented, in contrast, entries of the same priority are ordered
> in index order by accident and not documented, that means it can be
> changed anytime. I think we don't define everyting in such detail.
>

This is just a notice; I'm reading your latest patch but it seems to
include unrelated changes:

$ git diff --stat
src/backend/replication/syncrep.c | 475
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------------------------------------------------------------------------------------------------------------------------
src/backend/replication/walsender.c | 40 ++++++++++++++-----
src/bin/pg_dump/compress_io.c | 12 ++++++
src/bin/pg_dump/pg_backup_directory.c | 48 ++++++++++++++++++-----
src/include/replication/syncrep.h | 20 +++++++++-
src/include/replication/walsender_private.h | 16 ++++----
6 files changed, 274 insertions(+), 337 deletions(-)

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-04-16 08:11:00 Re: [BUG] non archived WAL removed during production crash recovery
Previous Message Masahiko Sawada 2020-04-16 07:39:00 Re: Vacuum o/p with (full 1, parallel 0) option throwing an error