Re: ReplicationSlotsComputeRequiredXmin seems pretty questionable

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ReplicationSlotsComputeRequiredXmin seems pretty questionable
Date: 2020-08-24 06:13:40
Message-ID: CAA4eK1LT=O_39MHSQZheibvwZ8A3AnTG2Fu3RFkq+9vAqyvgyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Aug 22, 2020 at 8:33 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > On Thu, Aug 20, 2020 at 10:28 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> 2. On the other hand, the code is *releasing* the
> >> ReplicationSlotControlLock before it calls
> >> ProcArraySetReplicationSlotXmin, and that seems like a flat out
> >> concurrency bug.
>
> > It is not clear to me how those values can go backward.
>
> After releasing ReplicationSlotControlLock, that code is holding no
> lock at all (in the already_locked=false case I'm concerned about).
> Thus the scenario to consider is:
>
> 1. Walsender A runs ReplicationSlotsComputeRequiredXmin, computes
> some perfectly-valid xmins, releases ReplicationSlotControlLock,
> amd then gets swapped out to Timbuktu.
>
> 2. Time passes and the "true values" of those xmins advance thanks
> to other walsender activity.
>
> 3. Walsender B runs ReplicationSlotsComputeRequiredXmin, computes
> some perfectly-valid xmins, and successfully stores them in the
> procarray.
>
> 4. Walsender A returns from never-never land, and stores its now
> quite stale results in the procarray, causing the globally visible
> xmins to go backwards from where they were after step 3.
>
> I see no mechanism in the code that prevents this scenario.
> On reflection I'm not even very sure that the code change
> I'm suggesting would prevent it. It'd prevent walsenders
> from entering or exiting until we've updated the procarray,
> but there's nothing to stop the furthest-back walsender from
> advancing its values.
>

I think we can prevent that if we allow
ProcArraySetReplicationSlotXmin to update the shared values only when
new xmin values follows the shared values. I am not very sure if it is
safe but I am not able to think of a problem with it. The other way
could be to always acquire ProcArrayLock before calling
ReplicationSlotsComputeRequiredXmin or before acquiring
ReplicationSlotControlLock but that seems too restrictive.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message hubert depesz lubaczewski 2020-08-24 06:53:29 Re: [PG13] Planning (time + buffers) data structure in explain plan (format text)
Previous Message Dilip Kumar 2020-08-24 06:12:12 Re: Re: [HACKERS] Custom compression methods