Re: Review for GetWALAvailability()

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: alvherre(at)2ndquadrant(dot)com
Cc: masao(dot)fujii(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Review for GetWALAvailability()
Date: 2020-06-17 04:56:07
Message-ID: 20200617.135607.687059791532071892.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 17 Jun 2020 10:17:07 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> At Tue, 16 Jun 2020 14:31:43 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> > On 2020-Jun-16, Kyotaro Horiguchi wrote:
> >
> > > I noticed another issue. If some required WAL segments are removed,
> > > the slot will be "invalidated", that is, restart_lsn is set to an
> > > invalid value. As a result, we hardly ever see the "lost" state.
> > >
> > > It can be "fixed" by remembering the validity of a slot separately
> > > from restart_lsn. Is that worth doing?
> >
> > We discussed this before. I agree it would be better to do this
> > in some way, but I fear that if we do it naively, some code might exist
> > that reads the LSN without realizing that it needs to check the validity
> > flag first.
>
> Yes, that was my main concern about it. That's error-prone. How about
> remembering the LSN where invalidation happened? It's safe since nothing
> other than the slot-monitoring functions would look at
> last_invalidated_lsn. It can be reset if active_pid is a valid pid.
>
> InvalidateObsoleteReplicationSlots:
> ...
> SpinLockAcquire(&s->mutex);
> + s->data.last_invalidated_lsn = s->data.restart_lsn;
> s->data.restart_lsn = InvalidXLogRecPtr;
> SpinLockRelease(&s->mutex);

The attached patch does that (PoC). It does not include documentation changes.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
GetWalAvailability_change_statuses_fix_lost.patch text/x-patch 8.4 KB
