Re: Replication slot stats misgivings

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject:	Re: Replication slot stats misgivings
Date:	2021-03-26 02:28:58
Message-ID:	CAA4eK1KBV4JJYrgB7KZXW65h3uXawYO-vEqR=7hX-uXDY058MA@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Mar 26, 2021 at 1:17 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2021-03-25 17:12:31 +0530, Amit Kapila wrote:
> > On Thu, Mar 25, 2021 at 11:36 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Mar 24, 2021 at 7:06 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > >
> > > > Leaving aside restart case, without some sort of such sanity checking,
> > > > if both drop (of old slot) and create (of new slot) messages are lost
> > > > then we will start accumulating stats in old slots. However, if only
> > > > one of them is lost then there won't be any such problem.
> > > >
> > > > > Perhaps we could have RestoreSlotFromDisk() send something to the stats
> > > > > collector ensuring the mapping makes sense?
> > > > >
> > > >
> > > > Say if we send just the index location of each slot then probably we
> > > > can set up replSlotStats. Now say before the restart if one of the drop
> > > > messages was missed (by stats collector) and that happens to be at
> > > > some middle location, then we would end up restoring some already
> > > > dropped slot, leaving some of the still required ones. However, if
> > > > there is some sanity identifier like name along with the index, then I
> > > > think that would have worked for such a case.
> > >
> > > Even such messages could also be lost? Given that any message could be
> > > lost under a UDP connection, I think we cannot rely on a single
> > > message. Instead, I think we need to loosely synchronize the indexes
> > > while assuming the indexes in replSlotStats and
> > > ReplicationSlotCtl->replication_slots are not synchronized.
> > >
> > > >
> > > > I think it would have been easier if we had some OID type of
> > > > identifier for each slot. But without that, maybe the combination of
> > > > the index location in ReplicationSlotCtl->replication_slots and the
> > > > slot name can greatly reduce the chances of slot stats going wrong,
> > > > even if not to zero.
> > > > If not name, do we have anything else in a slot that can be used for
> > > > some sort of sanity checking?
> > >
> > > I don't see any useful information in a slot for sanity checking.
> > >
> >
> > In that case, can we do a hard check for which slots exist if
> > replSlotStats runs out of space (that can probably happen only after
> > restart and when we lost some drop messages)?
>
> I suggest we wait doing anything about this until we know if the shared
> stats patch gets in or not (I'd give it 50% maybe). If it does get in
> things get a good bit easier, because we don't have to deal with the
> message loss issues anymore.
>

Okay, that makes sense.
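
For reference, the index-plus-name sanity check discussed in the quoted text could look roughly like the sketch below. This is only an illustration under stated assumptions, not code from any patch: SlotStatsEntry, slot_stats, slot_stats_get(), slot_stats_report(), MAX_SLOTS and NAME_LEN are all hypothetical names, and the single spill_txns counter stands in for whatever the collector actually tracks. The idea is that a report carries both the slot's index and its name; if the name stored at that index differs, we assume the drop and create messages were lost and start the entry from scratch instead of accumulating into the old slot's stats.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SLOTS 4   /* hypothetical stand-in for max_replication_slots */
#define NAME_LEN  64  /* hypothetical stand-in for NAMEDATALEN */

/* Hypothetical per-slot stats entry kept on the collector side. */
typedef struct SlotStatsEntry
{
    bool in_use;
    char name[NAME_LEN];
    long spill_txns;
} SlotStatsEntry;

static SlotStatsEntry slot_stats[MAX_SLOTS];

/*
 * Return the entry for (idx, name).  If the entry at idx is unused or holds
 * a different name, assume the drop/create messages were lost and reset the
 * entry before handing it out.
 */
static SlotStatsEntry *
slot_stats_get(int idx, const char *name)
{
    SlotStatsEntry *e;

    assert(idx >= 0 && idx < MAX_SLOTS);
    e = &slot_stats[idx];

    if (!e->in_use || strncmp(e->name, name, NAME_LEN) != 0)
    {
        memset(e, 0, sizeof(*e));
        e->in_use = true;
        strncpy(e->name, name, NAME_LEN - 1);  /* memset left a trailing NUL */
    }
    return e;
}

/* Accumulate a reported counter into the entry matching (idx, name). */
static void
slot_stats_report(int idx, const char *name, long spill_txns)
{
    slot_stats_get(idx, name)->spill_txns += spill_txns;
}
```

With this, two reports for "old_slot" at index 1 accumulate, while a later report for "new_slot" at the same index starts again from zero. It does not cover the case where a slot is dropped and recreated with the same name at the same index, which is why the name is only a sanity identifier and not a real fix for message loss.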

--
With Regards,
Amit Kapila.

In response to

Re: Replication slot stats misgivings at 2021-03-25 19:47:26 from Andres Freund

Responses

Re: Replication slot stats misgivings at 2021-03-30 00:58:34 from Andres Freund
