Re: Replication slot stats misgivings

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: Replication slot stats misgivings
Date: 2021-03-23 14:37:14
Message-ID: CAD21AoD9Orq=xuZhaxowqoEZvBpvFiT7hhtA+n3B1WJ_VM9pCQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 23, 2021 at 3:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > >
> > > > - If max_replication_slots was lowered across a restart,
> > > > pgstat_read_statfile() will happily write beyond the end of
> > > > replSlotStats.
> > >
> > > I think we cannot restart the server after lowering
> > > max_replication_slots to a value less than the number of replication
> > > slots actually created on the server. No?
> >
> > This problem happens when max_replication_slots is lowered while
> > stats still remain for a slot.
> >
>
> I think this can happen only if the drop message is lost, right?

Yes, I think you're right. In that case, the stats file could contain
more slot statistics entries than the lowered max_replication_slots.
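
To make the failure mode concrete, here is a minimal C sketch. It assumes a hypothetical, simplified SlotStats struct and a hypothetical restore_slot_stats_bounded() helper, and neither is the actual pgstat.c code. It copies entries read from the stats file into an array sized for the current max_replication_slots and refuses to write past its end. A file that still holds extra entries (e.g. after a lost drop message) therefore cannot overrun the array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified per-slot stats entry (not the real PostgreSQL struct). */
#define SLOT_NAME_LEN 64

typedef struct SlotStats
{
    char name[SLOT_NAME_LEN];
    long spill_txns;
} SlotStats;

/*
 * Copy entries read from the stats file into dst, which has room for only
 * max_slots entries (standing in for the current max_replication_slots).
 * Entries beyond that capacity are skipped instead of being written past
 * the end of dst. Returns the number of entries actually stored.
 */
static int
restore_slot_stats_bounded(const SlotStats *from_file, int nfile,
                           SlotStats *dst, int max_slots)
{
    int nstored = 0;

    assert(max_slots >= 0);
    for (int i = 0; i < nfile; i++)
    {
        if (nstored >= max_slots)
        {
            /* No free entry left: report and skip rather than overrun dst. */
            fprintf(stderr, "skipping stats for slot \"%s\": no free entry\n",
                    from_file[i].name);
            continue;
        }
        dst[nstored++] = from_file[i];
    }
    return nstored;
}
```

Skipping the extra entries avoids the overrun, but it may keep an orphaned slot's stats while discarding those of a slot that still exists. That is why a name-based check is still needed.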

>
> > I understand the risk of running past the end of replSlotStats. If
> > we use the index in replSlotStats instead, IIUC we need to somehow
> > keep the indexes of replSlotStats and
> > ReplicationSlotCtl->replication_slots in sync. The order of
> > replSlotStats is preserved across restarts, whereas the order of
> > ReplicationSlotCtl->replication_slots isn't (readdir(), which
> > StartupReplicationSlots() uses, doesn't guarantee the order of the
> > entries it returns). Maybe we can compare the slot name in the
> > received message to the name in the corresponding replSlotStats
> > element. If they don't match, we swap entries in replSlotStats so
> > that each slot has the same index in
> > ReplicationSlotCtl->replication_slots and in replSlotStats. If we
> > cannot find an entry in replSlotStats with the name in the received
> > message, it probably means either that it's a new slot or that the
> > previous create message was lost, so we can create new stats for the
> > slot. Is that what you mean, Andres?
> >
>
> I wonder how, in this scheme, we will remove the risk of running out of
> 'replSlotStats' and still restore correct stats, assuming the drop
> message is lost. Do we want to check, after restoring each slot's info,
> whether a slot with that name exists?

Yeah, I think we need such a check, at least when the number of slot
stats in the stats file is larger than max_replication_slots. Or we
could do it at every startup to remove orphaned slot stats.
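
The name-based check discussed above could look roughly like the following minimal C sketch. It assumes a simplified SlotStats struct and the hypothetical helpers find_live_slot() and restore_slot_stats_by_name(), and none of these is PostgreSQL's actual code. The swap-on-mismatch scheme proposed earlier in the thread is replaced here by a simpler variant: each restored entry goes directly to the index of the live slot with the same name, and entries whose slot no longer exists are dropped. Writes are then bounded by the number of live slots, no matter how many entries the file holds.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified per-slot stats entry (not the real PostgreSQL struct). */
#define SLOT_NAME_LEN 64

typedef struct SlotStats
{
    char name[SLOT_NAME_LEN];
    long spill_txns;
} SlotStats;

/* Return the index of the live slot called `name`, or -1 if there is none. */
static int
find_live_slot(const char *const live_names[], int nlive, const char *name)
{
    for (int i = 0; i < nlive; i++)
        if (strcmp(live_names[i], name) == 0)
            return i;
    return -1;
}

/*
 * dst has max_slots entries and uses the same indexes as live_names (a
 * stand-in for ReplicationSlotCtl->replication_slots after startup).
 * Each entry from the stats file is stored at its slot's index; entries
 * whose slot no longer exists are orphans and are dropped. Returns the
 * number of orphaned entries dropped.
 */
static int
restore_slot_stats_by_name(const SlotStats *from_file, int nfile,
                           const char *const live_names[], int nlive,
                           SlotStats *dst, int max_slots)
{
    int ndropped = 0;

    assert(nlive <= max_slots);
    memset(dst, 0, sizeof(SlotStats) * (size_t) max_slots);

    /* Live slots without stats in the file start with zeroed counters. */
    for (int i = 0; i < nlive; i++)
        snprintf(dst[i].name, sizeof(dst[i].name), "%s", live_names[i]);

    for (int i = 0; i < nfile; i++)
    {
        int idx = find_live_slot(live_names, nlive, from_file[i].name);

        if (idx < 0)
        {
            ndropped++;     /* orphan, e.g. its drop message was lost */
            continue;
        }
        dst[idx] = from_file[i];    /* idx < nlive <= max_slots: in bounds */
    }
    return ndropped;
}
```

Running this at every startup, rather than only when the file holds more entries than max_replication_slots, also removes orphans that would otherwise still fit in the array.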

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
