From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
Subject: | Re: Replication slot stats misgivings |
Date: | 2021-03-23 14:37:14 |
Message-ID: | CAD21AoD9Orq=xuZhaxowqoEZvBpvFiT7hhtA+n3B1WJ_VM9pCQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Mar 23, 2021 at 3:09 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Mar 22, 2021 at 12:20 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 22, 2021 at 1:25 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Sat, Mar 20, 2021 at 3:52 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > >
> > > > - If max_replication_slots was lowered between a restart,
> > > > pgstat_read_statfile() will happily write beyond the end of
> > > > replSlotStats.
> > >
> > > I think we cannot restart the server after lowering
> > > max_replication_slots to a value less than the number of replication
> > > slots actually created on the server. No?
> >
> > This problem happens in the case where max_replication_slots is
> > lowered and there still are stats for a slot.
> >
>
> I think this can happen only if the drop message is lost, right?
Yes, I think you're right. In that case, the stats file could contain
statistics for more slots than the lowered max_replication_slots allows.
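
To make the failure mode concrete, here is a minimal sketch (not the
actual pgstat code) of loading saved slot stats into a fixed-size array
without writing past its end. SlotStat, STAT_NAME_LEN and
load_slot_stats_bounded() are hypothetical names used only for
illustration; 'capacity' plays the role of max_replication_slots.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STAT_NAME_LEN 64		/* hypothetical; stands in for NAMEDATALEN */

/* Hypothetical, simplified stand-in for one per-slot stats entry. */
typedef struct SlotStat
{
	char		slotname[STAT_NAME_LEN];
	long		spill_txns;
} SlotStat;

/*
 * Copy at most 'capacity' saved entries into 'dst'.
 *
 * Returns the number of entries copied.  '*ndiscarded' receives the number
 * of saved entries that did not fit and were skipped, which is what happens
 * when the file holds more entries than the (lowered) capacity.
 */
int
load_slot_stats_bounded(const SlotStat *saved, int nsaved,
						SlotStat *dst, int capacity, int *ndiscarded)
{
	int			ncopied = 0;

	*ndiscarded = 0;
	for (int i = 0; i < nsaved; i++)
	{
		if (ncopied >= capacity)
		{
			/* No room left: skip instead of writing beyond 'dst'. */
			(*ndiscarded)++;
			continue;
		}
		dst[ncopied++] = saved[i];
	}

	assert(ncopied <= capacity);
	return ncopied;
}
```

Skipping the surplus entries only avoids the overrun; it does not decide
which entries are the stale ones, so a name-based check would still be
needed to keep the right stats.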
>
> > I understood the risk of running out of replSlotStats. If we use the
> > index in replSlotStats instead, IIUC we need to somehow synchronize
> > the indexes in between replSlotStats and
> > ReplicationSlotCtl->replication_slots. The order of replSlotStats is
> > preserved across restarting whereas the order of
> > ReplicationSlotCtl->replication_slots isn’t (readdir() that is used by
> > StartupReplicationSlots() doesn’t guarantee the order of the returned
> > entries in the directory). Maybe we can compare the slot name in the
> > received message to the name in the element of replSlotStats. If they
> > don’t match, we swap entries in replSlotStats to synchronize the index
> > of the replication slot in ReplicationSlotCtl->replication_slots and
> > replSlotStats. If we cannot find the entry in replSlotStats that has
> > the name in the received message, it probably means either it's a new
> > slot or the previous create message is dropped, we can create the new
> > stats for the slot. Is that what you mean, Andres?
> >
>
> I wonder how in this scheme, we will remove the risk of running out of
> 'replSlotStats' and still restore correct stats assuming the drop
> message is lost? Do we want to check after restoring each slot info
> whether the slot with that name exists?
Yeah, I think we need such a check, at least when the number of slot
stats in the stats file is larger than max_replication_slots.
Alternatively, we could do the check at every startup to remove orphaned
slot stats.
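
A rough sketch of what such a check could look like, assuming we have the
names of the slots that actually exist at hand. This is illustrative only:
SlotStatEntry, slot_name_exists() and prune_orphaned_slot_stats() are
made-up names, not existing PostgreSQL functions, and the real code would
look the names up in ReplicationSlotCtl->replication_slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ENTRY_NAME_LEN 64		/* hypothetical; stands in for NAMEDATALEN */

/* Hypothetical, simplified per-slot stats entry restored from the file. */
typedef struct SlotStatEntry
{
	char		slotname[ENTRY_NAME_LEN];
	long		total_txns;
} SlotStatEntry;

/* Return true if 'name' is among the 'nlive' existing slot names. */
bool
slot_name_exists(const char *name, const char *const *live, int nlive)
{
	for (int i = 0; i < nlive; i++)
	{
		if (strcmp(name, live[i]) == 0)
			return true;
	}
	return false;
}

/*
 * Compact 'stats' in place, keeping only entries whose slot still exists.
 * Entries without a matching slot (e.g. because the drop message was lost)
 * are discarded.  Returns the number of entries kept; their relative order
 * is preserved.
 */
int
prune_orphaned_slot_stats(SlotStatEntry *stats, int nstats,
						  const char *const *live, int nlive)
{
	int			nkept = 0;

	for (int i = 0; i < nstats; i++)
	{
		if (!slot_name_exists(stats[i].slotname, live, nlive))
			continue;			/* orphaned entry: drop it */

		if (nkept != i)
			stats[nkept] = stats[i];
		nkept++;
	}

	assert(nkept <= nstats);
	return nkept;
}
```

Since at most max_replication_slots slots can exist, the number of entries
kept after this pass cannot exceed max_replication_slots, provided the
stats entries have unique slot names.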
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/