Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix assert failure when decoding XLOG_PARAMETER_CHANGE on primary
Date: 2025-02-04 08:59:11
Message-ID: CAD21AoAABk=EhwMsyQLhbWTOxX1ereM6KpN3ey=1rx05kZsJYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 2, 2025 at 8:11 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Jan 24, 2025 at 4:05 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > When a standby replays a XLOG_PARAMETER_CHANGE record that lowers
> > wal_level from logical, we invalidate all logical slots only when the
> > standby is in hot standby mode:
> >
> > if (InRecovery && InHotStandby &&
> > xlrec.wal_level < WAL_LEVEL_LOGICAL &&
> > wal_level >= WAL_LEVEL_LOGICAL)
> > InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_LEVEL,
> > 0, InvalidOid,
> > InvalidTransactionId);
> >
> > However, it's possible that this record is replayed when not in hot
> > standby mode and the slot is used after the promotion. In this case,
> > the following Assert in xlog_decode() fails:
> >
> > /*
> > * This can occur only on a standby, as a primary would
> > * not allow to restart after changing wal_level < logical
> > * if there is pre-existing logical slot.
> > */
>
> Shouldn't we do similar to what this comment indicates on standby? We
> can disallow to start the server as standby, if the hot_standby is off
> and there is a pre-existing logical slot.

It seems like a better idea. I thought we could pass StandbyMode to
StartupReplicationSlots() and check if there is a pre-existing logical
slot, but it would break the ABI compatibility. It might not be a
problem in practice as StartupReplicationSlots() is normally used only
by the startup process. But if we want to avoid that we can introduce
a new function for that.

>
> > Assert(RecoveryInProgress());
> > ereport(ERROR,
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > errmsg("logical decoding on standby requires \"wal_level\" >=
> > \"logical\" on the primary")));
> >
> > Here is brief steps to reproduce this issue:
> >
> > 1. setup the primary and the standby servers (with hot_standby=on).
> > 2. create a logical slot on the standby.
> > 3. on standby, set hot_standby=off and restart.
> > 4. on primary, lower wal_level to replica and restart.
> > 5. promote the standby and execute the logical decoding.
> >
> > I've attached a small patch to fix this issue. With the patch, we
> > invalidate all logical slots even when not in hot standby, that is,
> > the patch just removes InHotStandby from the condition. Thoughts?
> >
>
> This can fix the scenario you described but I am not sure it is a good
> idea. We can consider the fix on these lines if you don't like the
> above suggestion.

Agreed. If your suggestion doesn't work, let's go back to this idea.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chiranmoy.Bhattacharya@fujitsu.com 2025-02-04 09:01:33 Re: [PATCH] SVE popcount support
Previous Message Bertrand Drouvot 2025-02-04 08:49:41 Re: per backend WAL statistics