Re: Introduce XID age and inactive timeout based replication slot invalidation

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-11-13 09:29:25
Message-ID: CALDaNm2mwkVFLfe8pLcU1W5Oy1vRr1Wzp53XGV08kr4Z2=SJpA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 7 Nov 2024 at 15:33, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> wrote:
>
> On Mon, Sep 16, 2024 at 3:31 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > Please find the attached v46 patch having changes for the above review
> > comments and your test review comments and Shveta's review comments.
> >
> Hi,
>
> I’ve reviewed this thread and am interested in working on the
> remaining tasks and comments, as well as the future review comments.
> However, Bharath, please let me know if you'd prefer to continue with
> it.
>
> Attached the rebased v47 patch, which also addresses Peter’s comments
> #2, #3, and #4 at [1]. I will try addressing other comments as well in
> next versions.

The following crash occurs while upgrading:
2024-11-13 14:19:45.955 IST [44539] LOG: checkpoint starting: time
TRAP: failed Assert("!(*invalidated && SlotIsLogical(s) &&
IsBinaryUpgrade)"), File: "slot.c", Line: 1793, PID: 44539
postgres: checkpointer (ExceptionalCondition+0xbb)[0x555555e305bd]
postgres: checkpointer (+0x63ab04)[0x555555b8eb04]
postgres: checkpointer
(InvalidateObsoleteReplicationSlots+0x149)[0x555555b8ee5f]
postgres: checkpointer (CheckPointReplicationSlots+0x267)[0x555555b8f125]
postgres: checkpointer (+0x1f3ee8)[0x555555747ee8]
postgres: checkpointer (CreateCheckPoint+0x78f)[0x5555557475ee]
postgres: checkpointer (CheckpointerMain+0x632)[0x555555b2f1e7]
postgres: checkpointer (postmaster_child_launch+0x119)[0x555555b30892]
postgres: checkpointer (+0x5e2dc8)[0x555555b36dc8]
postgres: checkpointer (PostmasterMain+0x14bd)[0x555555b33647]
postgres: checkpointer (+0x487f2e)[0x5555559dbf2e]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7ffff6c29d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7ffff6c29e40]
postgres: checkpointer (_start+0x25)[0x555555634c25]
2024-11-13 14:19:45.967 IST [44538] LOG: checkpointer process (PID
44539) was terminated by signal 6: Aborted

This can happen in the following case:
1) Setup a logical replication cluster with enough data so that it
will take at least few minutes to upgrade
2) Stop the publisher node
3) Configure replication_slot_inactive_timeout and checkpoint_timeout
to 30 seconds
4) Upgrade the publisher node.

This is happening because logical replication slots are getting
invalidated during upgrade and there is an assertion which checks that
the slots are not invalidated.
I feel this can be fixed by having a function similar to
check_max_slot_wal_keep_size which will make sure that
replication_slot_inactive_timeout is 0 during upgrade.

Regards,
Vignesh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nisha Moond 2024-11-13 09:30:28 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Amit Kapila 2024-11-13 08:56:37 Re: Commit Timestamp and LSN Inversion issue