From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "tomas(at)vondra(dot)me" <tomas(at)vondra(dot)me>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |
Date: | 2025-06-25 06:17:46 |
Message-ID: | CAFiTN-uGkrCDi9kqXJqLB+3ATn_ZULmRER73VhkLSDSAnO8SNg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jun 25, 2025 at 10:57 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Hi,
>
> After commit ca307d5, I noticed another crash when testing
> some other logical replication features.
>
> The server with max_replication_slots set to 0 would crash when executing CHECKPOINT.
>
> TRAP: failed Assert("ReplicationSlotCtl != NULL"), File: "slot.c", Line: 1162, PID: 577315
> postgres: checkpointer (ExceptionalCondition+0x9e)[0xc046cb]
> postgres: checkpointer (ReplicationSlotsComputeRequiredLSN+0x30)[0x99697f]
> postgres: checkpointer (CheckPointReplicationSlots+0x191)[0x997dc1]
> postgres: checkpointer [0x597b1b]
> postgres: checkpointer (CreateCheckPoint+0x6d1)[0x59729e]
> postgres: checkpointer (CheckpointerMain+0x559)[0x93ee79]
> postgres: checkpointer (postmaster_child_launch+0x15f)[0x940311]
> postgres: checkpointer [0x9468b0]
> postgres: checkpointer (PostmasterMain+0x1258)[0x9434f8]
> postgres: checkpointer (main+0x2fe)[0x7f5f9c]
> /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f7585f81d85]
> postgres: checkpointer (_start+0x2e)[0x4958ee]
>
> I think it is trying to access the replication slots when the shared memory
> for them was not allocated.
I do not understand why CheckPointReplicationSlots() calls
ReplicationSlotsComputeRequiredLSN() unconditionally. Shouldn't that
call be made only under a check like the one in [1]? If not, then
instead of asserting Assert("ReplicationSlotCtl != NULL"), the function
should simply return when ReplicationSlotCtl is NULL, because
ReplicationSlotCtl is not allocated if max_replication_slots is 0.
[1]
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -2131,7 +2131,8 @@ CheckPointReplicationSlots(bool is_shutdown)
      * Recompute the required LSN as SaveSlotToPath() updated
      * last_saved_restart_lsn for slots.
      */
-    ReplicationSlotsComputeRequiredLSN();
+    if (max_replication_slots > 0)
+        ReplicationSlotsComputeRequiredLSN();
 }
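
To make the two options concrete, here is a minimal standalone sketch
in C. It is not PostgreSQL code: every Mock*/mock_* name is a
hypothetical stand-in, and a NULL mock_slot_ctl is only assumed to model
the case where the slot shared memory was never allocated because
max_replication_slots is 0. It contrasts guarding the call site (as in
[1]) with returning early inside the compute function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL definitions. */
typedef uint64_t MockLSN;
#define MOCK_INVALID_LSN ((MockLSN) 0)

typedef struct MockSlot
{
    bool        in_use;
    MockLSN     restart_lsn;
} MockSlot;

typedef struct MockSlotCtl
{
    int         nslots;
    MockSlot   *slots;
} MockSlotCtl;

/* NULL models "shared memory for slots was never allocated". */
MockSlotCtl *mock_slot_ctl = NULL;
int          mock_max_replication_slots = 0;

/* Return the oldest valid restart_lsn among in-use slots, if any. */
MockLSN
mock_scan_slots(const MockSlotCtl *ctl)
{
    MockLSN     min_required = MOCK_INVALID_LSN;

    for (int i = 0; i < ctl->nslots; i++)
    {
        const MockSlot *s = &ctl->slots[i];

        if (!s->in_use || s->restart_lsn == MOCK_INVALID_LSN)
            continue;
        if (min_required == MOCK_INVALID_LSN || s->restart_lsn < min_required)
            min_required = s->restart_lsn;
    }
    return min_required;
}

/* Models the behaviour reported above: asserts that the control struct exists. */
MockLSN
mock_compute_required_lsn_asserting(void)
{
    assert(mock_slot_ctl != NULL);
    return mock_scan_slots(mock_slot_ctl);
}

/* Option 1: guard the call site, keeping the assertion in the callee. */
MockLSN
mock_checkpoint_slots_guarded(void)
{
    if (mock_max_replication_slots > 0)
        return mock_compute_required_lsn_asserting();
    return MOCK_INVALID_LSN;
}

/* Option 2: the callee itself returns early when nothing was allocated. */
MockLSN
mock_compute_required_lsn_early_return(void)
{
    if (mock_slot_ctl == NULL)
        return MOCK_INVALID_LSN;
    return mock_scan_slots(mock_slot_ctl);
}
```

In this sketch both options avoid dereferencing the NULL control
structure. Option 1 keeps the assertion as a check on every other
caller, while option 2 makes the function safe for any caller at the
cost of no longer flagging an unexpected NULL.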
--
Regards,
Dilip Kumar
Google