RE: ReplicationSlotRelease() crashes when the instance is in the single user mode

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Robert Haas' <robertmhaas(at)gmail(dot)com>, Paul A Jungwirth <pj(at)illuminatedcomputing(dot)com>, Mutaamba Maasha <maasha(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
Date: 2025-08-13 07:21:13
Message-ID: OSCPR01MB14966D4E276EC9080EF727AA4F52AA@OSCPR01MB14966.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Robert, Paul, Mutaamba,

Sorry for the late reply. I was in the business trip.

> I don't feel good about the direction from which this patch is
> attacking the problem. The original stack trace looks like this:
>
> postgres(ExceptionalCondition+0xab)[0xb86a2a]
> postgres(ReplicationSlotRelease+0x5a)[0x8df10b]
> postgres(pg_replication_slot_advance+0x330)[0x8e46ed]
>
> ReplicationSlotRelease() is called near the end of
> pg_replication_slot_advance. IIUC, this means that we successfully
> acquired the slot and did all the work, and then failed a sanity check
> afterward.

Exactly. The slot is trying to be released after most of tasks are done.

```
#6 0x00000000008fd20b in pg_replication_slot_advance (fcinfo=0x23b4428)
at ../postgres/src/backend/replication/slotfuncs.c:588
588 ReplicationSlotRelease();
(gdb) list
583 * advancing potentially done.
584 */
585 ReplicationSlotsComputeRequiredXmin(false);
586 ReplicationSlotsComputeRequiredLSN();
587
588 ReplicationSlotRelease();
```

> So, I see two possibilities: either it's not OK to acquire
> a replication slot in single-user mode, in which case this should have
> failed earlier, or it's also OK to release it, in which case this
> should not have failed at all. Interestingly, I see that
> ReplicationSlotAcquire() features a special case whose specific
> purpose is to allow it to work in single-user mode, which makes me
> rather suspect that the latter is correct.

Agreed.

> It is possible that it was
> intentional that acquiring works in single-user mode and dropping
> (other than via ReplicationSlotDropAcquired) does not, but so far I
> see nothing in the comments suggesting that.
>
> In any event, that inconsistency between how ReplicationSlotAcquire()
> works and how ReplicationSlotRelease() works seems to me to be the
> core thing that should be fixed here. If the selected fix also
> requires error checks in higher-level functions, as added by this
> patch, then well and good. But right now, the proposed patch isn't
> trying to fix the definitional inconsistency in the underlying layer
> at all, and is instead just installing guard rails to try to keep
> anyone from bumping into it. That seems like a bad direction to go --

To confirm; your point is that we should firstly fix to allow acquiring/releasing
slots in the mode, then consider additional guards, is it right? Valid point.
We've not deeply considered but ideas described in [1] could be used.

> it feels like we're blocking access to potentially-useful
> functionality because of an assertion failure that might just be
> slightly sloppy coding.

I still cannot find enough use-cases to allow manipulating slots, though.

[1]: https://www.postgresql.org/message-id/OSCPR01MB14966ED588A0328DAEBE8CB25F5FA2%40OSCPR01MB14966.jpnprd01.prod.outlook.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2025-08-13 07:38:37 Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected
Previous Message Chao Li 2025-08-13 07:20:27 Re: GB18030-2022 Support in PostgreSQL