RE: ReplicationSlotRelease() crashes when the instance is in the single user mode

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Paul A Jungwirth' <pj(at)illuminatedcomputing(dot)com>, Mutaamba Maasha <maasha(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: ReplicationSlotRelease() crashes when the instance is in the single user mode
Date: 2025-08-18 08:47:27
Message-ID: OSCPR01MB14966B39B3ED21229EAEFA10FF531A@OSCPR01MB14966.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Paul, Mutaamba,

Here are updated patches. Based on the Robert's suggestion, I separated into two parts.
0001 fixed the original issue and 0002 prohibited the slot manipulation in
single-user mode. I want to focus on 0001 first because on one would argue it.

All comments from you were included in 0002.

> The patch does not apply. The attached v5 fixes it with a small update
> to the context for the slot.h hunk.

Oh, I didn't realize because cfbot said OK. Anyway, new patch could be applied atop HEAD.

> The commit message needs some explanation. Why are we doing this? What
> does the patch do? What alternatives did we consider?

Updated. How do you feel?

> The patch has no docs. If we plan to forbid some functions in
> single-user mode, we should document which ones (e.g. in func.sgml).

A statement was added.

> There are no tests. Do we have any tests at all for single-user mode?
> The only one I see is in recovery/t/017_shm.pl. What if we had tests
> that ran the regress suite in single-user mode? Basically instead of
> psql we run postgres --single? Do we expect a lot of it would fail? Is
> it small enough the test could maintain a diff that expects those
> failures? I'm not saying we should do that as part of this patch, but
> is there any interest in that? Since single-user mode is most often
> used when things are broken, it's a harsh place to hit a bug.
>
> The patch lacks source comments. Something on
> CheckSlotIsInSingleUserMode explaining why would be good.

Comments were added in all caller functions.

> In ReplicationSlotRelease, why only assert that `slot->active_pid !=
> 0`? Why not assert that it's MyProcPid (even outside single-user
> mode)? Can one backend really release a slot while another is using
> it? Can you drop it? Maybe you can copy it?

You meant we should assert `slot->active_pid == MyProcPid`, right?
Naively considered it can be fix, but it is out-of-scope of the thread. It is
existing code, should be discussed verified in another place.

> Are we calling CheckSlotIsInSingleUserMode from everywhere that needs
> it? We tried to check all the functions that call
> ReplicationSlotCreate, ReplicationSlotRelease, and
> update_synced_slots_inactive_since (since they all have these
> asserts). To call out a few:
>
> The pg_replication_origin_* functions don't call the Assert-ing functions.

You asked that whether we should call CheckSlotIsInSingleUserMode in
pg_replication_origin_* APIs, right? It depends on the policy. If we want to
prohibit all replication-related command, it should be. It is still under
discussion, so I did not touch.

> binary_upgrade_logical_slot_has_caught_up - Not available from the command
> line.

I found that we can in both single-user and binary-upgrade mode. At that time
the function could be called:

```
$ postgres --single -D data/ postgres -b

PostgreSQL stand-alone backend 19devel
backend> SELECT binary_upgrade_logical_slot_has_caught_up('test');
1: binary_upgrade_logical_slot_has_caught_up (typeid = 16, len = 1, typmod = -1, byval = t)
----
LOG: starting logical decoding for slot "test"
DETAIL: Streaming transactions committing after 0/0182C640, reading WAL from 0/0182C608.
STATEMENT: SELECT binary_upgrade_logical_slot_has_caught_up('test');

LOG: logical decoding found consistent point at 0/0182C608
DETAIL: There are no running transactions.
STATEMENT: SELECT binary_upgrade_logical_slot_has_caught_up('test');

1: binary_upgrade_logical_slot_has_caught_up = "f" (typeid = 16, len = 1, typmod = -1, byval = t)
----
```

I don't think this is a realistic case, but the check was added.

> WalSndErrorCleanup - Probably not used in single-user mode?

IIUC we won't reach in the single-user mode. Hence it was not called intentionally.

> We also see that PostgresMain calls ReplicationSlotRelease from its
> error handler. Presumably single-user mode would be executing
> PostgresSingleUserMain instead,

ISTM, PostgresMain() would be called from the PostgresSingleUserMain().

> but still perhaps we should call
> CheckSlotIsInSingleUserMode here just as a sanity-check?

I feel this code is to release the acquired slot when ERROR was raised. Since we
have already covered the entrance, it is not needed.

[1]: https://www.postgresql.org/message-id/CA%2BTgmobspWhoMysNHL9b3C9xi%3DOpHdBYhtAgDH4N_A2foyjN-w%40mail.gmail.com

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v5-0001-Set-ReplicationSlot-active_pid-even-in-single-use.patch application/octet-stream 929 bytes
v5-0002-Prohibit-slot-manipulation-while-in-single-user-m.patch application/octet-stream 5.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2025-08-18 08:50:34 Re: GB18030-2022 Support in PostgreSQL
Previous Message John Naylor 2025-08-18 08:34:29 Re: GB18030-2022 Support in PostgreSQL