From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Paul A Jungwirth' <pj(at)illuminatedcomputing(dot)com>, Mutaamba Maasha <maasha(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: ReplicationSlotRelease() crashes when the instance is in the single user mode |
Date: | 2025-08-18 08:47:27 |
Message-ID: | OSCPR01MB14966B39B3ED21229EAEFA10FF531A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dear Paul, Mutaamba,
Here are updated patches. Based on the Robert's suggestion, I separated into two parts.
0001 fixed the original issue and 0002 prohibited the slot manipulation in
single-user mode. I want to focus on 0001 first because on one would argue it.
All comments from you were included in 0002.
> The patch does not apply. The attached v5 fixes it with a small update
> to the context for the slot.h hunk.
Oh, I didn't realize because cfbot said OK. Anyway, new patch could be applied atop HEAD.
> The commit message needs some explanation. Why are we doing this? What
> does the patch do? What alternatives did we consider?
Updated. How do you feel?
> The patch has no docs. If we plan to forbid some functions in
> single-user mode, we should document which ones (e.g. in func.sgml).
A statement was added.
> There are no tests. Do we have any tests at all for single-user mode?
> The only one I see is in recovery/t/017_shm.pl. What if we had tests
> that ran the regress suite in single-user mode? Basically instead of
> psql we run postgres --single? Do we expect a lot of it would fail? Is
> it small enough the test could maintain a diff that expects those
> failures? I'm not saying we should do that as part of this patch, but
> is there any interest in that? Since single-user mode is most often
> used when things are broken, it's a harsh place to hit a bug.
>
> The patch lacks source comments. Something on
> CheckSlotIsInSingleUserMode explaining why would be good.
Comments were added in all caller functions.
> In ReplicationSlotRelease, why only assert that `slot->active_pid !=
> 0`? Why not assert that it's MyProcPid (even outside single-user
> mode)? Can one backend really release a slot while another is using
> it? Can you drop it? Maybe you can copy it?
You meant we should assert `slot->active_pid == MyProcPid`, right?
Naively considered it can be fix, but it is out-of-scope of the thread. It is
existing code, should be discussed verified in another place.
> Are we calling CheckSlotIsInSingleUserMode from everywhere that needs
> it? We tried to check all the functions that call
> ReplicationSlotCreate, ReplicationSlotRelease, and
> update_synced_slots_inactive_since (since they all have these
> asserts). To call out a few:
>
> The pg_replication_origin_* functions don't call the Assert-ing functions.
You asked that whether we should call CheckSlotIsInSingleUserMode in
pg_replication_origin_* APIs, right? It depends on the policy. If we want to
prohibit all replication-related command, it should be. It is still under
discussion, so I did not touch.
> binary_upgrade_logical_slot_has_caught_up - Not available from the command
> line.
I found that we can in both single-user and binary-upgrade mode. At that time
the function could be called:
```
$ postgres --single -D data/ postgres -b
PostgreSQL stand-alone backend 19devel
backend> SELECT binary_upgrade_logical_slot_has_caught_up('test');
1: binary_upgrade_logical_slot_has_caught_up (typeid = 16, len = 1, typmod = -1, byval = t)
----
LOG: starting logical decoding for slot "test"
DETAIL: Streaming transactions committing after 0/0182C640, reading WAL from 0/0182C608.
STATEMENT: SELECT binary_upgrade_logical_slot_has_caught_up('test');
LOG: logical decoding found consistent point at 0/0182C608
DETAIL: There are no running transactions.
STATEMENT: SELECT binary_upgrade_logical_slot_has_caught_up('test');
1: binary_upgrade_logical_slot_has_caught_up = "f" (typeid = 16, len = 1, typmod = -1, byval = t)
----
```
I don't think this is a realistic case, but the check was added.
> WalSndErrorCleanup - Probably not used in single-user mode?
IIUC we won't reach in the single-user mode. Hence it was not called intentionally.
> We also see that PostgresMain calls ReplicationSlotRelease from its
> error handler. Presumably single-user mode would be executing
> PostgresSingleUserMain instead,
ISTM, PostgresMain() would be called from the PostgresSingleUserMain().
> but still perhaps we should call
> CheckSlotIsInSingleUserMode here just as a sanity-check?
I feel this code is to release the acquired slot when ERROR was raised. Since we
have already covered the entrance, it is not needed.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Set-ReplicationSlot-active_pid-even-in-single-use.patch | application/octet-stream | 929 bytes |
v5-0002-Prohibit-slot-manipulation-while-in-single-user-m.patch | application/octet-stream | 5.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Chao Li | 2025-08-18 08:50:34 | Re: GB18030-2022 Support in PostgreSQL |
Previous Message | John Naylor | 2025-08-18 08:34:29 | Re: GB18030-2022 Support in PostgreSQL |