From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Nisha Moond <nisha(dot)moond412(at)gmail(dot)com> |
Cc: | pgsql-docs(at)lists(dot)postgresql(dot)org |
Subject: | Re: Mention idle_replication_slot_timeout in pg_replication_slots docs |
Date: | 2025-06-27 12:10:13 |
Message-ID: | c9d23cdc-b5dd-455a-8ee9-f1f24d701d89@oss.nttdata.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-docs |
On 2025/06/27 15:32, Nisha Moond wrote:
> On Thu, Jun 26, 2025 at 1:33 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>
>>
>>
>> On 2025/06/26 15:46, Nisha Moond wrote:
>>> On Wed, Jun 25, 2025 at 9:56 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> The pg_replication_slots documentation mentions only max_slot_wal_keep_size
>>>> as a condition under which the wal_status column can show unreserved or lost.
>>>> However, since commit ac0e33136ab, idle_replication_slot_timeout can also
>>>> cause this behavior when it is set. This has not been documented yet.
>>>> https://www.postgresql.org/docs/devel/view-pg-replication-slots.html
>>>>
>>>
>>> +1 to the doc update.
>>
>> Thanks for the review!
>>
>>
>>>> So, how about updating the documentation to also mention
>>>> idle_replication_slot_timeout as a factor that can cause wal_status to
>>>> become unreserved or lost? Patch attached.
>>>>
>>>
>>> Since idle_replication_slot_timeout can only cause wal_status to
>>> become 'lost' and not 'unreserved', perhaps we can reword the sentence
>>> slightly for clarity, suggestion -
>>> "The last two states are seen when max_slot_wal_keep_size is
>>> non-negative and, the 'lost' state may also appear when
>>> idle_replication_slot_timeout is greater than zero."
>>
>> I was thinking that when idle_replication_slot_timeout triggers,
>> the following functions are called, and that wal_status can become
>> "unreserved" before ReplicationSlotRelease() runs. It's very short
>> period, though. Am I wrong?
>>
>> ReplicationSlotMarkDirty();
>> ReplicationSlotSave();
>> ReplicationSlotRelease();
>>
>
> Thank you for pointing it out.
> You are correct that while the checkpointer is in the process of
> invalidating a slot, it sets its PID as the slot’s active_pid. During
> this short window, if a user queries pg_replication_slot, the
> underlying function pg_get_replication_slots will compute the
> wal_status as 'unreserved' for the invalidated slot because the slot
> has a valid active_pid.
>
> That said, it's reasonable to mention in the doc that 'unreserved' may
> appear when idle_replication_slot_timeout is greater than zero, as
> this can indeed happen. So, let's retain the current description.
>
> However, this behavior isn’t specific to
> idle_replication_slot_timeout. For example, when a slot is being
> invalidated due to a different cause "wal_level_insufficient",
> 'unreserved' may also briefly appear in wal_status.
Yes, and "lost" can appear for various reasons, including wal_level_insufficient,
so it seems odd to highlight max_slot_wal_keep_size as the cause of the "lost"
status in the note. It would probably be better to remove the mention of "lost"
from that note.
As for "unreserved", it can also occur for different reasons, but typically,
it happens when max_slot_wal_keep_size is set to a non-negative value.
So it might make sense to keep the explanation focused just on "unreserved"
and max_slot_wal_keep_size. For example:
----------------------
<listitem>
<para>
<literal>unreserved</literal> means that the slot no longer
retains the required WAL files and some of them are to be removed at
- the next checkpoint. This state can return
+ the next checkpoint. This can occur when
+ <xref linkend="guc-max-slot-wal-keep-size"/> is set to
+ a non-negative value. This state can return
to <literal>reserved</literal> or <literal>extended</literal>.
</para>
</listitem>
<listitem>
----------------------
What do you think?
Also, I noticed the note that says “If <structfield>restart_lsn</structfield>
is NULL, this field is null” seems inaccurate. For example, when "wal_removed"
happens, restart_lsn is NULL but wal_status is "lost". So maybe we should remove
that note as well?
Regards,
--
Fujii Masao
NTT DATA Japan Corporation
From | Date | Subject | |
---|---|---|---|
Next Message | PG Doc comments form | 2025-06-29 21:04:14 | No information about the internal implementation for RLS |
Previous Message | Nisha Moond | 2025-06-27 06:32:51 | Re: Mention idle_replication_slot_timeout in pg_replication_slots docs |