From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, tomas(at)vondra(dot)me |
Subject: | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |
Date: | 2025-06-03 13:21:19 |
Message-ID: | CAPpHfdvxqjE+RZdoOODQZAmQip6NpvSH40DuvVFWuf_00ek-yQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jun 2, 2025 at 2:53 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, May 29, 2025 at 5:29 PM Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> >
> > On Tue, May 27, 2025 at 2:26 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > Yeah, we should be able to change ABI during beta, but I can't comment
> > > on the idea of effective_restart_lsn without seeing the patch or a
> > > detailed explanation of this idea.
> >
> > Could you, please, check the patch [1]. It implements this idea
> > except it names new field restart_lsn_flushed instead of
> > effective_restart_lsn.
> >
>
> This appears to be a better direction than the other patch, at least
> for HEAD. I noticed a few points while looking at the patch.
>
> 1. restart_lsn_flushed: Can we name it as last_saved_restart_lsn based
> on existing variable last_saved_confirmed_flush?
Good point, renamed.
> 2. There are no comments as to why this is considered only for
> persistent slots when CheckPointReplicationSlots doesn't have any such
> check.
Relevant comments added.
> 3. Please see if it makes sense to copy it in the copy_replication_slot.
Thank you for pointing, but I don't think this is necessary.
copy_replication_slot() calls ReplicationSlotSave(), which updates
last_saved_restart_lsn.
Also, I've changed ReplicationSlotsComputeRequiredLSN() call to
CheckPointReplicationSlots() to update required LSN after
SaveSlotToPath() updated last_saved_restart_lsn. This helps to pass
checks in 001_stream_rep.pl without additional hacks.
> Apart from these, I am not sure if there are still any pending
> comments in the thread to be handled for this patch, so please see to
> avoid missing anything.
>
> > > Now, you see my point related to restart_lsn computation for logical
> > > slots, it is better to also do some analysis of the problem related to
> > > xmin I have highlighted in one of my previous emails [1]. I see your
> > > response to it, but I feel someone needs to give it a try by writing a
> > > test and see the behavior. I am saying because logical slots took
> > > precaution of flushing to disk before updating shared values of xmin
> > > for a reason, whereas similar precautions are not taken for physical
> > > slots, so there could be a problem with that computation as well.
> >
> > I see LogicalConfirmReceivedLocation() performs correctly while
> > updating effective_catalog_xmin only after syncing the slot to the
> > disk. I don't see how effective_xmin gets updates with the logical
> > replication progress though. Could you get me some clue on this,
> > please?
> >
>
> As per my understanding, for logical slots, effective_xmin is only set
> during the initial copy phase (or say if one has to export a
> snapshot), after that, its value won't change. Please read the
> comments in CreateInitDecodingContext() where we set its value. If you
> still have questions about it, we can discuss further.
OK, thank you for the clarification. I've read the comments in
CreateInitDecodingContext() as you suggested. All of above makes me
think *_xmin fields are handled properly.
------
Regards,
Alexander Korotkov
Supabase
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.patch | application/octet-stream | 6.8 KB |
v2-0002-Add-TAP-tests-to-check-replication-slot-advance-d.patch | application/octet-stream | 13.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andrei Lepikhov | 2025-06-03 13:23:47 | Re: MergeAppend could consider sorting cheapest child path |
Previous Message | Isaac Morland | 2025-06-03 13:15:17 | Re: pg18: Virtual generated columns are not (yet) safe when superuser selects from them |