Re: pgsql: Prevent invalidation of newly synced replication slots.

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Amit Kapila <akapila(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: pgsql: Prevent invalidation of newly synced replication slots.
Date:	2026-01-28 12:35:10
Message-ID:	CAA4eK1LhMuxYdf6aR+UZuxdp7+SJUT_4Mf9yz7eiXdY1VB0Z+g@mail.gmail.com
Lists:	pgsql-committers pgsql-hackers

On Wed, Jan 28, 2026 at 4:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jan 28, 2026 at 11:17 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > It is not clear to me either why the similar test like
> > 040_standby_failover_slots_sync is successful and
> > 046_checkpoint_logical_slot is failing. I am still thinking about it
> > but thought of sharing the information I could gather by debugging.
> >
>
> It seems there is some interaction with previous test in same file
> which is causing this failure as we are using the primary node from
> previous test. When I tried to comment out get_changes and its
> corresponding injection_point in the previous test as attached, the
> entire test passed. I think if we use a freshly created primary node,
> this test will pass but I wanted to spend some more time to see
> how/why previous test is causing this issue?
>

I noticed that the previous test didn't quit the background psql
session used for the concurrent checkpoint. Once I quit that background
session, the test passed for me consistently. See attached. The
comments atop background_psql say: "Be sure to "quit" the returned
object when done with it.". Now, this background session
doesn't directly access the backup_label file but it could be
accessing one of the parent directories where backup_label is present.
One of the gen-AI tools says the following: "In Windows, MoveFileEx (Error 32:
ERROR_SHARING_VIOLATION) can fail if a process is accessing the file's
parent directory in a way that creates a lock. While the error message
usually points to the file itself, the parent folder is a critical
part of the operation.". I admit that I don't know the internals of
MoveFileEx, so I can't say this with complete conviction, but the
attached sounds like a reasonable fix. Could anyone else who can
reproduce the issue test the attached patch and share the results?

Does this fix/theory sound plausible?

--
With Regards,
Amit Kapila.

Attachment	Content-Type	Size
quit_checkpoint_bg_session_1.patch	application/octet-stream	535 bytes

In response to

Re: pgsql: Prevent invalidation of newly synced replication slots. at 2026-01-28 10:46:58 from Amit Kapila

Responses

Re: pgsql: Prevent invalidation of newly synced replication slots. at 2026-01-28 12:57:49 from Robert Haas
