| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Amit Kapila <akapila(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pgsql: Prevent invalidation of newly synced replication slots. |
| Date: | 2026-01-28 12:35:10 |
| Message-ID: | CAA4eK1LhMuxYdf6aR+UZuxdp7+SJUT_4Mf9yz7eiXdY1VB0Z+g@mail.gmail.com |
| Lists: | pgsql-committers pgsql-hackers |
On Wed, Jan 28, 2026 at 4:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jan 28, 2026 at 11:17 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > It is not clear to me either why the similar test like
> > 040_standby_failover_slots_sync is successful and
> > 046_checkpoint_logical_slot is failing. I am still thinking about it
> > but thought of sharing the information I could gather by debugging.
> >
>
> It seems there is some interaction with previous test in same file
> which is causing this failure as we are using the primary node from
> previous test. When I tried to comment out get_changes and its
> corresponding injection_point in the previous test as attached, the
> entire test passed. I think if we use a freshly created primary node,
> this test will pass but I wanted to spend some more time to see
> how/why previous test is causing this issue?
>
I noticed that the previous test didn't quit the background psql
session used for the concurrent checkpoint. Once I quit that background
session, the test passed for me consistently. See attached. The
comments atop background_psql say: "Be sure to "quit" the
returned object when done with it.". Now, this background session
doesn't directly access the backup_label file but it could be
accessing one of the parent directories where backup_label is present.
One of the gen-AI tools says the following: "In Windows, MoveFileEx (Error 32:
ERROR_SHARING_VIOLATION) can fail if a process is accessing the file's
parent directory in a way that creates a lock. While the error message
usually points to the file itself, the parent folder is a critical
part of the operation.". I admit that I don't know the internals of
MoveFileEx, so I can't say this with complete conviction, but the attached
sounds like a reasonable fix. Could anyone else who can reproduce the
issue test the attached patch and share the results?
Does this fix/theory sound plausible?
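
For illustration only, here is a minimal Python sketch of the general pattern described above: start a long-lived background session, always quit it when done, and only then rename a file in the directory it was using. This is not the attached patch and not the PostgreSQL TAP framework; the helper names (start_background_session, quit_session, demo) and the directory layout are hypothetical, and making the directory the child's working directory is just one assumed way a process might hold on to it. Whether such a handle is what actually triggers the sharing violation on Windows remains unconfirmed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def start_background_session(workdir: Path) -> subprocess.Popen:
    """Start an idle child process whose working directory is `workdir`.

    The child blocks reading stdin, standing in for an interactive
    session that is left open (hypothetical stand-in, not psql).
    """
    return subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdin.read()"],
        cwd=workdir,
        stdin=subprocess.PIPE,
    )


def quit_session(proc: subprocess.Popen) -> int:
    """Close the child's stdin so it sees EOF and exits; return its exit code."""
    proc.stdin.close()
    return proc.wait(timeout=30)


def demo():
    """Run the session, quit it, then rename a file in its directory."""
    with tempfile.TemporaryDirectory() as tmp:
        datadir = Path(tmp) / "data"
        datadir.mkdir()
        label = datadir / "backup_label"
        label.write_text("placeholder\n")

        session = start_background_session(datadir)
        try:
            # Work that needs the concurrent session would go here.
            pass
        finally:
            # Always quit the session, even if the work above fails,
            # so no leftover process keeps the directory in use.
            exit_code = quit_session(session)

        # Rename only after the background session has exited.
        renamed = datadir / "backup_label.old"
        label.replace(renamed)
        return exit_code, renamed.exists()


if __name__ == "__main__":
    print(demo())
```

The try/finally mirrors the advice in the background_psql comments: the session is released on every code path before any later step touches the same directory.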
--
With Regards,
Amit Kapila.
| Attachment | Content-Type | Size |
|---|---|---|
| quit_checkpoint_bg_session_1.patch | application/octet-stream | 535 bytes |