| From: | Antonin Houska <ah(at)cybertec(dot)at> |
|---|---|
| To: | Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> |
| Cc: | alvherre(at)alvh(dot)no-ip(dot)org, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net> |
| Subject: | Re: Adding REPACK [concurrently] |
| Date: | 2026-03-23 12:00:34 |
| Message-ID: | 46846.1774267234@localhost |
| Lists: | pgsql-hackers |
Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> > Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >
> > > Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> wrote:
> > >
> > > > The concurrency test failed once. I tried to reproduce the scenario below
> > > > but had no luck. I think the assert failure happened because after the
> > > > speculative insert there might be no spec CONFIRM or ABORT. Thoughts?
> > >
> > > Perhaps, I'll try. I'm not sure the REPACK decoding worker does anything
> > > special regarding decoding. If you happen to see the problem again, please try
> > > to preserve the related WAL segments - if this is a bug in PG executor,
> > > pg_waldump might reveal that.
> >
> > I could not reproduce the failure, and have no idea how a speculative insert can
> > stay without a CONFIRM / ABORT record. The only problem I could imagine is that
> > change_useless_for_repack() filters out the CONFIRM / ABORT record
> > accidentally, but neither code review nor the debugger supports that
> > theory. (Actually, if this were the problem, the test failure probably wouldn't
> > be that rare.)
>
> I confirm that I was able to reproduce the crash using a debugger and your more
> recent diagnosis [1]. Indeed, filtering was the problem.
>
> Unfortunately, I wasn't able to make the crash easily reproducible using
> the isolation tester. The problem is that logical decoding is performed by a
> background worker, and when the backend executing REPACK waits for the
> background worker, which in turn waits on an injection point, the isolation
> tester does not recognize that it's effectively the backend that is waiting on
> the injection point. Therefore the isolation tester does not proceed to the
> next step.

I could not resist digging into it further :-) Attached is a test that reproduces
the crash. It includes the isolation tester enhancement that I posted
separately [1]. It crashes reliably with v43 [2] if your fix v43-0005 is
omitted.

[1] https://www.postgresql.org/message-id/4703.1774250534%40localhost
[2] https://www.postgresql.org/message-id/202603191855.fzsgsnyzfvpt%40alvherre.pgsql
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
| Attachment | Content-Type | Size |
|---|---|---|
| nocfbot-Reproduce-filtering-issue.patch | text/x-diff | 7.7 KB |