Re: Adding REPACK [concurrently]

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>
Cc:	alvherre(at)alvh(dot)no-ip(dot)org, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject:	Re: Adding REPACK [concurrently]
Date:	2026-03-23 12:00:34
Message-ID:	46846.1774267234@localhost
Lists:	pgsql-hackers

Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> > Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >
> > > Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> wrote:
> > >
> > > > The concurrency test failed once. I tried to reproduce the scenario below
> > > > but had no luck. I think the assert failure happened because, after a
> > > > speculative insert, there might be no spec CONFIRM or ABORT. Thoughts?
> > >
> > > Perhaps. I'll try. I'm not sure the REPACK decoding worker does anything
> > > special regarding decoding. If you happen to see the problem again, please try
> > > to preserve the related WAL segments: if this is a bug in the PG executor,
> > > pg_waldump might reveal that.
> >
> > I could not reproduce the failure, and have no idea how a speculative insert can
> > remain without a CONFIRM / ABORT record. The only problem I could imagine is that
> > change_useless_for_repack() accidentally filters out the CONFIRM / ABORT record,
> > but neither code review nor the debugger supports that theory. (Actually, if
> > this were the problem, the test failure probably wouldn't be that rare.)
>
> I confirm that I was able to reproduce the crash using the debugger and your more
> recent diagnosis [1]. Indeed, filtering was the problem.
>
> Unfortunately, I wasn't able to make the crash easily reproducible using the
> isolation tester. The problem is that the logical decoding is performed by a
> background worker: the backend executing REPACK waits for the background
> worker, which in turn waits on an injection point, and the isolation tester
> does not recognize that it's effectively the backend that is waiting on the
> injection point. Therefore the isolation tester does not proceed to the
> next step.

I could not resist digging into it deeper :-) Attached is a test that reproduces
the crash; it includes the isolation tester enhancement that I posted
separately [1]. It crashes reliably with v43 [2] if your fix v43-0005 is
omitted.

[1] https://www.postgresql.org/message-id/4703.1774250534%40localhost
[2] https://www.postgresql.org/message-id/202603191855.fzsgsnyzfvpt%40alvherre.pgsql

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment	Content-Type	Size
nocfbot-Reproduce-filtering-issue.patch	text/x-diff	7.7 KB

In response to

Re: Adding REPACK [concurrently] at 2026-03-20 18:06:10 from Antonin Houska

Responses

Re: Adding REPACK [concurrently] at 2026-03-23 16:07:24 from Jim Jones
Re: Adding REPACK [concurrently] at 2026-03-26 13:19:53 from Srinath Reddy Sadipiralla
