Re: Adding REPACK [concurrently]

From: Antonin Houska <ah(at)cybertec(dot)at>
To:
Cc: Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2026-01-27 10:57:36
Message-ID: 88003.1769511456@localhost
Lists: pgsql-hackers

Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com> wrote:
>
> > PART 2:
> >
> > I have continued working with stress tests. This time I added your WIP patch to fix the LR\CLOG race.
> >
> > I made the following configs:
> > 1) just REPACK CONCURRENTLY - ok
> > 2) + relcheckxmin (see PART1) - ok
> > 3) + worker - ok
> > 4) + multiple snapshots - broken in multiple ways.
> >
> > You may see example of run here - https://cirrus-ci.com/build/6359048020295680
> >
> > Some examples:
> >
> > 1) 'pgbench: error: client 11 script 0 aborted in command 20 query 0: ERROR: could not read blocks 0..0 in file "base/5/16414": read only 0
> > of 8192 bytes
> > 2) at /home/postgres/postgres/contrib/amcheck/t/008_repack_concurrently.pl line 51.
> > [15:36:37.204] # 'pgbench: error: client 5 script 0 aborted in command 28 query 0: ERROR: division by zero
> > 3) 'pgbench: error: client 12 script 0 aborted in command 6 query 0: ERROR: cache lookup failed for relation 17400
>
> Thanks, I'll check these.

PROC_IN_VACUUM shouldn't be used, for the same reason that
StartupDecodingContext() avoids setting PROC_IN_LOGICAL_DECODING in a
transaction. I've removed that and the tests now work for me. In particular,
the "cache lookup failed" error is almost certainly related. Please let me
know if you still get the other errors (except for 2, which is probably due to
the MVCC-unsafe behavior, as discussed earlier).

The 0006 part needs more work (definitely beyond PG 19). For now I've
summarized the problem in the code this way:

+ * As there is no snapshot, our xmin should be invalid now.
+ *
+ * TODO xid can still be valid. We can mark our transaction with the
+ * PROC_IN_VACUUM flag, but at the same time we need to make sure that
+ * anything we write is ignored by VACUUM: since our xid is >= xmin of
+ * our replication slot, the slot does not help. Other transactions
+ * might use their RecentXmin to check if our xact is still running
+ * (see TransactionIdIsInProgress) before they check CLOG. By using
+ * PROC_IN_VACUUM we'd let their RecentXmin skip our xid. Thus our
+ * xact would appear not running anymore, but not yet marked committed
+ * in CLOG either, therefore aborted: it's o.k. for VACUUM to clean up
+ * tuples written by an aborted transaction.
+ *
+ * Perhaps we can add a new field 'relisvalid' to pg_class and
+ * something alike to pg_index and make sure that neither queries nor
+ * VACUUM can use tables / indexes which do not have this flag set
+ * (The existing pg_index(indisvalid) field probably should not
+ * control whether VACUUM is allowed or not). Then we can do the
+ * catalog changes in separate transactions. Only the transaction that
+ * copies the heap would then use the PROC_IN_VACUUM flag. However,
+ * even then it would probably be appropriate to do regular
+ * (MVCC-safe) rewriting, i.e. avoid setting the xid of the rewriting
+ * transaction in the tuple headers.
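
The race described in the comment above can be illustrated with a toy model.
This is only a sketch under loud assumptions: every name below (ToyBackend,
toy_recent_xmin, toy_judge, ...) is hypothetical and not a PostgreSQL API, xids
are plain integers with no wraparound handling, and the "CLOG" is a single
status value. It only mirrors the order of checks the comment refers to: an
observer first compares the xid with its RecentXmin, and only then consults
CLOG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all names are hypothetical, not PostgreSQL APIs. */
typedef uint32_t ToyXid;            /* plain integers, wraparound ignored */

typedef enum { TOY_CLOG_IN_PROGRESS, TOY_CLOG_COMMITTED } ToyClogStatus;
typedef enum { TOY_RUNNING, TOY_COMMITTED, TOY_ABORTED } ToyVerdict;

typedef struct
{
    ToyXid xid;        /* xid assigned to the backend's transaction */
    bool   in_vacuum;  /* stands in for the PROC_IN_VACUUM flag */
} ToyBackend;

/*
 * Compute the observer's "RecentXmin": the oldest xid among backends that
 * are not flagged in_vacuum, or next_xid if there is none.
 */
static ToyXid
toy_recent_xmin(const ToyBackend *procs, size_t n, ToyXid next_xid)
{
    ToyXid xmin = next_xid;

    assert(procs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (procs[i].in_vacuum)
            continue;           /* flagged backends are skipped */
        if (procs[i].xid < xmin)
            xmin = procs[i].xid;
    }
    return xmin;
}

/*
 * Decide how the observer classifies xid. Anything older than recent_xmin
 * is assumed not to be running; the backend array is searched only for
 * xids at or above recent_xmin.
 */
static ToyVerdict
toy_judge(ToyXid xid, ToyXid recent_xmin,
          const ToyBackend *procs, size_t n, ToyClogStatus clog)
{
    if (xid >= recent_xmin)
    {
        for (size_t i = 0; i < n; i++)
            if (procs[i].xid == xid)
                return TOY_RUNNING;
    }
    if (clog == TOY_CLOG_COMMITTED)
        return TOY_COMMITTED;
    /* Not running and not committed: treated as aborted. */
    return TOY_ABORTED;
}
```

In this model, with one ordinary backend at xid 100 and a flagged backend at
xid 90, toy_recent_xmin() yields 100, so toy_judge() classifies xid 90 as
TOY_ABORTED while it is in fact still running and not yet committed. Clearing
the flag moves the computed xmin down to 90 and the same call returns
TOY_RUNNING.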

Thanks for your testing!

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment Content-Type Size
v32-0001-Add-REPACK-command.patch text/x-diff 144.0 KB
v32-0002-Refactor-index_concurrently_create_copy-for-use-with.patch text/x-diff 8.7 KB
v32-0003-Move-conversion-of-a-historic-to-MVCC-snapshot-to-a-.patch text/x-diff 5.6 KB
v32-0004-Add-CONCURRENTLY-option-to-REPACK-command.patch text/plain 146.0 KB
v32-0005-Use-background-worker-to-do-logical-decoding.patch text/x-diff 65.2 KB
v32-0006-Use-multiple-snapshots-to-copy-the-data.patch text/plain 72.0 KB
