Re: Adding REPACK [concurrently]

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>
Cc: Robert Treat <rob(at)xzilla(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>
Subject: Re: Adding REPACK [concurrently]
Date: 2025-08-30 17:50:56
Message-ID: 202508301750.cbohxyy2pcce@alvherre.pgsql
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

Here's v19 of this patchset. This is mostly Antonin's v18. I added a
preparatory v19-0001 commit, which splits vacuumdb.c to create a new
file, vacuuming.c (and its header file vacuuming.h). If you look at it
under 'git show --color-moved=zebra' you should notice that most of it
is just code movement; there's hardly any code changes.

v19-0002 has absorbed Antonin's v18-0005 (the pg_repackdb binary)
together with the introduction of the REPACK command proper; but instead
of using a symlink, I just created a separate pg_repackdb.c source file
for it and we compile that small new source file with vacuuming.c to
create a regular binary. BTW the meson.build changes look somewhat
duplicative; maybe there's a less dumb way to go about this. (For
instance, maybe just have libscripts.a include vacuuming.o, though it's
not used by any of the other programs in that subdir.)

I'm not wedded to the name "vacuuming.c"; happy to take suggestions.

After 0002, the pg_repackdb utility should be ready to take clusterdb's
place, and also vacuumdb --full, with one gotcha: if you try to use
pg_repackdb with an older server version, it will fail, claiming that
REPACK is not supported. This is not ideal. Instead, we should make it
run VACUUM FULL (or CLUSTER); so if you have a fleet including older
servers you can use the new utils there too.

All the logic for vacuumdb to select tables to operate on has been moved
to vacuuming.c verbatim. This means this logic applies to pg_repackdb
as well. As long as you stick to repacking a single table this is okay
(read: it won't be used at all), but if you want to use parallel mode
(say to process multiple schemas), we might need to change it. For the
same reason, I think we should add an option to it (--index[=indexname])
to select whether to use the USING INDEX clause or not, and optionally
indicate which index to use; right now there's no way to select which
logic (cluster's or vacuum full's) to use.

Then v19-0003 through v19-0005 are Antonin's subsequent patches to add
the CONCURRENTLY option; I have not reviewed these at all, so I'm
including them here just for completion. I also included v18-0006 as
posted by Mihail previously, though I have little faith that we're going
to include it in this release.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)

Attachment Content-Type Size
v19-0001-Split-vacuumdb-to-create-vacuuming.c-h.patch text/x-diff 69.3 KB
v19-0002-Add-REPACK-command.patch text/x-diff 133.3 KB
v19-0003-Refactor-index_concurrently_create_copy-for-use-.patch text/x-diff 4.1 KB
v19-0004-Move-conversion-of-a-historic-to-MVCC-snapshot-t.patch text/x-diff 5.4 KB
v19-0005-Add-CONCURRENTLY-option-to-REPACK-command.patch text/x-diff 147.4 KB
v19-0006-Preserve-visibility-information-of-the-concurren.patch text/x-diff 56.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Jones 2025-08-30 18:14:29 Re: COPY TO: provide hint when WHERE clause is used
Previous Message Arseniy Mukhin 2025-08-30 17:33:09 Move block_range_read_stream_cb batchmode comment