Re: Adding REPACK [concurrently]

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2026-03-31 14:36:22
Message-ID: 228982.1774967782@localhost
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Fri, Mar 27, 2026 at 10:31 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >
> > Lastly, 0008 is "Teach snapshot builder to skip transactions running
> > REPACK (CONCURRENTLY)" which I see as the least mature of the pack. I
> > would really like to be able to squash it with 0003, but I'm not yet
> > trusting it enough.
> >
>
> Few comments/questions by looking 0008 alone:

Thanks.

> 1.
> + TransactionId *xids_repack = NULL;
> + bool logical_decoding_enabled = IsLogicalDecodingEnabled();
>
> Assert(!RecoveryInProgress());
>
> ...
> ...
>
> /*
> * Allocating space for maxProcs xids is usually overkill; numProcs would
> * be sufficient. But it seems better to do the malloc while not holding
> @@ -2663,11 +2672,14 @@ GetRunningTransactionData(void)
> */
> if (CurrentRunningXacts->xids == NULL)
> {
> + /* FIXME probably fails if logical decoding is enable on-the-fly */
> + int nrepack = logical_decoding_enabled ? MAX_REPACK_XIDS : 0;
>
> This FIXME is important to fix for this patch, otherwise, we can't
> rely on transactions remembered as repack_concurrently.

Indeed.

> 2.
> *
> + /*
> + * TODO Consider a GUC to reserve certain amount of replication slots for
> + * REPACK (CONCURRENTLY) and using it here.
> + */
> +#define MAX_REPACK_XIDS 16
> +
>
> This sounds a bit scary as reserving replication slots for REPACK
> (CONCURRENTLY) may not be what users expect. But it is not clear why
> replication slots need to be reserved for this.

The point is that REPACK should not block replication [1]. Thus reserving
slots for non-REPACK users is probably more precise statement.

> IIUC, two reasons why logical decoding can ignore REPACK
> (CONCURRENTLY) are (a) does not perform any catalog changes relevant
> to logical decoding, (b) neither walsenders nor backends performing
> logical decoding needs to care sending the WAL generated by REPACK
> (CONCURRENTLY). Is that understanding correct? If so, we may want to
> clarify why we want to ignore this command's WAL while sending changes
> downstream in the commit message or give some reference of the patch
> where the same is mentioned. This can help reviewing this patch
> independently.

Correct, but in fact this diff only affects the setup of the logical decoding
rather than the decoding itself. On the other hand, if REPACK (CONCURRENTLY)
starts when the decoding backend's snapshot builder is already in the
SNAPBUILD_FULL_SNAPSHOT state, reorderbuffer.c processes the transaction
normally, and another part of the series (v46-0002) ensures that the table
rewriting is not decoded: REPACK simply tells heap_insert(), heap_update(),
heap_delete() not to put the extra (replication specific) information into the
corresponding WAL records. I suppose this is what you mean in (b).

Regarding (a), yes, the absence of catalog changes in the REPACK's transaction
is the reason that even the logical decoding setup does not have to wait for
the transaction to finish. AFAIU the reason the snapshot builder needs to wait
for completion of (non-REPACK) transaction started before
SNAPBUILD_FULL_SNAPSHOT was reached is exactly that the transaction might have
performed catalog changes before its decoding started, so we do not know for
sure if it contains catalog changes or not.

> BTW, are we intending to commit this patch series for PG19?

Yes, that's the current plan.

[1] https://www.postgresql.org/message-id/CABV9wwMQkN6wOxMnd1h95eqpC7wEqivBzsBzCp3VnxGFk%3DvDUw%40mail.gmail.com

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2026-03-31 14:43:00 Re: jsonb subscripting vs SQL/JSON array accessor semantics (SQL:2023)
Previous Message Yugo Nagata 2026-03-31 14:35:17 Re: Allow to collect statistics on virtual generated columns