Re: Adding REPACK [concurrently]

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net>
Subject: Re: Adding REPACK [concurrently]
Date: 2026-02-24 18:29:17
Message-ID: 202602241757.6ac3iss2u4vo@alvherre.pgsql
Lists: pgsql-hackers

On 2026-Feb-23, Alvaro Herrera wrote:

Looking at this function in pgoutput_repack.c:

> +/* Store concurrent data change. */
> +static void
> +store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
> +			 HeapTuple tuple)
> +{

[...] we have this:

> +	size = VARHDRSZ + SizeOfConcurrentChange;
> +
> +	/*
> +	 * ReorderBufferCommit() stores the TOAST chunks in its private memory
> +	 * context and frees them after having called apply_change(). Therefore
> +	 * we need flat copy (including TOAST) that we eventually copy into the
> +	 * memory context which is available to decode_concurrent_changes().
> +	 */
> +	if (HeapTupleHasExternal(tuple))
> +	{
> +		/*
> +		 * toast_flatten_tuple_to_datum() might be more convenient but we
> +		 * don't want the decompression it does.
> +		 */
> +		tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
> +		flattened = true;
> +	}
> +
> +	size += tuple->t_len;
> +	if (size >= MaxAllocSize)
> +		elog(ERROR, "Change is too big.");
> +
> +	/* Construct the change. */
> +	change_raw = (char *) palloc0(size);
> +	SET_VARSIZE(change_raw, size);

I wonder if this isn't problematic with large tuples. If a row has some
very wide columns, each of which individually is less than 1 GB, then it
might happen that the sum of their sizes exceeds 1 GB, causing palloc()
to complain and abort the whole repack operation. This wouldn't be very
nice, so I think we need to address it somehow.

Another thing I'm not very keen on is the fact that we have to memcpy()
the tuple contents a few lines below:

> +	/*
> +	 * Copy the tuple.
> +	 *
> +	 * Note: change->tup_data.t_data must be fixed on retrieval!
> +	 */
> +	memcpy(&change.tup_data, tuple, sizeof(HeapTupleData));
> +	memcpy(dst, &change, SizeOfConcurrentChange);
> +	dst += SizeOfConcurrentChange;
> +	memcpy(dst, tuple->t_data, tuple->t_len);

> +	/* Store as tuple of 1 bytea column. */
> +	values[0] = PointerGetDatum(change_raw);
> +	isnull[0] = false;
> +	tuplestore_putvalues(dstate->tstore, dstate->tupdesc_change,
> +						 values, isnull);

To make matters worse, tuplestore_putvalues() does a
heap_form_minimal_tuple() on this and copies the data again. This seems
pretty wasteful.

I think we need some new APIs to avoid all this copying. It appears
that it all starts with reorderbuffer doing something unhelpful with the
memory context of the TOAST chunks. Maybe we should address this by
"fixing" reorderbuffer so that it doesn't do this, instead of playing so
many games to cope.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
