| From: | Antonin Houska <ah(at)cybertec(dot)at> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net> |
| Subject: | Re: Adding REPACK [concurrently] |
| Date: | 2026-02-25 08:55:59 |
| Message-ID: | 9116.1772009759@localhost |
| Lists: | pgsql-hackers |
Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> On 2026-Feb-23, Alvaro Herrera wrote:
>
> Looking at this function in pgoutput_repack.c:
>
> > +/* Store concurrent data change. */
> > +static void
> > +store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind,
> > + HeapTuple tuple)
> > +{
>
> [...] we have this:
>
> > + size = VARHDRSZ + SizeOfConcurrentChange;
> > +
> > + /*
> > + * ReorderBufferCommit() stores the TOAST chunks in its private memory
> > + * context and frees them after having called apply_change(). Therefore
> > + * we need flat copy (including TOAST) that we eventually copy into the
> > + * memory context which is available to decode_concurrent_changes().
> > + */
> > + if (HeapTupleHasExternal(tuple))
> > + {
> > + /*
> > + * toast_flatten_tuple_to_datum() might be more convenient but we
> > + * don't want the decompression it does.
> > + */
> > + tuple = toast_flatten_tuple(tuple, dstate->tupdesc);
> > + flattened = true;
> > + }
> > +
> > + size += tuple->t_len;
> > + if (size >= MaxAllocSize)
> > + elog(ERROR, "Change is too big.");
> > +
> > + /* Construct the change. */
> > + change_raw = (char *) palloc0(size);
> > + SET_VARSIZE(change_raw, size);
In 0005 ("Use background worker to do logical decoding"), the function is a
bit simpler because there the decoding worker uses a temporary file, rather
than a tuplestore, to send the data to the REPACKing backend.
sharedtuplestore.h would also work, but I think we do not need its
functionality, and AFAICS it always writes the data into a file anyway (i.e.
it does not use memory even when the amount of data is small).

(Perhaps 0004 should use the file too, in case 0005 does not make it into PG
19.)
> I wonder if this isn't problematic with large tuples. If a row has some
> very wide columns, each of which individually is less than 1 GB, then it
> might happen that the sum of their sizes exceeds 1 GB, causing palloc()
> to complain and abort the whole repack operation. This wouldn't be very
> nice, so I think we need to address it somehow.
I agree.
> I think we need some new APIs to avoid all this copying. It appears
> that it all starts with reorderbuffer doing something unhelpful with the
> memory context of the TOAST chunks. Maybe we should address this by
> "fixing" reorderbuffer so that it doesn't do this, instead of playing so
> many games to cope.
What I see is that reorderbuffer.c collects the TOAST pointers
(ReorderBufferToastAppendChunk) and then, before passing the tuple to the
output plugin, it copies the TOAST chunks referenced by the tuple to memory
and replaces the "on-disk" TOAST pointers in the tuple with "external
indirect" ones, pointing to the in-memory TOAST chunks
(ReorderBufferToastReplace).
For REPACK, I suggest a variant of toast_flatten_tuple() that writes its
output to a file, plus a corresponding function that reads it back while
allocating a separate chunk of memory for each TOASTed attribute; the
restored tuple would then reference those chunks via "external indirect"
TOAST pointers, as if it had been processed by ReorderBufferToastReplace().
Does that make sense to you?
--
Antonin Houska
Web: https://www.cybertec-postgresql.com