| From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Multi-insert for logical replication apply |
| Date: | 2026-06-29 21:13:00 |
| Message-ID: | CALj2ACXGhzRSVpMwkFX0mwvkhDdFwnPLzTXYrtRFHcGGxo6XhA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
I would like to propose using multi-inserts for logical replication apply
for PG20. The attached patch is a WIP and I'm looking for feedback on the
overall approach.
The logical replication apply worker currently inserts tuples one at a
time. Each row writes its own WAL record and goes through a page
lock/unlock cycle, so a query that inserts 100M rows on the publisher turns
into 100M single-row inserts on the subscriber. The patch buffers
consecutive inserts within a transaction on the subscriber and applies them
with the existing multi-insert table AM, the same way COPY does. When
several tuples fit on one page, that writes a single WAL record for all of
them and locks the page once instead of once per row.
I ran a few insert-only experiments to see how much this helps
(r7i.8xlarge, shared_buffers=16GB, streaming=parallel, checkpoints
disabled). I also measured the time spent in the insert call itself, apart
from total apply time:
row size rows WAL H WAL P WAL gain WAL records H WAL
records P records gain insert time H insert time P insert gain
apply time H apply time P apply gain
---------- ----- -------- -------- -------- -------------
------------- ------------ ------------- ------------- -----------
------------ ------------ ----------
50 B/row 100M 7.5GB 2.9GB 2.6x 101060526 1798770
56x 61s 15s 4.0x 487s
469s 1.04x
128 B/row 100M 14.6GB 10.1GB 1.4x 102643991 4512768
23x 138s 92s 1.5x 508s
479s 1.06x
1.4 KB/row 10M 13.2GB 12.8GB 1.03x 10001543 2164961
4.6x 317s 309s 1.03x 334s
319s 1.05x
H = HEAD (single-insert), P = PATCHED (multi-insert). Gain = H / P.
The narrower the row, the more rows fit on a page and the bigger the
saving. The WAL reduction is the main benefit - less I/O, fewer WAL files,
faster crash recovery, and less WAL to ship for cascading replication. The
apply-time change is small (under 5%). My hypothesis is that the other
per-row work in the apply path outweighs the WAL write, so fewer WAL
records alone doesn't move it much. I'll also test with streaming=off to
see how that compares. (I also have small-scale results from 1 to 100K rows
showing the gain grows with row count; happy to share if useful.)
For some background, I proposed multi-insert for CREATE TABLE AS and
materialized views earlier [1]. That ran into two problems. The slot type
there isn't known ahead of time because the top plan node decides it, and I
had also bundled in table AM interface changes, which widened the scope.
Logical replication apply avoids both - the slot type is always
TTSOpsVirtual, and the existing multi-insert table AM API can be used as-is.
Here are the main design decisions I'd like feedback on.
It reuses COPY's existing checks for when multi-insert isn't safe, falling
back to the single-row path in the same cases: before/instead-of row
triggers, volatile defaults, foreign tables/partitions, and so on.
The patch buffers consecutive inserts to the same relation, in both leader
and parallel apply workers. It flushes at 1000 tuples or 64KB (adjustable
or configurable later), on a switch to a different relation, or when any
non-insert message arrives, which keeps ordering correct in mixed
transactions. The first insert always takes the existing single-row path;
it switches to multi-insert only after consecutive inserts to the same
relation. Inserts into several relations in turn flush on every switch;
per-relation buffers could avoid that, but I've left that out for now.
Only the heap write goes through multi-insert. Constraint checks, index
maintenance, conflict detection, and after-row triggers still run per row,
exactly as the single-row path does, so visible behavior is unchanged.
Constraints are evaluated per row before the multi-insert. Pushing these
passes into the heap AM layer would avoid the extra loops but needs table
AM changes I'd rather not get into here.
On conflict detection: today an insert conflict raises an ERROR, so a
conflict on a buffered row rolls back the whole batch. Per-row resolutions
(ON CONFLICT DO NOTHING/DO UPDATE style) would need more thought, since
rows land in the heap before the per-row check runs.
Currently the subscriber batches single inserts (this patch) and does a
multi-insert. Another idea is to add a new message type for multi-insert on
the publisher when decoding XLOG_HEAP2_MULTI_INSERT, with the subscriber
using multi-insert only on that path. That keeps things simpler, but it
only kicks in when the publisher used multi-insert itself (e.g. COPY FROM),
and it requires protocol changes.
This is still a WIP and I've likely missed some edge cases, so I appreciate
feedback on the overall approach. Thank you!
--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-WIP-Multi-insert-for-logical-replication-apply.patch | application/x-patch | 15.0 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Álvaro Herrera | 2026-06-29 21:13:02 | Re: pg_createsubscriber --dry-run logging concerns |
| Previous Message | Tom Lane | 2026-06-29 20:22:06 | Re: PostgreSQL select-only CTE removal is too aggressive? |