Re: Proposal: Support Logical replication of large objects

From: Nitin Motiani <nitinmotiani(at)google(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Proposal: Support Logical replication of large objects
Date: 2026-05-04 04:53:44
Message-ID: CAH5HC946CgCSP8X+3rJUvhUz0kuQS2hbC-GLphNQog_FakAe4Q@mail.gmail.com
Lists: pgsql-hackers

>
> 4. If we want a more performant version, one idea is to support bulk
> writes for large objects. Then the above solution can be made more
> performant. I have not analyzed the work required. Suggestions on this
> would be welcome.
>

I experimented with the bulk write idea and have created a couple of
patches for it. One implements the lo_bulk_write and lo_bulk_put APIs;
the other adds a benchmark script to exercise these APIs and shows a
roughly 3x improvement.
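
To make the SQL-level surface concrete, here is a rough usage sketch
(the OIDs and data are placeholders, and I'm showing lo_bulk_put
creating objects with caller-supplied OIDs, lo_create-style, though the
exact argument handling may still change; 131072 is INV_WRITE):

  BEGIN;
  -- append one chunk each to two existing large objects, using
  -- descriptors obtained from lo_open
  SELECT lo_bulk_write(ARRAY[lo_open(16401, 131072), lo_open(16402, 131072)],
                       ARRAY['\x0102'::bytea, '\x0304'::bytea]);
  -- create two new large objects with the given contents
  -- (sketch: assumes lo_bulk_put creates the objects named by the OID array)
  SELECT lo_bulk_put(ARRAY[16403, 16404]::oid[],
                     ARRAY['\x0506'::bytea, '\x0708'::bytea]);
  COMMIT;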

Note that I have not yet integrated this with the rest of the
replication work; that might also require a bulk read or copy API. I
have kept the bulk API independent of the replication patches and will
look into integration if the general idea looks good. The bulk write
(and bulk read) could also be useful for pg_dump and pg_restore, so I
might move those patches to a separate thread later. For now I've
reordered the patch set: 0001 and 0002 are the bulk write and benchmark
patches, and the original four replication patches are now 0003-0006.
Let me know if I should change this ordering.

Here are the major changes in the bulk write patch.

1. I introduced two new functions in be-fsstubs.c:
* lo_bulk_write(fds int4[], data bytea[])
* lo_bulk_put(oids oid[], data bytea[])

2. I created inv_bulk_write, inv_bulk_create, and inv_bulk_put
functions to do the actual work.

3. I use table_multi_insert to perform the actual bulk inserts into
pg_largeobject. The index updates are still done one by one.

4. This optimization is limited to creating new objects (using
lo_bulk_put) or appending to existing objects (using lo_bulk_write). If
lo_bulk_write tries to insert into the middle of an existing object, we
fall back to the standard inv_write instead of using inv_bulk_write.
We check this using the large object's current size and by ensuring
that offset % LOBLKSIZE is zero, which prevents starting a bulk write
in the middle of a page.

5. I've added a few regression tests for these cases, plus benchmark
tests that run multiple transactions of 1000 inserts each, comparing
bulk against one-by-one inserts. The bulk insert shows a roughly 3x
improvement. I suspect the gap would be even larger when a remote
client is involved and network overhead is added, but I've not looked
into that in much detail.
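
For reference, the comparison is roughly of the following shape (a
simplified sketch rather than the actual scripts in 0002; as above, I'm
showing lo_bulk_put with caller-supplied OIDs):

  -- one object at a time: 1000 separate calls, server-assigned OIDs
  SELECT lo_from_bytea(0, 'some data'::bytea) FROM generate_series(1, 1000);

  -- bulk: a single call creating 1000 objects
  -- (the OID range here is arbitrary, chosen only for the sketch)
  SELECT lo_bulk_put(array_agg((20000 + i)::oid), array_agg('some data'::bytea))
    FROM generate_series(1, 1000) AS i;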

Let me know what you think.

Thanks & Regards

Nitin Motiani,
Google

Attachment Content-Type Size
POC-v4-0005-Tablesync-for-large-objects-for-superuser.patch application/x-patch 2.4 KB
POC-v4-0006-Enable-tablesync-for-large-objects-for-all-users.patch application/x-patch 5.6 KB
POC-v4-0003-Support-large-object-decoding.patch application/x-patch 37.9 KB
v4-0001-Implement-bulk-API-for-Large-Objects.patch application/x-patch 32.2 KB
POC-v4-0004-Add-support-to-apply-lo_write-operations-in-apply.patch application/x-patch 3.8 KB
v4-0002-Add-pgbench-scripts-for-Bulk-API-performance-test.patch application/x-patch 69.6 KB
