| From: | Nitin Motiani <nitinmotiani(at)google(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Proposal: Support Logical replication of large objects |
| Date: | 2026-05-04 04:53:44 |
| Message-ID: | CAH5HC946CgCSP8X+3rJUvhUz0kuQS2hbC-GLphNQog_FakAe4Q@mail.gmail.com |
| Lists: | pgsql-hackers |
>
> 4. If we want a more performant version, one idea is to support bulk
> writes for large objects. Then the above solution can be made more
> performant. I have not analyzed the work required. Suggestions on this
> would be welcome.
>
I experimented with the bulk write idea and have created a couple of
patches for it. One implements the lo_bulk_write and lo_bulk_put APIs;
the other adds a benchmark script to exercise these APIs, which shows
a roughly 3x improvement.
Note that I have not yet integrated this with the remaining
replication work; that might also require a bulk read or copy API. I
have worked on the bulk API independently of this patch set and will
look into integration if the general idea looks good. The bulk write
(and bulk read) could also be useful for pg_dump and pg_restore, so I
might move those to a different thread later. For now I've reordered
the patches: 0001 and 0002 are the bulk write and benchmark patches,
and the original four replication patches are now 0003-0006. Let me
know if I should change this ordering.
Here are the major changes for the bulk write:
1. I introduced two new functions in be-fsstubs.c:
* lo_bulk_write(fds int4[], data bytea[])
* lo_bulk_put(oids oid[], data bytea[])
2. I created inv_bulk_write, inv_bulk_create, and inv_bulk_put
functions to do the actual work.
3. I use table_multi_insert to perform the actual bulk writes. The
index updates are still done one by one.
4. This optimization is limited to creating new objects (using
lo_bulk_put) or appending to existing objects (lo_bulk_write). If
lo_bulk_write tries to insert into the middle of an existing object,
we fall back to the standard inv_write instead of using
inv_bulk_write. We check this using the large object's current size
and by ensuring that offset % LOBLKSIZE is zero, which prevents a bulk
write from starting in the middle of a page.
5. I've added a few regression tests for these cases, plus benchmark
tests that run multiple transactions of 1000 inserts each (bulk vs.
one-by-one insert). The bulk insert shows a 3x improvement. The gain
might be larger when a remote client is involved and network overhead
comes into play, but I have not measured that yet.
Let me know what you think.
Thanks & Regards
Nitin Motiani,
Google
| Attachment | Content-Type | Size |
|---|---|---|
| POC-v4-0005-Tablesync-for-large-objects-for-superuser.patch | application/x-patch | 2.4 KB |
| POC-v4-0006-Enable-tablesync-for-large-objects-for-all-users.patch | application/x-patch | 5.6 KB |
| POC-v4-0003-Support-large-object-decoding.patch | application/x-patch | 37.9 KB |
| v4-0001-Implement-bulk-API-for-Large-Objects.patch | application/x-patch | 32.2 KB |
| POC-v4-0004-Add-support-to-apply-lo_write-operations-in-apply.patch | application/x-patch | 3.8 KB |
| v4-0002-Add-pgbench-scripts-for-Bulk-API-performance-test.patch | application/x-patch | 69.6 KB |