Patch: dumping tables data in multiple chunks in pg_dump

From: Hannu Krosing <hannuk(at)google(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Patch: dumping tables data in multiple chunks in pg_dump
Date: 2025-11-11 15:29:56
Message-ID: CAMT0RQT_0qVxcTT6ycM20QUN-pEQ6iMLbz6gLWgLpeF0NmNOUA@mail.gmail.com
Lists: pgsql-hackers

Attached is a patch that adds the ability to dump table data in multiple chunks.

I am looking for feedback at this point:
1) What have I missed?
2) Should I implement something to avoid single-page chunks?

The patch adds the flag --huge-table-chunk-pages, which tells a
directory-format dump to split any table whose main fork has more
pages than the given value into multiple chunks of that many pages each.
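
To make the chunking rule concrete, here is a minimal sketch of how page ranges could be derived from a table's page count. It is not taken from the patch: the function names are hypothetical, and expressing a chunk as a ctid range predicate is only an assumption about one way to select rows by page.

```python
def chunk_page_ranges(total_pages, chunk_pages):
    """Split pages [0, total_pages) into half-open (start, end) ranges."""
    if chunk_pages <= 0:
        raise ValueError("chunk_pages must be positive")
    # Tables at or below the threshold are dumped as a single piece.
    if total_pages <= chunk_pages:
        return [(0, total_pages)]
    return [
        (start, min(start + chunk_pages, total_pages))
        for start in range(0, total_pages, chunk_pages)
    ]


def ctid_predicate(start_page, end_page):
    """Hypothetical WHERE clause for rows stored in pages [start_page, end_page)."""
    # Assumption: chunks are selected by ctid range; the patch may do this differently.
    return f"ctid >= '({start_page},0)'::tid AND ctid < '({end_page},0)'::tid"


if __name__ == "__main__":
    for start, end in chunk_page_ranges(1001, 500):
        print(start, end, ctid_predicate(start, end))
```

With 1001 pages and a chunk size of 500, the last range covers a single page, which is the situation question 2 above asks about.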

The main use case is speeding up parallel dumps when there are one or
a small number of HUGE tables, so that parts of these tables can be
dumped in parallel.

It will also help in case the target file system has some limitations
on file sizes (4GB for FAT, 5TB for GCS).

Currently the patch includes no tests and no documentation beyond
what is printed by pg_dump --help. Also, any pg_log_warning lines
containing "CHUNKING" are there for debugging and need to be removed
before committing.

As implemented, no changes are needed for pg_restore, as all chunks are
already associated with the table in .toc and thus are restored into
that table.

The attached README shows how I verified that it works; also attached
is the textual file created from the directory format dump in the last
step there.

--
Hannu

Attachment                                                       Content-Type         Size
0001-adds-ability-to-dump-data-for-tables-in-multiple-chu.patch  application/x-patch  11.5 KB
README.pg_dump.md                                                text/markdown        3.7 KB
dump.sql                                                         application/sql      56.2 KB
