Re: Patch: dumping tables data in multiple chunks in pg_dump

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Hannu Krosing <hannuk(at)google(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: Patch: dumping tables data in multiple chunks in pg_dump
Date: 2026-02-12 06:13:22
Message-ID: CAFiTN-s50GtFf650TT9Fko+q5rc+xwm+x106ugB3BE7_xgGjPQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 28, 2026 at 11:00 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
>
> v13 has added a proper test comparing original and restored table data
>
I was reviewing v13; here are some initial comments:

1. IMHO the commit message describes the progress of the work rather than
giving a high-level idea of what the patch actually does and how.
Suggestion:

SUBJECT: Add --max-table-segment-pages option to pg_dump for parallel
table dumping.

This patch introduces the ability to split large heap tables into segments
based on a specified number of pages. These segments can then be dumped in
parallel using the existing jobs infrastructure, significantly reducing
the time required to dump very large tables.

The implementation uses ctid-based range queries (e.g., WHERE ctid >= '(start,1)'
AND ctid <= '(end,32000)') to extract specific chunks of the relation.

<more architecture details and limitations, if any>
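To make the segmenting idea concrete, here is an illustrative sketch (not the
patch's actual code; function name and signature are made up for this email) of
how a table of `relpages` pages could be split into ranges of at most
`max_table_segment_pages` pages each:

```python
# Hypothetical sketch, not the patch's implementation: split a table's
# pages [0, relpages) into segments of at most max_table_segment_pages
# pages, each segment later dumped by one parallel worker.
def segment_pages(relpages, max_table_segment_pages):
    """Yield inclusive (start_page, end_page) ranges covering all pages."""
    start = 0
    while start < relpages:
        end = min(start + max_table_segment_pages, relpages) - 1
        yield (start, end)
        start = end + 1

# A 2500-page table with 1000-page segments splits into three ranges:
print(list(segment_pages(2500, 1000)))
# [(0, 999), (1000, 1999), (2000, 2499)]
```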

2.
+ pg_log_warning("CHUNKING: set dopt.max_table_segment_pages to [%u]",
+                dopt.max_table_segment_pages);
+ break;

IMHO we don't need to emit a warning here while processing the input parameters.

3.
+ printf(_("  --max-table-segment-pages=NUMPAGES\n"
+          "                           Number of main table pages above which data is\n"
+          "                           copied out in chunks, also determines the chunk size\n"));

Check the formatting of this description: all the other parameter
descriptions start with lower case, so better to start with "number"
rather than "Number".

4.
+ if (is_segment(tdinfo))
+ {
+     appendPQExpBufferStr(q, tdinfo->filtercond ? " AND " : " WHERE ");
+     if (tdinfo->startPage == 0)
+         appendPQExpBuffer(q, "ctid <= '(%u,32000)'", tdinfo->endPage);
+     else if (tdinfo->endPage != InvalidBlockNumber)
+         appendPQExpBuffer(q, "ctid BETWEEN '(%u,1)' AND '(%u,32000)'",
+                           tdinfo->startPage, tdinfo->endPage);
+     else
+         appendPQExpBuffer(q, "ctid >= '(%u,1)'", tdinfo->startPage);
+     pg_log_warning("CHUNKING: pages [%u:%u]", tdinfo->startPage, tdinfo->endPage);
+ }

IMHO we should explain this chunking logic in a comment above this code block.
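For reference, the three branches quoted above can be paraphrased as follows
(an illustrative Python sketch, not the patch's code; INVALID_BLOCK mimics
InvalidBlockNumber, and 32000 is assumed to exceed any possible line pointer
offset on a page, making '(p,32000)' an inclusive upper bound for page p):

```python
# Hypothetical paraphrase of the patch's WHERE-clause construction for a
# table segment.  Mirrors the three cases: first segment, middle segment,
# and open-ended last segment.
INVALID_BLOCK = 0xFFFFFFFF  # stand-in for PostgreSQL's InvalidBlockNumber

def ctid_filter(start_page, end_page):
    if start_page == 0:
        # First segment: everything up to and including end_page.
        return f"ctid <= '({end_page},32000)'"
    elif end_page != INVALID_BLOCK:
        # Middle segment: a bounded page range.
        return f"ctid BETWEEN '({start_page},1)' AND '({end_page},32000)'"
    else:
        # Last segment: open-ended, catches pages added after size check.
        return f"ctid >= '({start_page},1)'"

print(ctid_filter(0, 999))               # ctid <= '(999,32000)'
print(ctid_filter(1000, 1999))           # ctid BETWEEN '(1000,1)' AND '(1999,32000)'
print(ctid_filter(2000, INVALID_BLOCK))  # ctid >= '(2000,1)'
```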

--
Regards,
Dilip Kumar
Google
