Re: Patch: dumping tables data in multiple chunks in pg_dump

From: Hannu Krosing <hannuk(at)google(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: Patch: dumping tables data in multiple chunks in pg_dump
Date: 2026-01-14 10:52:04
Message-ID: CAMT0RQT_yk32+ZyA-cBS9uMtAUOM-b+hqj0zT0Ge-ikFMrmw+A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 13, 2026 at 3:27 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Fri, 14 Nov 2025 at 09:34, Hannu Krosing <hannuk(at)google(dot)com> wrote:
> > Added to https://commitfest.postgresql.org/patch/6219/
>
> I think this could be useful, but I think you'll need to find a way to
> not do this for non-heap tables. Per the comments in TableAmRoutine,
> both scan_set_tidrange and scan_getnextslot_tidrange are optional
> callback functions and the planner won't produce TIDRangePaths if
> either of those don't exist. Maybe that means you need to consult
> pg_class.relam to ensure the amname is 'heap' or at least the relam =
> 2.

Makes sense, will add.
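
Something along these lines, I guess (just a sketch of the catalog
check, with table name t made up for illustration; where exactly this
fits into pg_dump's existing queries is still to be worked out):

    -- only chunk tables whose access method is heap
    SELECT c.relname, c.relam, am.amname
    FROM pg_class c
         JOIN pg_am am ON am.oid = c.relam
    WHERE c.relname = 't'
      AND am.amname = 'heap';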

> On testing Citus's columnar AM, I get:
> postgres=# select * from t where ctid between '(0,1)' and '(10,0)';
> ERROR: UPDATE and CTID scans not supported for ColumnarScan

Should we just silently not chunk tables whose storage architecture
does not have TIDs, or should pg_dump error out in this case?

I imagine Citus's columnar AM is often used with huge tables, where
chunking would be most useful.

Later it likely makes sense to have another option for chunking other
types of tables, or maybe even add something to the TableAM interface
for chunking support.

> 1. For the patch, I think you should tighten the new option up to mean
> the maximum segment size that a table will be dumped in. I see you
> have comments like:
>
> /* TODO: add hysteresis here, maybe < 1.1 * huge_table_chunk_pages */
>
> You *have* to put the cutoff *somewhere*, so I think it very much
> should be exactly the specified threshold. If anyone is unhappy that
> some segments consist of a single page, then that's on them to adjust
> the parameter accordingly. Otherwise, someone complains that they got
> a 1-page segment when the table was 10.0001% bigger than the cutoff
> and then we're tempted to add a new setting to control the 1.1 factor,
> which is just silly. If there's a 1-page segment, so what? It's not a
> big deal.

Agreed, will drop the TODO.

> Perhaps --max-table-segment-pages is a better name than
> --huge-table-chunk-pages as it's quite subjective what minimum number
> of pages is required to make a table "huge".

I agree. My initial thinking was that it is mainly useful for huge
tables, but indeed that does not need to be reflected in the flag name.
> 2. I'm not sure if you're going to get away with using relpages for
> this. Is it really that bad to query pg_relation_size() when this
> option is set? If it really is a problem, then maybe let the user
> choose with another option. I understand we're using relpages for
> sorting table sizes so we prefer dumping larger tables first, but that
> just seems way less important if it's not perfectly accurate.

Yeah, I had thought of pg_relation_size() myself.

Another option would be something more complex which tries to estimate
the dump file sizes by figuring out the TOAST size for each chunk. The
thing that makes this really complex is the possible uneven distribution
of TOAST data and the need to take into account both the compression of
TOAST AND the compression of the resulting dump file.
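
For comparison, something like this shows how far off relpages can be
from the live size (a sketch only; table name t is made up, and
block_size is normally 8192):

    SELECT c.relname,
           c.relpages AS estimated_pages,
           pg_relation_size(c.oid) / current_setting('block_size')::int
               AS actual_pages
    FROM pg_class c
    WHERE c.relname = 't';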

> 3. You should be able to simplify the code in dumpTableData() so
> you're not adding any extra cases. You could use InvalidBlockNumber to
> indicate an unbounded ctid range and only add ctid qual to the WHERE
> clause when you have a bounded range (i.e not InvalidBlockNumber).
> That way the first segment will need WHERE ctid <= '...' and the final
> one will need WHERE ctid >= '...'. Everything in between will have an
> upper and lower bound. That results in no ctid quals being added when
> both ranges are set to InvalidBlockNumber, which you should use for
> all tables not large enough to be segmented, thus no special case.

Makes sense, will look into it.
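
That is, the emitted data queries would end up looking something like
this (a sketch with a made-up table t and a 100000-page chunk size;
the exact form of the bounds is still to be worked out):

    -- first segment: upper bound only
    COPY (SELECT * FROM t WHERE ctid < '(100000,0)') TO STDOUT;
    -- middle segment(s): bounded on both sides
    COPY (SELECT * FROM t
          WHERE ctid >= '(100000,0)' AND ctid < '(200000,0)') TO STDOUT;
    -- last segment: lower bound only
    COPY (SELECT * FROM t WHERE ctid >= '(200000,0)') TO STDOUT;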

> TID Range scans are perfectly capable of working when only bounded at one side.
>
> 4. I think using "int" here is a future complaint waiting to happen.
>
> + if (!option_parse_int(optarg, "--huge-table-chunk-pages", 1, INT32_MAX,
> + &dopt.huge_table_chunk_pages))
>
> I bet we'll eventually see a complaint that someone can't make the
> segment size larger than 16TB. I think option_parse_uint32() might be
> called for.

There can be no more than 2 * INT32_MAX pages anyway.
I thought half of the max possible size should be enough.
Do you really think that somebody would want segments larger than that?
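
For reference, the back-of-the-envelope numbers, assuming the default
8 kB block size (the SELECT below is just an illustration):

    -- ~16 TB at INT32_MAX pages vs. ~32 TB for the largest possible heap
    SELECT pg_size_pretty(2147483647::bigint * 8192) AS int32_max_pages,
           pg_size_pretty(2 * 2147483647::bigint * 8192) AS max_heap_size;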

> David
