Re: Patch: dumping tables data in multiple chunks in pg_dump

From:	Hannu Krosing <hannuk(at)google(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject:	Re: Patch: dumping tables data in multiple chunks in pg_dump
Date:	2025-11-24 21:02:15
Message-ID:	CAMT0RQQPMj3=EZ-4z6qRs_TmBHoyv2VHAdMrfDuwa5ZUY6XtHQ@mail.gmail.com
Lists:	pgsql-hackers

The expectation was that, since chunking is useful mainly for really
huge tables, ANALYZE would have been run "recently enough".

Maybe we should call pg_relation_size() only when we have already
determined that the table is large enough to warrant chunking? Say,
when it is at least 1/2 of the requested chunk size?

My reasoning was to avoid putting extra load on pg_dump when chunking
is not required. But of course we can use the presence of a chunking
request to decide whether to run pg_relation_size(), assuming the
overhead won't be too large in this case.
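
To make the two-step idea concrete, here is a minimal sketch of the decision logic, not code from the patch. It assumes the default 8192-byte block size, and the names pages_for_chunking, should_chunk and exact_size_bytes are hypothetical; exact_size_bytes stands in for a call to pg_relation_size() on the table's main fork.

```python
from typing import Callable

# Assumption for this sketch: default PostgreSQL block size of 8 kB.
BLOCK_SIZE = 8192


def pages_for_chunking(relpages: int, chunk_pages: int,
                       exact_size_bytes: Callable[[], int]) -> int:
    """Return the page count to base the chunking decision on.

    relpages         -- possibly stale estimate from pg_class.relpages
    chunk_pages      -- requested chunk size in pages (0 = no chunking)
    exact_size_bytes -- hypothetical stand-in for pg_relation_size()
    """
    if chunk_pages <= 0:
        # No chunking requested: never pay for the exact size lookup.
        return relpages
    if relpages * 2 < chunk_pages:
        # Estimate is below half the chunk size: treat the table as small.
        return relpages
    # Candidate table: replace the estimate with the exact size, rounded up.
    size = exact_size_bytes()
    return (size + BLOCK_SIZE - 1) // BLOCK_SIZE


def should_chunk(pages: int, chunk_pages: int) -> bool:
    """Chunk only tables with more pages than the requested chunk size."""
    return chunk_pages > 0 and pages > chunk_pages
```

With this shape the exact size is looked up only for tables whose estimate is already within a factor of two of the threshold, so a dump without a chunking request does no extra work.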

On Mon, Nov 17, 2025 at 5:15 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> >
> > Attached is a patch that adds the ability to dump table data in multiple chunks.
> >
> > Looking for feedback at this point:
> > 1) what have I missed
> > 2) should I implement something to avoid single-page chunks
> >
> > The flag --huge-table-chunk-pages which tells the directory format
> > dump to dump tables where the main fork has more pages than this in
> > multiple chunks of given number of pages,
> >
> > The main use case is speeding up parallel dumps in case of one or a
> > small number of HUGE tables so parts of these can be dumped in
> > parallel.
> >
>
> +1 for the idea, I haven't done the detailed review but I was just
> going through the patch, I noticed that we use pg_class->relpages to
> identify whether to chunk the table or not, which should be fine but
> don't you think if we use direct size calculation function like
> pg_relation_size() we might get better idea and not dependent upon
> whether the stats are updated or not? This will make chunking
> behavior more deterministic.
>
> --
> Regards,
> Dilip Kumar
> Google
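
On the single-page-chunk question in the quoted proposal, one possible approach is to fold a very small trailing chunk into the chunk before it. The sketch below is only an illustration under that assumption; chunk_page_ranges and min_tail_pages are hypothetical names and do not come from the patch.

```python
from typing import List, Tuple


def chunk_page_ranges(total_pages: int, chunk_pages: int,
                      min_tail_pages: int = 1) -> List[Tuple[int, int]]:
    """Split pages [0, total_pages) into (start, end) ranges, end exclusive.

    A table is split only if it has more pages than chunk_pages.
    A trailing range of min_tail_pages pages or fewer is merged into
    the previous range so that no tiny chunk is dumped on its own.
    """
    if chunk_pages <= 0 or total_pages <= chunk_pages:
        # Not a chunking candidate: dump the table as one range.
        return [(0, total_pages)]

    ranges = [(start, min(start + chunk_pages, total_pages))
              for start in range(0, total_pages, chunk_pages)]

    tail_start, tail_end = ranges[-1]
    if len(ranges) > 1 and tail_end - tail_start <= min_tail_pages:
        # Merge the tiny tail into the preceding range.
        ranges.pop()
        prev_start, _ = ranges.pop()
        ranges.append((prev_start, tail_end))
    return ranges
```

For example, a 9-page table with 4-page chunks would be split into two ranges rather than three, the last one covering 5 pages.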

In response to

Re: Patch: dumping tables data in multiple chunks in pg_dump at 2025-11-17 04:15:17 from Dilip Kumar

Responses

Re: Patch: dumping tables data in multiple chunks in pg_dump at 2025-11-25 04:50:01 from Dilip Kumar
