| From: | Hannu Krosing <hannuk(at)google(dot)com> |
|---|---|
| To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Subject: | Re: Patch: dumping tables data in multiple chunks in pg_dump |
| Date: | 2025-11-13 18:39:26 |
| Message-ID: | CAMT0RQQAH1a8kY-mx7B07Uzn3T_zeaU9detqFFtW36_k67Su+A@mail.gmail.com |
| Lists: | pgsql-hackers |
Going up to 16 workers did not improve performance, but this is
expected, as the disk behind the database can only do about 4TB/hour
of reads (408/352*3600 = 4172 GB/h), which is now the bottleneck.
$ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=131072 -j 16 -f /tmp/parallel16.dump largedb
real 5m44.900s
user 53m50.491s
sys 5m47.602s
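(For comparison, at 16 workers that is roughly 408 GB / 345 s * 3600
≈ 4260 GB/h, essentially the same ~4 TB/h as the 8-worker run quoted
below, so the read throughput is already sitting at the disk ceiling.)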
And 4 workers showed a near-linear speedup over the single-worker run:
hannuk(at)pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
--format=directory -h 10.58.80.2 -U postgres
--huge-table-chunk-pages=131072 -j 4 -f /tmp/parallel4.dump largedb
real 10m32.074s
user 38m54.436s
sys 2m58.216s
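(That is roughly 2395 s / 632 s ≈ 3.8x faster than the 39m55s
sequential dump quoted below, so close to linear scaling for 4 workers.)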
The database runs on a 64 vCPU VM with 128GB RAM, so most of the 408GB
table has to be read from disk rather than served from cache.
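For reference, --huge-table-chunk-pages=131072 means 1GB chunks
(131072 x 8kB pages), so the 408GB table should end up split into
roughly 400 chunks for the workers to pick up. A quick way to estimate
the chunk count per table from the planner's relpages estimate (the
table name below is just a placeholder) is:
$ psql -h 10.58.80.2 -U postgres -Atc "SELECT relname, relpages,
    ceil(relpages / 131072.0) AS chunks
    FROM pg_class WHERE relname = 'my_huge_table'" largedb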
On Thu, Nov 13, 2025 at 7:02 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
>
> I just ran a test by generating a 408GB table and then dumping it both ways
>
> $ time pg_dump --format=directory -h 10.58.80.2 -U postgres -f
> /tmp/plain.dump largedb
>
> real 39m54.968s
> user 37m21.557s
> sys 2m32.422s
>
> $ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=131072 -j 8 -f /tmp/parallel8.dump largedb
>
> real 5m52.965s
> user 40m27.284s
> sys 3m53.339s
>
> So parallel dump with 8 workers using 1GB (128k pages) chunks runs
> almost 7 times faster than the sequential dump.
>
> This was a table that had no TOAST part. I will run some more tests
> with TOASTed tables next and expect similar or better improvements.
>
>
>
> On Wed, Nov 12, 2025 at 1:59 PM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> >
> > Hi Hannu,
> >
> > On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> > >
> > > Attached is a patch that adds the ability to dump table data in multiple chunks.
> > >
> > > Looking for feedback at this point:
> > > 1) what have I missed
> > > 2) should I implement something to avoid single-page chunks
> > >
> > > The flag --huge-table-chunk-pages tells the directory-format dump to
> > > dump tables whose main fork has more pages than this in multiple
> > > chunks of the given number of pages.
> > >
> > > The main use case is speeding up parallel dumps in case of one or a
> > > small number of HUGE tables so parts of these can be dumped in
> > > parallel.
> >
> > Have you measured the speedup? Can you please share the numbers?
> >
> > --
> > Best Wishes,
> > Ashutosh Bapat