Re: Patch: dumping tables data in multiple chunks in pg_dump

From: Hannu Krosing <hannuk(at)google(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: Patch: dumping tables data in multiple chunks in pg_dump
Date: 2025-11-13 20:26:46
Message-ID: CAMT0RQSNHFffbCmDNxQogVBD8H5gTDJNwhUR2btCVE+Lq1sGGw@mail.gmail.com
Lists: pgsql-hackers

The reason for the small chunk sizes is that chunking is driven by the main
heap table alone, and here the main fork is just over 1 GB even though the
table as a whole is ~53 GB:

largetoastdb=> SELECT format('%I.%I', t.schemaname, t.relname) AS table_name,
                      pg_table_size(t.relid) AS table_size,
                      sum(pg_relation_size(i.indexrelid)) AS total_index_size,
                      pg_relation_size(t.relid) AS main_table_size,
                      pg_relation_size(c.reltoastrelid) AS toast_table_size,
                      pg_relation_size(oi.indexrelid) AS toast_index_size,
                      t.n_live_tup AS row_count,
                      count(*) AS index_count,
                      array_to_json(array_agg(json_build_object(i.indexrelid::regclass,
                          pg_relation_size(i.indexrelid))), true) AS index_info
               FROM pg_stat_user_tables t
               JOIN pg_stat_user_indexes i ON i.relid = t.relid
               JOIN pg_class c ON c.oid = t.relid
               LEFT JOIN pg_stat_sys_indexes AS oi ON oi.relid = c.reltoastrelid
               GROUP BY 1, 2, 4, 5, 6, 7
               ORDER BY 2 DESC, 7 DESC
               LIMIT 25;
┌─[ RECORD 1 ]─────┬─────────────────────────────────────┐
│ table_name │ public.just_toasted │
│ table_size │ 56718835712 │
│ total_index_size │ 230064128 │
│ main_table_size │ 1191559168 │
│ toast_table_size │ 54613336064 │
│ toast_index_size │ 898465792 │
│ row_count │ 5625234 │
│ index_count │ 1 │
│ index_info │ [{"just_toasted_pkey" : 230064128}] │
└──────────────────┴─────────────────────────────────────┘
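Since chunking counts only main-fork pages, the expected number of chunks
for the runs below can be estimated directly (a rough check, assuming the
default 8 kB block size):

largetoastdb=> SELECT pg_relation_size('just_toasted') / 8192 AS main_fork_pages,
                      ceil(pg_relation_size('just_toasted') / 8192.0 / 9095) AS chunks_at_9095,
                      ceil(pg_relation_size('just_toasted') / 8192.0 / 1000) AS chunks_at_1000;

That gives 145,454 main-fork pages, i.e. 16 chunks at 9095 pages per chunk
but 146 chunks at 1000, which matches the straggler behaviour discussed
below.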

On Thu, Nov 13, 2025 at 9:24 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
>
> Ran another test with a 53GB database where most of the data is in TOAST
>
> CREATE TABLE just_toasted(
> id serial primary key,
> toasted1 char(2200) STORAGE EXTERNAL,
> toasted2 char(2200) STORAGE EXTERNAL,
> toasted3 char(2200) STORAGE EXTERNAL,
> toasted4 char(2200) STORAGE EXTERNAL
> );
>
> and the toast fields were added in somewhat randomised order.
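>
> The test data was generated along these lines (a sketch, not the exact
> load script; the randomisation of the insert order is not shown):
>
> -- 68 repetitions of a 32-char md5 give 2176 chars, padded out by
> -- char(2200); STORAGE EXTERNAL keeps the values uncompressed in TOAST
> INSERT INTO just_toasted (toasted1, toasted2, toasted3, toasted4)
> SELECT repeat(md5(random()::text), 68),
>        repeat(md5(random()::text), 68),
>        repeat(md5(random()::text), 68),
>        repeat(md5(random()::text), 68)
> FROM generate_series(1, 5625234);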
>
> Here the results are as follows
>
> Parallelism | chunk size (pages) | time (sec)
>           1 |                  - |        240
>           2 |               1000 |        129
>           4 |               1000 |         64
>           8 |               1000 |         36
>          16 |               1000 |         30
>
>           4 |               9095 |         78
>           8 |               9095 |         42
>          16 |               9095 |         42
>
> The reason the larger chunk size performed worse is that it often left
> one or two straggler chunks still running at the end while the other
> workers sat idle: 9095 pages splits the ~145k-page main fork into just
> 16 chunks, so at -j 16 the run time is set by the slowest chunk,
> whereas 1000 pages yields 146 chunks and much better load balancing.
>
> Detailed run results below:
>
> hannuk(at)pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres -f
> /tmp/ltoastdb-1-plain.dump largetoastdb
> real 3m59.465s
> user 3m43.304s
> sys 0m15.844s
>
> hannuk(at)pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=9095 -j 4 -f /tmp/ltoastdb-4.dump
> largetoastdb
> real 1m18.320s
> user 3m49.236s
> sys 0m19.422s
>
> hannuk(at)pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=9095 -j 8 -f /tmp/ltoastdb-8.dump
> largetoastdb
> real 0m42.028s
> user 3m55.299s
> sys 0m24.657s
>
> hannuk(at)pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=9095 -j 16 -f /tmp/ltoastdb-16.dump
> largetoastdb
> real 0m42.575s
> user 4m11.011s
> sys 0m26.110s
>
> hannuk(at)pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=1000 -j 16 -f /tmp/ltoastdb-16-1kpages.dump
> largetoastdb
> real 0m29.641s
> user 6m16.321s
> sys 0m49.345s
>
> hannuk(at)pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=1000 -j 8 -f /tmp/ltoastdb-8-1kpages.dump
> largetoastdb
> real 0m35.685s
> user 3m58.528s
> sys 0m26.729s
>
> hannuk(at)pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=1000 -j 4 -f /tmp/ltoastdb-4-1kpages.dump
> largetoastdb
> real 1m3.737s
> user 3m50.251s
> sys 0m18.507s
>
> hannuk(at)pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> --format=directory -h 10.58.80.2 -U postgres
> --huge-table-chunk-pages=1000 -j 2 -f /tmp/ltoastdb-2-1kpages.dump
> largetoastdb
> real 2m8.708s
> user 3m57.018s
> sys 0m18.499s
>
> On Thu, Nov 13, 2025 at 7:39 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> >
> > Going up to 16 workers did not improve performance, but this is
> > expected, as the disk behind the database can only do about 4 TB/hour
> > of reads, which is now the bottleneck (408 GB / 352 s * 3600 ≈ 4172 GB/h).
> >
> > $ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=131072 -j 16 -f /tmp/parallel16.dump largedb
> > real 5m44.900s
> > user 53m50.491s
> > sys 5m47.602s
> >
> > And 4 workers showed near-linear speedup from single worker
> >
> > hannuk(at)pgn2:~/work/postgres/src/bin/pg_dump$ time ./pg_dump
> > --format=directory -h 10.58.80.2 -U postgres
> > --huge-table-chunk-pages=131072 -j 4 -f /tmp/parallel4.dump largedb
> > real 10m32.074s
> > user 38m54.436s
> > sys 2m58.216s
> >
> > The database runs on a 64-vCPU VM with 128 GB RAM, so most of the
> > table has to be read in from disk.
> >
> > On Thu, Nov 13, 2025 at 7:02 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> > >
> > > I just ran a test by generating a 408GB table and then dumping it both ways
> > >
> > > $ time pg_dump --format=directory -h 10.58.80.2 -U postgres -f
> > > /tmp/plain.dump largedb
> > >
> > > real 39m54.968s
> > > user 37m21.557s
> > > sys 2m32.422s
> > >
> > > $ time ./pg_dump --format=directory -h 10.58.80.2 -U postgres
> > > --huge-table-chunk-pages=131072 -j 8 -f /tmp/parallel8.dump largedb
> > >
> > > real 5m52.965s
> > > user 40m27.284s
> > > sys 3m53.339s
> > >
> > > So parallel dump with 8 workers using 1GB (128k pages) chunks runs
> > > almost 7 times faster than the sequential dump.
> > >
> > > This was a table that had no TOAST part. I will run some more tests
> > > with TOASTed tables next and expect similar or better improvements.
> > >
> > > On Wed, Nov 12, 2025 at 1:59 PM Ashutosh Bapat
> > > <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > > >
> > > > Hi Hannu,
> > > >
> > > > On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> > > > >
> > > > > Attached is a patch that adds the ability to dump table data in multiple chunks.
> > > > >
> > > > > Looking for feedback at this point:
> > > > > 1) what have I missed
> > > > > 2) should I implement something to avoid single-page chunks
> > > > >
> > > > > The new flag --huge-table-chunk-pages tells the directory-format
> > > > > dump to split any table whose main fork has more pages than this
> > > > > into multiple chunks of the given number of pages.
> > > > >
> > > > > The main use case is speeding up parallel dumps in case of one or a
> > > > > small number of HUGE tables so parts of these can be dumped in
> > > > > parallel.
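> > > > >
> > > > > For example (directory format is required here, since it is the
> > > > > only archive format that supports parallel dump; the output path
> > > > > and database name are placeholders):
> > > > >
> > > > > pg_dump --format=directory -j 8 --huge-table-chunk-pages=131072 \
> > > > >     -f /path/to/dumpdir mydb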
> > > >
> > > > Have you measured speed up? Can you please share the numbers?
> > > >
> > > > --
> > > > Best Wishes,
> > > > Ashutosh Bapat
