Re: Patch: dumping tables data in multiple chunks in pg_dump

From: Hannu Krosing <hannuk(at)google(dot)com>
To: Michael Banck <mbanck(at)gmx(dot)net>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: Patch: dumping tables data in multiple chunks in pg_dump
Date: 2026-03-28 15:33:59
Message-ID: CAMT0RQTe4Zr=rdcKMJj-=c7CH0PJh=ZPk=xOU98+M7p9-D+Yew@mail.gmail.com
Thread:
Lists: pgsql-hackers

The above

"Or it can be almost 200 GB if the page has just pointers to 1GB TOAST items."

should read

"Or it can be almost 200 GB *for a single page* if the page has just
pointers to 1GB TOAST items."
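
To make the arithmetic behind that "almost 200 GB for a single page" figure concrete, here is a rough back-of-the-envelope sketch (the byte counts are assumptions based on typical PostgreSQL defaults: 8 kB pages and roughly 40 bytes per heap tuple that holds only a TOAST pointer datum):

```python
# Worst-case dump size for a single 8 kB heap page whose tuples are all
# pointers to out-of-line TOAST values at the 1 GB varlena limit.
PAGE_SIZE = 8192               # default BLCKSZ
BYTES_PER_POINTER_TUPLE = 40   # assumed: tuple header + TOAST pointer datum
MAX_TOAST_VALUE = 1 << 30      # 1 GB varlena size limit

tuples_per_page = PAGE_SIZE // BYTES_PER_POINTER_TUPLE
worst_case_bytes = tuples_per_page * MAX_TOAST_VALUE
print(tuples_per_page, worst_case_bytes // 2**30)  # ~204 tuples, ~204 GB
```

So one main-table page can fan out to a couple of hundred gigabytes of dump output, which is why the page count says nothing about output size.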

On Sat, Mar 28, 2026 at 4:32 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
>
> The issue is that currently the value is given in "main table pages"
> and it would be somewhat deceptive, or at least confusing, to try to
> express this in any other unit.
>
> As I explained in the commit message:
>
> ---------8<-------------------8<-------------------8<----------------
> This --max-table-segment-pages number specifically applies to main table
> pages which does not guarantee anything about output size.
> The output could be empty if there are no live tuples in the page range.
> Or it can be almost 200 GB if the page has just pointers to 1GB TOAST items.
> ---------8<-------------------8<-------------------8<----------------
>
> And I can think of no cheap and reliable way to change that equation.
>
> I'll be very happy if you have any good ideas, either for improving the
> flag name or for a way to better estimate the resulting dump file size,
> so that we could give the chunk size in better units.
>
> ---
> Hannu
>
> On Sat, Mar 28, 2026 at 12:26 PM Michael Banck <mbanck(at)gmx(dot)net> wrote:
> >
> > Hi,
> >
> > On Tue, Jan 13, 2026 at 03:27:25PM +1300, David Rowley wrote:
> > > Perhaps --max-table-segment-pages is a better name than
> > > --huge-table-chunk-pages, as it's quite subjective what minimum
> > > number of pages is required to make a table "huge".
> >
> > I'm not sure that's better - without looking at the documentation,
> > people might confuse segment here with the 1GB split of tables into
> > segments. As pg_dump is a very common and basic user tool, I don't think
> > implementation details like pages/page sizes and blocks should be part
> > of its UX.
> >
> > Can't we just make it a storage size, like '10GB' and then rename it to
> > --table-parallel-threshold or something? I agree it's bikeshedding, but
> > I personally don't like either --max-table-segment-pages or
> > --huge-table-chunk-pages.
> >
> >
> > Michael
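
For what Michael's suggestion would amount to internally, a minimal sketch of converting a human-readable threshold like '10GB' into a page count (the helper name is hypothetical; it assumes the default 8 kB block size and the usual kB/MB/GB/TB suffixes):

```python
# Hypothetical sketch: turn a '10GB'-style option value into a number of
# BLCKSZ-sized pages, assuming the default 8 kB PostgreSQL block size.
BLCKSZ = 8192
UNITS = {"kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def size_to_pages(text: str) -> int:
    """Parse e.g. '10GB' into a page count; a bare number is taken as pages."""
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor // BLCKSZ
    return int(text)

print(size_to_pages("10GB"))  # 1310720 pages
```

The user-facing unit would then be a size, while the chunking logic could keep working in pages.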
