| From: | Hannu Krosing <hannuk(at)google(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com> |
| Subject: | Re: Patch: dumping tables data in multiple chunks in pg_dump |
| Date: | 2025-11-24 21:02:15 |
| Message-ID: | CAMT0RQQPMj3=EZ-4z6qRs_TmBHoyv2VHAdMrfDuwa5ZUY6XtHQ@mail.gmail.com |
| Lists: | pgsql-hackers |
The expectation was that, since chunking is mainly useful for really
huge tables, ANALYZE would have been run "recently enough".
Maybe we should use pg_relation_size() once relpages already suggests
that the table may be large enough to warrant chunking, say when
relpages is at least 1/2 of the requested chunk size?
My reasoning was not to put too much extra load on pg_dump when
chunking is not required. But of course we can use the presence of a
chunking request to decide whether to run pg_relation_size(), assuming
the overhead won't be too large in that case.
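A minimal sketch of the two-stage check discussed above, written in Python for brevity rather than pg_dump's C. Assumptions, none of them taken from the patch: the hypothetical `should_chunk()` helper, the default 8192-byte block size, and a `get_relation_size_bytes` callback standing in for a `SELECT pg_relation_size(...)` round trip to the server. pg_relation_size() reports the main fork by default, which is the fork the flag is described as comparing against.

```python
# Hypothetical sketch: cheap relpages prefilter, exact size only if needed.
# Assumptions (not from the patch): BLCKSZ is the default 8192 bytes, and
# get_relation_size_bytes stands in for "SELECT pg_relation_size(oid)".
from typing import Callable

BLCKSZ = 8192  # default PostgreSQL block size; configurable at build time


def should_chunk(relpages: int,
                 chunk_pages: int,
                 get_relation_size_bytes: Callable[[], int]) -> bool:
    """Decide whether a table should be dumped in chunks of chunk_pages."""
    # Stage 1: pg_class.relpages may be stale, but it is free to read.
    # Skip the extra query unless the estimate is at least half a chunk.
    if relpages < chunk_pages // 2:
        return False
    # Stage 2: ask for the actual main-fork size and convert it to pages.
    actual_pages = get_relation_size_bytes() // BLCKSZ
    # Chunk only tables whose main fork has more pages than one chunk.
    return actual_pages > chunk_pages
```

The prefilter keeps the extra query off the common path, but it still inherits part of the staleness problem: a table whose relpages is still 0 (for example, one that has not yet been vacuumed or analyzed) is rejected before pg_relation_size() is ever called.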
On Mon, Nov 17, 2025 at 5:15 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Nov 11, 2025 at 9:00 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> >
> > Attached is a patch that adds the ability to dump table data in multiple chunks.
> >
> > Looking for feedback at this point:
> > 1) what have I missed
> > 2) should I implement something to avoid single-page chunks
> >
> > The flag --huge-table-chunk-pages tells a directory-format dump to
> > dump tables whose main fork has more pages than this value in
> > multiple chunks of the given number of pages.
> >
> > The main use case is speeding up parallel dumps when a database has
> > one or a few HUGE tables, so that parts of them can be dumped in
> > parallel.
> >
>
> +1 for the idea. I haven't done a detailed review, but while going
> through the patch I noticed that we use pg_class->relpages to decide
> whether to chunk the table. That should be fine, but don't you think
> that using a direct size calculation function like pg_relation_size()
> would give us a better idea and not depend on whether the stats are
> up to date? This would make the chunking behavior more deterministic.
>
> --
> Regards,
> Dilip Kumar
> Google
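The page-range chunking described in the quoted announcement can be sketched as follows (Python; `chunk_ranges()` and `chunk_predicate()` are hypothetical names). It assumes, without confirmation from the patch, that each chunk is selected with a ctid range predicate, which PostgreSQL 14 and later can execute as a TID Range Scan. The optional `min_tail_pages` parameter illustrates one possible answer to question 2: folding a too-short final chunk into its predecessor.

```python
# Hypothetical sketch: split a table's main fork into page-range chunks.
# Assumption (not confirmed by the patch): each chunk is dumped with a
# ctid range predicate, executable as a TID Range Scan on PostgreSQL 14+.
from typing import List, Optional, Tuple


def chunk_ranges(total_pages: int, chunk_pages: int,
                 min_tail_pages: int = 1) -> List[Tuple[int, Optional[int]]]:
    """Return (start, end) page ranges; the last range has end=None."""
    if chunk_pages <= 0:
        raise ValueError("chunk_pages must be positive")
    # Start page of every chunk; always at least one chunk.
    starts = list(range(0, max(total_pages, 1), chunk_pages))
    # One way to avoid tiny chunks: drop a final start that would leave
    # fewer than min_tail_pages pages, merging them into the previous chunk.
    if len(starts) > 1 and total_pages - starts[-1] < min_tail_pages:
        starts.pop()
    # Each chunk ends where the next begins; the last one is open-ended.
    ends: List[Optional[int]] = list(starts[1:]) + [None]
    return list(zip(starts, ends))


def chunk_predicate(start: int, end: Optional[int]) -> str:
    """WHERE clause selecting tuples on pages start .. end-1 (or beyond)."""
    lower = f"ctid >= '({start},0)'"
    if end is None:
        return lower
    return f"{lower} AND ctid < '({end},0)'"
```

Leaving the last chunk open-ended is a conservative choice: if the page count came from a stale relpages value, a bounded final range could silently miss rows on later pages. Each predicate could then be used in something like `COPY (SELECT * FROM tab WHERE ...) TO STDOUT`, where `tab` is a placeholder.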