Re: Patch: dumping tables data in multiple chunks in pg_dump

From: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
To: Hannu Krosing <hannuk(at)google(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Subject: Re: Patch: dumping tables data in multiple chunks in pg_dump
Date: 2026-02-03 21:10:12
Message-ID: CAN4CZFMHr0gFx-DpKBy4SkVzuJC5f3S=pnkrMDC54OswBSuJNQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello!

I did some testing with this patch, and I think there are some issues
during restoration:

1. Isn't there a possible race / scheduling mistake during restore
because of missing dependencies? The code now prints out "TABLE DATA
(pages %u:%u)", while the restore code checks for the explicit "TABLE
DATA" string for dependency tracking (pg_backup_archiver.c:2013 and a
few other places). This causes POST DATA to have no dependency on the
table data, and can be scheduled before we load all table data.

I was able to verify the scheduling issue with an index: the INDEX
part is scheduled too early, before all TABLE DATA completes, but then
locking prevents it from progressing, so everything completed fine in
the end. Even if that's guaranteed, which I'm not 100% sure of, it's
still based on luck and not proper logic, and takes up a slot (or
multiple), reducing parallelism.

2. Fixing the TABLE DATA strcmp checks solves the scheduling issue,
but it's not that simple, because then it causes truncation issues
during restore, which needs additional changes in the restore code. I
did a quick fix for that by adding an additional condition to the
created flag, and with that it seems to restore everything properly,
and with proper ordering, only starting index/constraint/etc after all
table data is completed. However this was definitely just a quick test
fix, this needs a proper better solution.

Other issues I see are more minor, but numerous:

3. The patch still has lots of debug output (pg_log_WARNING("CHUNKING
...")); Is this intended? Shouldn't these be behind some verbose
check, and maybe use info instead of warning?

4. The is_segment macro should have () around the use of tdiptr

5. There's still a 32000 magic constant, shouldn't that have some
descriptive name / explanatory comment?

6. formatting issues at multiple places, mostly missing spaces after
if/while/for statements

7. inconsistent error messages (not in range vs must be in range)

8. There's a remaining TODO that seems stale, current_chunk_start is
already uint64

9. typo: "the computed from pog_relation_size" -> "then computed from
pg_relation_size"

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2026-02-03 21:25:54 Re: pg_dumpall --roles-only interact with other options
Previous Message Nathan Bossart 2026-02-03 21:04:41 Re: Add backendType to PGPROC, replacing isRegularBackend