Re: [PATCH v1] Allow excluded tables to remain in the pg_restore data-loading path for directory-format dumps

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Kirk Wolak <wolakk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nikolay Samokhvalov <nik(at)postgres(dot)ai>, "lepihov(at)gmail(dot)com" <lepihov(at)gmail(dot)com>
Subject: Re: [PATCH v1] Allow excluded tables to remain in the pg_restore data-loading path for directory-format dumps
Date: 2026-07-01 17:55:01
Message-ID: 502B3804-007B-4B3D-B6F2-F9632EBECF7C@yandex-team.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 23 Jun 2026, at 03:17, Kirk Wolak <wolakk(at)gmail(dot)com> wrote:
>
> Comments welcome. I am happy to adjust naming, documentation, or scope based on feedback.

Hi Kirk,

+1 on the problem. Excluding a few huge tables to keep the dump fast,
but still wanting a loadable slot so a recent slice can be dropped in
and restored in the normal order (data before indexes/FKs, no manual
reindex) is a real, recurring need. I'd be glad to see it solved.

For the archive: the broader "filter table data on dump" idea has come
up many times, and it may help to frame this patch against that history:

- 2008, Simon Riggs: pg_dump -w for sampling [0]. Redirected -
"why not COPY (SELECT ...) TO STDOUT".
- 2018, Carter Thaxton: --include-table-data-where, then --where [1].
Euler Taveira's concerns: a filtered subset "won't restore" (FKs),
and whether pg_dump is the right place for an ETL-ish parameter;
plus table:clause quoting/colon parsing. Returned with feedback.
- 2020, Surafel Temesgen: --where revival [2]. Daniel Gustafsson
flagged arbitrary user SQL as a search_path / CVE-2018-1058-class
hazard; no tests. Returned with feedback.
- 2022, Nikita Starovoitov: partial data dumps [3]. Stalled on
referential integrity - by the author's own account he could not
solve it within the dump.
- 2025, your own --filter-data PoC [4]. Still open; Tom suggested
partitioning, Greg suggested dumping a view (-t foo=view:fooslice).

Two objections recur across those: "just use COPY (SELECT ...) TO",
and "a raw row filter produces a dump that won't restore (FKs)". The
placeholder approach sits entirely clear of both. There is no user
WHERE clause, so no ETL surface, no search_path hazard, and no
consistency promise: as Carter noted in 2018, --where=table:false
already behaves exactly like --exclude-table-data, and the placeholder
is precisely that corner - identical restore-risk, it just keeps a
loadable empty TABLE DATA entry instead of dropping it. And the reason
this belongs in pg_dump rather than a hand-rolled \copy is the point
David G. Johnston made in 2018: it lets you lean on pg_dump's
dependency resolution - which is exactly what the placeholder buys, the
(externally produced) data lands on the normal restore path, ahead of
indexes and FKs, no post-restore reindex. So I'd pitch it as "reserve a
loadable slot", not "lightweight --where".

On the interface, a few options to discuss:

1. As posted: --create-table-data-placeholders as a global modifier
to --exclude-table-data. Works, but "create" already means CREATE
DATABASE in pg_dump, it's all-or-nothing across excluded tables,
and it couples two options.
2. A positive pattern option, e.g. --empty-table-data=PATTERN (plus
--empty-table-data-and-children), mirroring the --exclude-table-data
family. It folds the two current flags into one, reads as "include
the table, empty", and gives per-table control. Slight preference
here.
3. A matching action for the table_data object type in the --filter
file, for people (like you) who keep the big-table list in a file.
Heavier - it's a grammar change to a shared file format - so I'd
see it as an addition to 2, not a replacement.

WDYT?

Thank you!

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/1216999182(dot)3894(dot)930(dot)camel(at)ebony(dot)2ndQuadrant
[1] https://www.postgresql.org/message-id/flat/CAGiT_HNav5B=OfCdfyFoqTa+oe5W1vG=PXkTETCxXg4kcUTktA(at)mail(dot)gmail(dot)com
[2] https://www.postgresql.org/message-id/flat/CALAY4q8o00Sg5nemi2AuqNuLvmGLaR6br+YrvM41ZuSPjM2Qvg(at)mail(dot)gmail(dot)com
[3] https://www.postgresql.org/message-id/flat/CAC5EiNMthnPAKugDj0D8-kfn21hPQnOuvDAMOHa4TDmzGYon4w(at)mail(dot)gmail(dot)com
[4] https://www.postgresql.org/message-id/flat/CACLU5mS07WGPpq6=m8aC5tUMusNpL8FhBPnYem86iSawEgcavw(at)mail(dot)gmail(dot)com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ben Mejia 2026-07-01 18:01:33 Re: hashjoins vs. bitmap filters
Previous Message Tristan Partin 2026-07-01 17:51:25 Add malloc attribute to memory allocation functions