| From: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
|---|---|
| To: | Kirk Wolak <wolakk(at)gmail(dot)com> |
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nikolay Samokhvalov <nik(at)postgres(dot)ai>, "lepihov(at)gmail(dot)com" <lepihov(at)gmail(dot)com> |
| Subject: | Re: [PATCH v1] Allow excluded tables to remain in the pg_restore data-loading path for directory-format dumps |
| Date: | 2026-07-01 17:55:01 |
| Message-ID: | 502B3804-007B-4B3D-B6F2-F9632EBECF7C@yandex-team.ru |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On 23 Jun 2026, at 03:17, Kirk Wolak <wolakk(at)gmail(dot)com> wrote:
>
> Comments welcome. I am happy to adjust naming, documentation, or scope based on feedback.
Hi Kirk,
+1 on the problem. Excluding a few huge tables to keep the dump fast,
but still wanting a loadable slot so a recent slice can be dropped in
and restored in the normal order (data before indexes/FKs, no manual
reindex) is a real, recurring need. I'd be glad to see it solved.
For the archive: the broader "filter table data on dump" idea has come
up many times, and it may help to frame this patch against that history:
- 2008, Simon Riggs: pg_dump -w for sampling [0]. Redirected -
"why not COPY (SELECT ...) TO STDOUT".
- 2018, Carter Thaxton: --include-table-data-where, then --where [1].
Euler Taveira's concerns: a filtered subset "won't restore" (FKs),
and whether pg_dump is the right place for an ETL-ish parameter;
plus table:clause quoting/colon parsing. Returned with feedback.
- 2020, Surafel Temesgen: --where revival [2]. Daniel Gustafsson
flagged arbitrary user SQL as a search_path / CVE-2018-1058-class
hazard; no tests. Returned with feedback.
- 2022, Nikita Starovoitov: partial data dumps [3]. Stalled on
referential integrity - by the author's own account he could not
solve it within the dump.
- 2025, your own --filter-data PoC [4]. Still open; Tom suggested
partitioning, Greg suggested dumping a view (-t foo=view:fooslice).
Two objections recur across those: "just use COPY (SELECT ...) TO",
and "a raw row filter produces a dump that won't restore (FKs)". The
placeholder approach sits entirely clear of both. There is no user
WHERE clause, so no ETL surface, no search_path hazard, and no
consistency promise: as Carter noted in 2018, --where=table:false
already behaves exactly like --exclude-table-data, and the placeholder
is precisely that corner - identical restore-risk, it just keeps a
loadable empty TABLE DATA entry instead of dropping it. And the reason
this belongs in pg_dump rather than a hand-rolled \copy is the point
David G. Johnston made in 2018: it lets you lean on pg_dump's
dependency resolution - which is exactly what the placeholder buys, the
(externally produced) data lands on the normal restore path, ahead of
indexes and FKs, no post-restore reindex. So I'd pitch it as "reserve a
loadable slot", not "lightweight --where".
On the interface, a few options to discuss:
1. As posted: --create-table-data-placeholders as a global modifier
to --exclude-table-data. Works, but "create" already means CREATE
DATABASE in pg_dump, it's all-or-nothing across excluded tables,
and it couples two options.
2. A positive pattern option, e.g. --empty-table-data=PATTERN (plus
--empty-table-data-and-children), mirroring the --exclude-table-data
family. It folds the two current flags into one, reads as "include
the table, empty", and gives per-table control. Slight preference
here.
3. A matching action for the table_data object type in the --filter
file, for people (like you) who keep the big-table list in a file.
Heavier - it's a grammar change to a shared file format - so I'd
see it as an addition to 2, not a replacement.
WDYT?
Thank you!
Best regards, Andrey Borodin.
[0] https://www.postgresql.org/message-id/flat/1216999182(dot)3894(dot)930(dot)camel(at)ebony(dot)2ndQuadrant
[1] https://www.postgresql.org/message-id/flat/CAGiT_HNav5B=OfCdfyFoqTa+oe5W1vG=PXkTETCxXg4kcUTktA(at)mail(dot)gmail(dot)com
[2] https://www.postgresql.org/message-id/flat/CALAY4q8o00Sg5nemi2AuqNuLvmGLaR6br+YrvM41ZuSPjM2Qvg(at)mail(dot)gmail(dot)com
[3] https://www.postgresql.org/message-id/flat/CAC5EiNMthnPAKugDj0D8-kfn21hPQnOuvDAMOHa4TDmzGYon4w(at)mail(dot)gmail(dot)com
[4] https://www.postgresql.org/message-id/flat/CACLU5mS07WGPpq6=m8aC5tUMusNpL8FhBPnYem86iSawEgcavw(at)mail(dot)gmail(dot)com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Ben Mejia | 2026-07-01 18:01:33 | Re: hashjoins vs. bitmap filters |
| Previous Message | Tristan Partin | 2026-07-01 17:51:25 | Add malloc attribute to memory allocation functions |