| From: | Shawn McCoy <shawn(dot)the(dot)mccoy(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Vacuumlo improvements |
| Date: | 2026-05-12 17:34:10 |
| Message-ID: | CALsgZNAM=AYK-9ZLR7Z0YLx6Lyx5aSrjjse3T+FmsJ=jTvfhDQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
I'd like to discuss a behavior in the vacuumlo utility that can lead
to silent data loss when large object references are stored in columns
whose type is a domain over `oid` or `lo`. While fully stated in the
docs, we have observed users getting some surprises when they are
trying to do routine maintenance. I'll attach a very simple repro
that displays the behavior using
a fairly routine use of a domain. [1]
DESCRIPTION
---------------------
The vacuumlo documentation [2] states:
"Only types with these names are considered; in particular, domains
over them are not considered."
While the behavior is documented, in the field, the consequence is
severe: if a user creates a domain over `oid` (e.g., for semantic
clarity or to add constraints) and uses that domain-typed column to
store large object OIDs, vacuumlo will treat those LOs as orphaned and
delete them. The referenced data is silently destroyed.
This is particularly dangerous because:
- Domains over base types are a popular PostgreSQL practice
- There is no warning or diagnostic output from vacuumlo when it skips
domain-typed columns
- The --dry-run / -n flag will show these LOs as "would be removed"
but nothing indicates *why* they appear orphaned
- The data loss is irreversible
SUGGESTION
---------------------
Ideally, vacuumlo could be improved to:
- Resolve domain types back to their base types when scanning columns
(using pg_type.typbasetype), or
- At least emit a WARNING when it encounters columns with domains over
oid/lo that it is skipping, so the user is aware.
I don't currently have a patch attached, but wanted to shine light on
the issue given the silent data-loss risk. I know LO's are a sensitive
topic with discussions that wander towards deprecation, but they seem
to be here to stay and are very commonly used in the field.
At minimum, I can submit a documentation improvement to make the
data-loss risk more prominent. The current parenthetical note is easy
to miss.
[1] attached vacuumlo_repro.txt
[2] https://www.postgresql.org/docs/current/vacuumlo.html
Thanks,
Shawn
| Attachment | Content-Type | Size |
|---|---|---|
| vacuumlo_repro.txt | text/plain | 1.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | SATYANARAYANA NARLAPURAM | 2026-05-12 17:41:53 | Re: [Bug]Assertion failure in LATERAL GRAPH_TABLE with multi-label pattern |
| Previous Message | Alena Rybakina | 2026-05-12 17:19:27 | Re: pull-up subquery if JOIN-ON contains refs to upper-query |