| From: | surya poondla <suryapoondla4(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | [DESIGN] Soft DROP TABLE, recoverable drops for PostgreSQL |
| Date: | 2026-06-03 16:52:57 |
| Message-ID: | CAOVWO5o3TQHdEb_+agn0UFHmZDN19q5cxExX3Dc_5W+b0Bx_mQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Hackers,
I would like to discuss adding a soft / recoverable 'DROP TABLE' to
PostgreSQL.
The DROP moves the table into a reserved schema, where it remains
restorable until explicitly purged.
Motivation:
'DROP TABLE' is one of the few SQL statements where the consequences are
wildly out of proportion to the time it takes to type the command.
A few real-world scenarios I have seen were:
- An automated partition-rotation job ran against the wrong cluster after a
config change and
dropped a table holding a lot of data (in order of 10s of GB) of two days
of telemetry before the site reliability engineer stopped it.
- A nightly time-to-live cleanup script applied a stale rule and removed a
reporting table with six months of aggregated data (hundreds of GB), not
reconstructable from upstream sources.
The current recovery path for an accidentally dropped table is
point-in-time recovery or restoring from a logical backup, both of which
are disruptive relative to the size of the data, when all the operator
wanted was the table back five minutes later.
Several other systems offer a lighter-weight first line of defense:
- Oracle Database has Recycle Bin and 'FLASHBACK TABLE... TO BEFORE DROP'
since 10g. The dropped tables are stored in a special schema and remain
queryable until purged.
- IBM Db2 supports recovery of dropped tables on tablespaces configured
with 'DROPPED TABLE RECOVERY', replayed via rollforward.
- Snowflake provides 'UNDROP TABLE' within its Time Travel retention window
to restore the table back.
The recurring design across all of these: an accidental DROP should be
recoverable for some short period without invoking heavy disaster recovery.
The question I would like to put here is whether a more complete
implementation along the lines below would be welcome as a contrib
extension or a core feature.
Proposal:
The fundamental mechanism is inspired from pgtrashcan [1] by Peter
Eisentraut where a 'ProcessUtility_hook' intercepts 'DROP TABLE' and
rewrites it as 'ALTER TABLE ... SET SCHEMA "Trash"', so the relation
survives in a reserved schema rather than being removed.
What I have been exploring on top of that, to make the feature usable for
real recovery scenarios:
- A set of SQL APIs to list, restore, rename, move, and purge tables (and
their dependent indexes and sequences) from the reserved schema.
- A catalog table inside the reserved schema to capture the original name
of the object, original schema, the OID, drop time, dropped_by user, object
type, and a JSONB metadata blob describing dependent objects like foreign
keys that could not safely be moved.
- CASCADE. 'DROP TABLE t CASCADE' captures dependent foreign keys, views,
materialized views, rules, and RLS policies as JSONB metadata (including
runnable DDL strings built around 'pg_get_constraintdef' and
'pg_get_viewdef'), then permanently drops them. Indexes and owned sequences
ride along with the table, since 'ALTER TABLE SET SCHEMA' carries them.
- Unique names inside the reserved schema. Two drops of the same table name
need to coexist (an operator drops a table, recreates it under the same
name, and drops it again, both versions must remain recoverable
independently). The feature renames every dropped table to a deterministic,
OID-derived name (e.g. 'trash_$16384') inside the reserved schema.
The OID guarantees uniqueness across the cluster's lifetime, the rename is
short enough to never hit NAMEDATALEN truncation, and the original
schema/name are preserved separately in the catalog so the user-visible
naming is unaffected on restore.
- Because each drop produces a unique OID-derived name in the reserved
schema, multiple drops of the same table name sit in the schema as
independent rows in the catalog (each with its own OID, drop time, and
metadata).
The SQL APIs will support two access patterns:
1. restore-by-name which returns the most recently dropped version,
2. restore-by-OID which restores a specific table.
Users can also list all versions and rename older ones into a
quarantine schema before
restoring the most recent useful when the operator wants to compare or
merge data across versions.
- Triggers. Triggers can stay attached to the table across the schema move
because 'pg_trigger.tgrelid' is OID-based. The trade-off is that trigger
function bodies reference other tables by name, so the first firing after
restore can fail if those references were broken while the owner sat in
Trash. Whether this is the right default is an open question.
- A GUC to disable the trash-can per session or system-wide. When set,
'DROP TABLE' bypasses the reserved schema and the table is permanently
dropped, matching standard PostgreSQL semantics.
- Protection on the reserved schema and its catalog. Direct DML on the
catalog table is blocked, the schema cannot be dropped while the extension
is installed, and 'ALTER'/'RENAME' of objects already in the schema is
blocked.
- Misc objects. Extend the above capabilities to support tables like
Partitioned tables.
I am working on a prototype and will share it soon.
[1] https://github.com/petere/pgtrashcan
Regards,
Surya Poondla
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Mihail Nikalayeu | 2026-06-03 17:27:00 | Re: Adding REPACK [concurrently] |
| Previous Message | Mihail Nikalayeu | 2026-06-03 16:37:25 | Re: Race conditions in logical decoding |