Re: Handle concurrent drop when doing whole database vacuum

From: surya poondla <suryapoondla4(at)gmail(dot)com>
To: cca5507 <cca5507(at)qq(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Handle concurrent drop when doing whole database vacuum
Date: 2026-06-17 18:40:38
Message-ID: CAOVWO5qfWohD5DyUNq_23eRwFBZsQ1NbuYWPushW2NpkH4M8mA@mail.gmail.com
Lists: pgsql-hackers

Hi ChangAo,

Thank you for reporting and fixing the issue.
The race condition looks real; I confirmed it against the current HEAD.
One thing worth adding to the diagnosis: list construction in
get_all_vacuum_rels() runs in the outer transaction, before vacuum() enters
its PG_TRY block, so the error from pg_class_aclmask_ext() aborts the
entire VACUUM operation. This is clearly a bug.

Updating vacuum_is_permitted_for_relation() to call pg_class_aclcheck_ext()
with is_missing rather than pg_class_aclcheck() looks right to me.
As for the existing pg_class_aclcheck() call, I checked the other three
callers, expand_vacuum_rel(), vacuum_rel(), and analyze_rel(): each one
already holds a relation lock by the time it reaches this function, so the
error will never fire in those paths.
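
To make the is_missing idea concrete, here is a standalone toy sketch. It is
not PostgreSQL code: every toy_ name, the ToyCatalog type, and the warning
text are made up for illustration, and the real check is of course an ACL
lookup rather than an array scan. It only shows the shape of the pattern:
the check reports a vanished relation through an out-parameter, and the
list builder skips that entry instead of failing the whole operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned int Oid;

/* Toy catalog (hypothetical): a relation exists if its OID is listed here. */
typedef struct ToyCatalog
{
    const Oid  *oids;
    size_t      n;
} ToyCatalog;

/*
 * Hypothetical stand-in for a missing-aware permission check.
 * Returns true if the relation exists (and is treated as permitted).
 * If the relation is gone, sets *is_missing and returns false
 * instead of raising an error.
 */
static bool
toy_aclcheck_ext(const ToyCatalog *cat, Oid relid, bool *is_missing)
{
    *is_missing = false;
    for (size_t i = 0; i < cat->n; i++)
    {
        if (cat->oids[i] == relid)
            return true;
    }
    *is_missing = true;
    return false;
}

/*
 * Build the list of relations to process from a previously scanned set,
 * skipping any that were dropped in the meantime. Returns the number of
 * OIDs written to out, which must have room for nscanned entries.
 */
static size_t
toy_build_vacuum_list(const ToyCatalog *cat, const Oid *scanned,
                      size_t nscanned, Oid *out)
{
    size_t      kept = 0;

    assert(cat != NULL && scanned != NULL && out != NULL);

    for (size_t i = 0; i < nscanned; i++)
    {
        bool        is_missing;

        if (toy_aclcheck_ext(cat, scanned[i], &is_missing))
            out[kept++] = scanned[i];
        else if (is_missing)
            fprintf(stderr,
                    "WARNING: skipping relation %u --- relation no longer exists\n",
                    scanned[i]);
    }
    return kept;
}
```

With a scanned set of three OIDs of which one has since been dropped, the
builder warns once and returns the two survivors, which is the behavior I
would expect from the patched get_all_vacuum_rels().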

On Kyotaro's alternative of taking ConditionalLockRelationOid() during list
construction, in the style of get_tables_to_repack(): I agree that holding
many locks during the list build is a real cost on busy databases, and since
each table is processed in its own transaction, vacuum_open_relation() still
has to handle the post-list drop case regardless.

A couple of points on the patch itself:
1. The bug is racy, but the injection_points framework
(src/test/modules/injection_points) can make it deterministic.
We can put an INJECTION_POINT() inside the heap_getnext() loop in
get_all_vacuum_rels() and add an isolation spec that parks VACUUM there,
runs DROP TABLE in another session, then resumes VACUUM and asserts that it
completes with a WARNING.

2. Minor comment: vacuum_open_relation() already emits an identically worded
"relation no longer exists" message with errcode(ERRCODE_UNDEFINED_TABLE).
It would be worth adding the same errcode to the two new ereports so the
SQLSTATE stays consistent for the same logical event.

This looks like a long-standing bug and I feel it should be backported.

Overall, the patch is heading in the right direction.

Regards,
Surya Poondla
