Re: Handle concurrent drop when doing whole database vacuum

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: cca5507 <cca5507(at)qq(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Handle concurrent drop when doing whole database vacuum
Date: 2026-06-23 21:06:41
Message-ID: CALj2ACUadybE6CQ3=noAX7xr-gObscUGdU6ak87O_gznHGTp9Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Sun, Jun 14, 2026 at 12:13 AM cca5507 <cca5507(at)qq(dot)com> wrote:
>
> Hi hackers,
>
> When doing a whole database vacuum, we scan pg_class to construct
> a list of vacuumable tables. For each vacuumable table, we call
> vacuum_is_permitted_for_relation() to check permissions. If a
> concurrent drop happens, the pg_class_aclcheck() might report an
> error because of failing to search the syscache:
>
> ERROR: relation with OID ****** does not exist
>
> To fix it, we can use pg_class_aclcheck_ext() to detect the concurrent
> drop and report a warning instead.
>
> Note that a concurrent drop after constructing the list of vacuumable
> tables is handled by vacuum_open_relation().
>
> Thoughts?

Nice catch!

IMHO, the vacuum command failing in this situation may not be a huge
problem in practice, because the user sees the error and can re-issue
the command.

get_all_vacuum_rels is called only for database-wide vacuum (neither
autovacuum nor vacuum <<tables-list>> calls it - in those cases, the
permission check happens after the table lock is taken in
vacuum_open_relation).

Let's think about what happens if we don't fix this. For example, a
database-wide vacuum with 1000 tables has scanned 990 of them in
pg_class and the 991st table gets concurrently dropped - the vacuum
command fails and the user needs to notice that and re-issue the
command, at which point it rescans pg_class. What exactly is the
problem we're solving? Is it the vacuum command failing and the user
missing it or forgetting to re-issue? The pg_class scan being
expensive? Or consistency with how autovacuum and vacuum
<<tables-list>> handle this?

ReindexMultipleTables has a similar scan on pg_class with a permission
check, but it is not affected because the permission check is gated
behind shared relations (so droppable user tables never reach it).

Some comments on the v3 patches:

1/
+ *
+ * Note: use pg_class_aclcheck_ext() to detect a concurrent drop.

This comment seems redundant - the function call, the missing_ok
check, and the warning message already make it clear.

2/
+ * Note that the relation might not be locked, so it can be dropped
concurrently.
+ * This can happen when doing a whole database vacuum or analyze in
+ * get_all_vacuum_rels(). We issue a WARNING log message and return false in
+ * this case.

Instead of saying "relation might not be locked", can we be more
explicit about the exact cases? For example: For database-wide
vacuums, this function is called without holding a relation lock, so
the table can be dropped concurrently. In that case, issue a WARNING
and return false.

I couldn't think of a good way to reproduce this other than creating a
large number of tables to make the pg_class scan costlier. For
predictable testing I might agree to adding an injection point.
However, I have some comments:

3/
+# Make sure that current user is not the owner of current database.
+step setrole1
+{
+ SET ROLE regress_vacuum;
+}

Is this necessary?

4/
+step analyze1
+{
+ ANALYZE;
+}

I think just testing VACUUM or VACUUM ANALYZE in a single test keeps
things smaller, since the underlying code is the same for both.

5/
+++ b/src/test/modules/injection_points/specs/vacuum_concurrent_drop.spec

Why do you need isolation tests with an injection point? Why not a TAP
test under src/test/modules/test_misc? A TAP test with just VACUUM
(not ANALYZE) would keep the test simpler with fewer lines of code.

6/
+ INJECTION_POINT("vacuum-constructing-vacuumable-rels", NULL);

Nit: How about a shorter name like vacuum-get-rels or vacuum-build-rels?

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ethan Mertz 2026-06-23 21:09:15 Re: [PATCH] Improving index selection for logical replication apply with replica identity full
Previous Message Daniel Gustafsson 2026-06-23 21:05:09 Re: Little checksum worker cleanups