| From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
|---|---|
| To: | cca5507 <cca5507(at)qq(dot)com> |
| Cc: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Handle concurrent drop when doing whole database vacuum |
| Date: | 2026-06-23 21:06:41 |
| Message-ID: | CALj2ACUadybE6CQ3=noAX7xr-gObscUGdU6ak87O_gznHGTp9Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On Sun, Jun 14, 2026 at 12:13 AM cca5507 <cca5507(at)qq(dot)com> wrote:
>
> Hi hackers,
>
> When doing a whole database vacuum, we scan pg_class to construct
> a list of vacuumable tables. For each vacuumable table, we call
> vacuum_is_permitted_for_relation() to check permissions. If a
> concurrent drop happens, the pg_class_aclcheck() might report an
> error because of failing to search the syscache:
>
> ERROR: relation with OID ****** does not exist
>
> To fix it, we can use pg_class_aclcheck_ext() to detect the concurrent
> drop and report a warning instead.
>
> Note that a concurrent drop after constructing the list of vacuumable
> tables is handled by vacuum_open_relation().
>
> Thoughts?
Nice catch!
IMHO, the vacuum command failing in this situation may not be a huge
problem in practice, because the user sees the error and can re-issue
the command.
get_all_vacuum_rels is called only for database-wide vacuum (neither
autovacuum nor vacuum <<tables-list>> calls it - in those cases, the
permission check happens after the table lock is taken in
vacuum_open_relation).
Let's think about what happens if we don't fix this. For example, a
database-wide vacuum with 1000 tables has scanned 990 of them in
pg_class and the 991st table gets concurrently dropped - the vacuum
command fails and the user needs to notice that and re-issue the
command, at which point it rescans pg_class. What exactly is the
problem we're solving? Is it the vacuum command failing and the user
missing it or forgetting to re-issue? The pg_class scan being
expensive? Or consistency with how autovacuum and vacuum
<<tables-list>> handle this?
ReindexMultipleTables has a similar scan on pg_class with a permission
check, but it is not affected because the permission check is gated
behind shared relations (so droppable user tables never reach it).
Some comments on the v3 patches:
1/
+ *
+ * Note: use pg_class_aclcheck_ext() to detect a concurrent drop.
This comment seems redundant - the function call, the missing_ok
check, and the warning message already make it clear.
2/
+ * Note that the relation might not be locked, so it can be dropped
concurrently.
+ * This can happen when doing a whole database vacuum or analyze in
+ * get_all_vacuum_rels(). We issue a WARNING log message and return false in
+ * this case.
Instead of saying "relation might not be locked", can we be more
explicit about the exact cases? For example: For database-wide
vacuums, this function is called without holding a relation lock, so
the table can be dropped concurrently. In that case, issue a WARNING
and return false.
I couldn't think of a good way to reproduce this other than creating a
large number of tables to make the pg_class scan costlier. For
predictable testing I might agree to adding an injection point.
However, I have some comments:
3/
+# Make sure that current user is not the owner of current database.
+step setrole1
+{
+ SET ROLE regress_vacuum;
+}
Is this necessary?
4/
+step analyze1
+{
+ ANALYZE;
+}
I think just testing VACUUM or VACUUM ANALYZE in a single test keeps
things smaller, since the underlying code is the same for both.
5/
+++ b/src/test/modules/injection_points/specs/vacuum_concurrent_drop.spec
Why do you need isolation tests with an injection point? Why not a TAP
test under src/test/modules/test_misc? A TAP test with just VACUUM
(not ANALYZE) would keep the test simpler with fewer lines of code.
6/
+ INJECTION_POINT("vacuum-constructing-vacuumable-rels", NULL);
Nit: How about a shorter name like vacuum-get-rels or vacuum-build-rels?
--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Ethan Mertz | 2026-06-23 21:09:15 | Re: [PATCH] Improving index selection for logical replication apply with replica identity full |
| Previous Message | Daniel Gustafsson | 2026-06-23 21:05:09 | Re: Little checksum worker cleanups |