Fix race conditions in DropRole() and GrantRole()

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Fix race conditions in DropRole() and GrantRole()
Date: 2026-07-04 07:47:08
Message-ID: aki6fMNLUx6+BR8K@bdtpg
Lists: pgsql-hackers

Hi hackers,

While working on [1], I observed that DropRole() and GrantRole() suffer from the
same "stale data used after the lock is acquired" issue.

Indeed, DropRole() and GrantRole() resolve the role name to an OID before calling
LockSharedObject() on the role. If a concurrent session commits a DROP ROLE
between the name lookup and the lock acquisition, the first session goes on
acting on a stale OID.
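To make the window concrete, here is a minimal C sketch of the racy resolve-then-lock pattern. It is a toy model, not PostgreSQL code: the mock_* helpers, the one-row struct standing in for pg_authid, naive_resolve_then_lock() and the concurrent_session hook (playing the part of session 2) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model, NOT PostgreSQL code: every mock_* name is hypothetical. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* One-row stand-in for pg_authid; 16386 mirrors the roleid in example 2. */
typedef struct { const char *rolname; Oid oid; bool dropped; } MockAuthId;
static MockAuthId mock_authid = {"testrole", 16386, false};

/* Hypothetical hook: "session 2" runs here, between lookup and lock. */
static void (*concurrent_session)(void) = NULL;

/* Name -> OID, like a syscache lookup on pg_authid.rolname. */
static Oid mock_get_role_oid(const char *rolname)
{
    if (!mock_authid.dropped && strcmp(mock_authid.rolname, rolname) == 0)
        return mock_authid.oid;
    return InvalidOid;
}

/* Does a live pg_authid row with this OID still exist? */
static bool mock_role_exists(Oid roleid)
{
    return !mock_authid.dropped && mock_authid.oid == roleid;
}

/* Simulates session 2 committing DROP ROLE testrole. */
static void drop_testrole(void)
{
    mock_authid.dropped = true;
}

/*
 * The racy pattern: resolve the name, then lock, never re-validate.
 * Returns the OID the command goes on to use; *stale is set when that
 * role no longer exists by the time the lock is held.
 */
static Oid naive_resolve_then_lock(const char *rolname, bool *stale)
{
    assert(stale != NULL);

    Oid roleid = mock_get_role_oid(rolname);    /* 1. name -> OID */

    if (concurrent_session != NULL)
        concurrent_session();                   /* DROP ROLE commits here */

    /* 2. LockSharedObject() on roleid would go here; nothing re-checks. */
    *stale = (roleid != InvalidOid && !mock_role_exists(roleid));
    return roleid;
}
```

With no concurrent session the call returns 16386 and *stale stays false; installing drop_testrole as the hook makes the same call still return 16386, but with *stale set: the command proceeds on a role that no longer exists.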

Examples:

1/ DROP ROLE + concurrent DROP ROLE

CREATE ROLE testrole;
gdb breakpoint at user.c:1198 (before LockSharedObject) on session 1

session 1: DROP ROLE testrole;
session 1 is paused by the breakpoint
session 2: DROP ROLE testrole;
continuing session 1 produces:
ERROR: could not find tuple for role 24662

2/ GRANT ROLE + concurrent DROP ROLE

CREATE ROLE testrole;
CREATE ROLE testmember;
gdb breakpoint at user.c:1716 (before LockSharedObject) on session 1

session 1: GRANT testrole TO testmember;
session 1 is paused by the breakpoint
session 2: DROP ROLE testrole;
continuing session 1: GRANT ROLE succeeds

This leaves an orphaned pg_auth_members entry, i.e. a membership whose roleid no
longer exists:

postgres=# SELECT m.member::regrole, m.roleid, r.rolname
FROM pg_auth_members m
LEFT JOIN pg_roles r ON m.roleid = r.oid
WHERE r.oid IS NULL;
member | roleid | rolname
------------+--------+---------
testmember | 16386 |

The attached patch fixes the races using the same approach as
RangeVarGetRelidExtended(): a new function, RoleNameGetOid(), encapsulates name
resolution, permission checking (via a caller-supplied callback) and lock
acquisition inside a retry loop driven by SharedInvalidMessageCounter. If
invalidation messages arrive between name resolution and locking, which
indicates concurrent DDL, the function retries.

The lock is kept across retries and is released only if the name resolves to a
different OID on the next iteration.
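The retry loop can be sketched in C roughly as follows. This is a toy model under stated assumptions, not the patch: role_name_get_oid_sketch(), RoleNameCallback and the mock_* helpers are hypothetical, and the invalidation machinery is reduced to a counter that, like SharedInvalidMessageCounter, only advances when pending messages are accepted. The mock lock accepts them right after locking, as LockSharedObject() does via AcceptInvalidationMessages(). The loop shape follows RangeVarGetRelidExtended().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model, NOT PostgreSQL code: every mock_* name is hypothetical. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

typedef struct { const char *rolname; Oid oid; bool dropped; } MockAuthId;
static MockAuthId mock_authid = {"testrole", 16386, false};

static unsigned long shared_inval_counter = 0;  /* ~ SharedInvalidMessageCounter */
static unsigned long pending_invals = 0;        /* sent but not yet accepted */
static Oid held_lock = InvalidOid;              /* the single role lock we may hold */
static void (*concurrent_session)(void) = NULL; /* runs between lookup and lock */

static Oid mock_get_role_oid(const char *rolname)
{
    if (!mock_authid.dropped && strcmp(mock_authid.rolname, rolname) == 0)
        return mock_authid.oid;
    return InvalidOid;
}

/* ~ AcceptInvalidationMessages(): processing messages advances the counter. */
static void mock_accept_invalidations(void)
{
    shared_inval_counter += pending_invals;
    pending_invals = 0;
}

/* ~ LockSharedObject(), which also accepts invalidations once locked. */
static void mock_lock_role(Oid roleid)
{
    assert(held_lock == InvalidOid);
    held_lock = roleid;
    mock_accept_invalidations();
}

static void mock_unlock_role(Oid roleid)
{
    assert(held_lock == roleid);
    held_lock = InvalidOid;
}

/* Session 2 commits DROP ROLE testrole once, queuing one invalidation. */
static void drop_testrole_once(void)
{
    mock_authid.dropped = true;
    pending_invals++;
    concurrent_session = NULL;
}

/* Hypothetical permission-check callback, run before locking. */
typedef void (*RoleNameCallback)(Oid roleid, Oid oldroleid, void *arg);

static Oid role_name_get_oid_sketch(const char *rolname,
                                    RoleNameCallback callback, void *arg)
{
    Oid  roleid = InvalidOid;
    Oid  oldroleid = InvalidOid;
    bool retry = false;

    for (;;)
    {
        unsigned long inval_count = shared_inval_counter;

        roleid = mock_get_role_oid(rolname);
        if (callback != NULL)
            callback(roleid, oldroleid, arg);
        if (concurrent_session != NULL)
            concurrent_session();   /* the race window */

        if (retry)
        {
            if (roleid == oldroleid)
                break;              /* same role: its lock is already held */
            if (oldroleid != InvalidOid)
                mock_unlock_role(oldroleid);    /* name moved: drop stale lock */
        }

        if (roleid == InvalidOid)
            mock_accept_invalidations();
        else
            mock_lock_role(roleid);

        if (inval_count == shared_inval_counter)
            break;                  /* nothing changed since the lookup */

        retry = true;
        oldroleid = roleid;
    }
    return roleid;  /* InvalidOid: caller reports "role does not exist" */
}
```

Without a concurrent drop, the first pass returns 16386 and leaves its lock held. With drop_testrole_once installed, the first pass locks 16386, sees the counter move, retries, finds no role, releases the stale lock and returns InvalidOid, so the caller can report a missing role instead of acting on a dead OID.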

Two callbacks are provided:
- RoleNameCallbackForDropRole(): before locking, checks that the role is neither
the current nor the session user, checks the SUPERUSER attribute, and checks the
ADMIN OPTION privilege. These are the checks DropRole() currently performs before
LockSharedObject().
- RoleNameCallbackForGrantRole(): calls check_role_membership_authorization() to
verify that the current user can grant/revoke membership. This matches what
GrantRole() currently does before calling AddRoleMems()/DelRoleMems().

DropRole() and GrantRole() now call RoleNameGetOid() with appropriate lock
levels.

Remark:

AlterRole() does not need the fix: it updates the role's pg_authid tuple with
CatalogTupleUpdate() before calling AddRoleMems(), and that update blocks a
concurrent DROP ROLE.

[1]: https://postgr.es/m/akZUpiDa1UfmzYxL%40bdtpg

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment: v1-0001-Add-RoleNameGetOid-with-invalidation-based-retry-.patch (text/x-diff, 12.8 KB)
