| From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Fix races conditions in DropRole() and GrantRole() |
| Date: | 2026-07-04 07:47:08 |
| Message-ID: | aki6fMNLUx6+BR8K@bdtpg |
| Lists: | pgsql-hackers |
Hi hackers,
While working on [1], I observed that DropRole() and GrantRole() have the same
"use stale data after the lock is acquired" issues.
Indeed, DropRole() and GrantRole() resolve the role name to an OID before calling
LockSharedObject() on the role. A concurrent session that commits a DROP ROLE
between the read and the lock acquisition leaves the first session acting on a
stale OID.
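To make the window concrete, here is a minimal standalone sketch of the unsafe ordering. It is not PostgreSQL code: the one-role "catalog", the OID value and every demo_* name are hypothetical, and the concurrent DROP ROLE is simulated by a plain function call placed where the other session would commit.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical one-role "catalog"; 16385 is an arbitrary OID. */
static Oid demo_role_oid = 16385;   /* InvalidOid once the role is dropped */

/* Toy name resolution: only "testrole" exists. */
static Oid demo_lookup_role(const char *name)
{
    assert(name != NULL);
    return strcmp(name, "testrole") == 0 ? demo_role_oid : InvalidOid;
}

/* Models a concurrent session committing DROP ROLE. */
static void demo_concurrent_drop(void)
{
    demo_role_oid = InvalidOid;
}

/*
 * Unsafe ordering: resolve the name, then lock, with no recheck.
 * The simulated drop sits in the window between the two steps.
 */
static Oid demo_resolve_then_lock(const char *name)
{
    Oid oid = demo_lookup_role(name);   /* step 1: name -> OID */

    demo_concurrent_drop();             /* other session commits here */

    /* step 2: the lock on "oid" would be taken here, too late to help */
    return oid;                         /* stale: the role is already gone */
}
```

In this toy model the caller gets back an OID that no longer resolves, which is the state both examples below start from.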
Examples:
1/ DROP ROLE + concurrent DROP ROLE
CREATE ROLE testrole;
gdb breakpoint at user.c:1198 (before LockSharedObject) on session 1
session 1: DROP ROLE testrole;
session 1 is paused by the breakpoint
session 2: DROP ROLE testrole;
continue session 1 produces:
ERROR: could not find tuple for role 24662
2/ GRANT ROLE + concurrent DROP ROLE
CREATE ROLE testrole;
CREATE ROLE testmember;
gdb breakpoint at user.c:1716 (before LockSharedObject) on session 1
session 1: GRANT testrole TO testmember;
session 1 is paused by the breakpoint
session 2: DROP ROLE testrole;
continue session 1: GRANT ROLE succeeds
It produces an orphaned pg_auth_members entry:
postgres=# SELECT m.member::regrole, m.roleid, r.rolname
FROM pg_auth_members m
LEFT JOIN pg_roles r ON m.roleid = r.oid
WHERE r.oid IS NULL;
   member   | roleid | rolname
------------+--------+---------
 testmember |  16386 |
The attached patch fixes the races by adding RoleNameGetOid(), which uses the
same approach as RangeVarGetRelidExtended(): it encapsulates name resolution,
permission checking (via a caller-supplied callback), and lock acquisition inside
a retry loop driven by SharedInvalidMessageCounter. If invalidation messages
arrive between name resolution and locking, indicating concurrent DDL, the
function retries.
The lock is kept across retries and only released if the name resolves to a
different OID on the next iteration.
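The shape of that loop can be sketched in standalone C. This is a toy model and not the code from the patch: the toy_* functions, the counter and the single-role catalog are hypothetical stand-ins for the syscache lookup, SharedInvalidMessageCounter and LockSharedObject(), and the concurrent DROP ROLE is simulated inside the lock call, since that is where a real backend could wait.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Toy stand-ins (hypothetical): a one-role catalog and an inval counter. */
static Oid toy_role_oid = 16385;             /* InvalidOid once dropped */
static unsigned long toy_inval_counter = 0;
static int toy_concurrent_drop_pending = 0;  /* simulate a racing DROP ROLE */
static Oid toy_locked_oid = InvalidOid;

typedef void (*toy_role_callback) (Oid oid, void *arg);

static Oid toy_lookup(const char *name)
{
    return strcmp(name, "testrole") == 0 ? toy_role_oid : InvalidOid;
}

/* Taking the lock may wait, so the concurrent commit lands here. */
static void toy_lock(Oid oid)
{
    if (toy_concurrent_drop_pending)
    {
        toy_role_oid = InvalidOid;
        toy_inval_counter++;                 /* invalidation message arrives */
        toy_concurrent_drop_pending = 0;
    }
    toy_locked_oid = oid;
}

static void toy_unlock(Oid oid)
{
    assert(toy_locked_oid == oid);
    toy_locked_oid = InvalidOid;
}

/* Resolve, check, lock; retry if invalidations arrived in between. */
static Oid toy_role_name_get_oid(const char *name, toy_role_callback cb, void *arg)
{
    Oid oid;
    Oid old_oid = InvalidOid;
    int retry = 0;

    for (;;)
    {
        unsigned long inval_count = toy_inval_counter;

        oid = toy_lookup(name);
        if (cb != NULL && oid != InvalidOid)
            cb(oid, arg);                    /* permission check before locking */

        if (retry)
        {
            if (oid == old_oid)
                break;                       /* same answer, lock already held */
            if (old_oid != InvalidOid)
                toy_unlock(old_oid);         /* name now resolves elsewhere */
        }

        if (oid != InvalidOid)
            toy_lock(oid);

        if (inval_count == toy_inval_counter)
            break;                           /* nothing changed while locking */

        retry = 1;
        old_oid = oid;
    }
    return oid;
}
```

With toy_concurrent_drop_pending set, the first pass locks the old OID, notices the counter moved, resolves the name again, releases the now-useless lock and returns InvalidOid instead of a stale OID.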
Two callbacks are provided:
- RoleNameCallbackForDropRole(): checks current/session user, superuser attribute,
and ADMIN OPTION privilege before locking. This is similar to what DropRole() is
currently doing before LockSharedObject().
- RoleNameCallbackForGrantRole(): calls check_role_membership_authorization() to
verify the current user can grant/revoke membership. This is similar to what GrantRole()
is currently doing before calling AddRoleMems()/DelRoleMems().
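As a rough illustration of the callback idea, the sketch below models the DROP ROLE checks listed above as a function that runs before any lock is taken. Everything in it is an assumption made for the example: the Demo* structs, the callback signature and the returned strings are hypothetical, and the real callback would consult the catalogs and report failures with ereport() rather than return a message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Hypothetical caller state; the real checks read it from the backend. */
typedef struct DemoCaller
{
    Oid  current_user;
    Oid  session_user;
    bool is_superuser;
    bool has_admin_option;      /* ADMIN OPTION on the target role */
} DemoCaller;

/* Hypothetical view of the role being dropped. */
typedef struct DemoRole
{
    Oid  oid;
    bool rolsuper;
} DemoRole;

/* Returns NULL when the operation is allowed, else an illustrative reason. */
typedef const char *(*demo_role_callback) (const DemoRole *role,
                                           const DemoCaller *caller);

static const char *demo_drop_role_check(const DemoRole *role,
                                        const DemoCaller *caller)
{
    assert(role != NULL && caller != NULL);

    if (role->oid == caller->current_user)
        return "current user cannot be dropped";
    if (role->oid == caller->session_user)
        return "session user cannot be dropped";
    if (role->rolsuper && !caller->is_superuser)
        return "dropping a superuser role requires superuser";
    if (!caller->is_superuser && !caller->has_admin_option)
        return "ADMIN OPTION on the role is required";
    return NULL;
}
```

Because the check is a callback, a retry loop can run it again each time the name is re-resolved, so the permission decision always applies to the OID that ends up locked.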
DropRole() and GrantRole() now call RoleNameGetOid() with appropriate lock
levels.
Remark:
AlterRole() does not need the fix because it calls CatalogTupleUpdate() on the
pg_authid tuple before AddRoleMems(), which blocks a concurrent DROP ROLE.
[1]: https://postgr.es/m/akZUpiDa1UfmzYxL%40bdtpg
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Add-RoleNameGetOid-with-invalidation-based-retry-.patch | text/x-diff | 12.8 KB |