Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: alex work <alexwork033(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Date: 2024-03-21 16:07:36
Message-ID: 341186.1711037256@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

[ redirecting to -hackers ]

alex work <alexwork033(at)gmail(dot)com> writes:
> We encounter slow `GRANT ROLES` only on PostgreSQL 16 instances up to 42 seconds
> in production, the client process at PostgresSQL would use 100% of the CPU.
> Which is a surprise compared to other instances running older PostgreSQL
> releases. On production we have a *LOT* of ROLEs, which unfortunately a case
> that we did not test before switching the new servers into production mode.

I poked into this a bit. It seems the problem is that as of v16, we
try to search for the "best" role membership path from the current
user to the target role, and that's done in a very brute-force way,
as a side effect of computing the set of *all* role memberships the
current role has. In the given case, we could have skipped all that
if we simply tested whether the current role is directly a member
of the target: it is, so there can't be any shorter path. But in
any case roles_is_member_of has horrid performance when the current
role is a member of a lot of roles.

It looks like part of the blame might be ascribable to catcache.c,
as if you look at the problem microscopically you find that
roles_is_member_of is causing catcache to make a ton of AUTHMEMMEMROLE
catcache lists, and SearchSysCacheList is just iterating linearly
through the cache's list-of-lists, so that search is where the O(N^2)
time is actually getting taken. Up to now that code has assumed that
any one catcache would not have very many catcache lists. Maybe it's
time to make that smarter; but since we've gotten away with this
implementation for decades, I can't help feeling that the real issue
is with roles_is_member_of's usage pattern.

For self-containedness, attached is a directly usable shell script
to reproduce the problem. The complaint is that the last GRANT
takes multiple seconds (about 5s on my machine), rather than
milliseconds.

regards, tom lane

Attachment Content-Type Size
grant_with_many_roles.sh text/x-shellscript 1.2 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2024-03-21 17:00:39 Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs
Previous Message Robert Treat 2024-03-21 15:17:21 Re: Question about PostgreSQL upgrade from version 12 to version 15

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2024-03-21 16:08:11 Re: remaining sql/json patches
Previous Message Robert Treat 2024-03-21 15:37:39 Re: Possibility to disable `ALTER SYSTEM`