Race conditions in has_table_privilege() and friends

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Race conditions in has_table_privilege() and friends
Date: 2023-04-15 19:47:49
Message-ID: 2051732.1681588069@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I noticed that prion recently failed [1] with

SELECT schema_to_xml_and_xmlschema('testxmlschema', true, true, 'foo');
...
+ERROR: relation with OID 30019 does not exist
+CONTEXT: SQL statement "SELECT oid FROM pg_catalog.pg_class WHERE relnamespace = 29949 AND relkind IN ('r','m','v') AND pg_catalog.has_table_privilege (oid, 'SELECT') ORDER BY relname;"

What evidently happened here is that has_table_privilege got applied
to some relation (in a schema different from 'testxmlschema', which
should be stable here) that was in the middle of getting dropped.
This'd require the relnamespace condition to get applied after the
has_table_privilege condition, which is quite possible (although it seems
to require that auto-analyze update pg_class's statistics while this
test runs). Even then, has_table_privilege is supposed to survive the
situation, but it has a race condition:

if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(tableoid)))
PG_RETURN_NULL();

aclresult = pg_class_aclcheck(tableoid, roleid, mode);

The fact that SearchSysCacheExists1 succeeds doesn't guarantee that
when pg_class_aclcheck fetches the same row a moment later, it'll
still be there. The race-condition window is pretty narrow (and
indeed I don't think we've seen this buildfarm symptom before),
but it exists.

Now, it looks to me like this case is trivial to fix, using the
pg_class_aclcheck_ext function introduced in b12bd4869 (in v14).
We just need to drop the separate SearchSysCacheExists1 call and do

aclresult = pg_class_aclcheck_ext(tableoid, roleid, mode, &is_missing);

which should be a trifle faster as well as safer.

However, to cover the remaining risk spots in acl.c, it looks like
we'd need "_ext" versions of pg_attribute_aclcheck_all and
object_aclcheck, which were not introduced by b12bd4869.
object_aclcheck_ext in particular looks like it'd be a bit invasive.

What I'm inclined to do for now is clean up the table-related cases
and leave the code paths using object_aclcheck for another day.
We've always been much more concerned about DDL race conditions for
tables than other kinds of objects, so this approach seems to fit
with past decisions. I haven't written any code yet, but this looks
like it might amount to a couple hundred lines of fairly simple
changes.

I wonder if we should consider this a bug and back-patch to v14,
or maybe HEAD only; or is it okay to let it slide to v17?

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2023-04-15%2017%3A17%3A09

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2023-04-15 20:40:08 Re: Should vacuum process config file reload more often
Previous Message Tom Lane 2023-04-15 18:19:35 Re: Direct I/O