BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: exclusion(at)gmail(dot)com
Subject: BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used
Date: 2023-07-04 11:00:01
Message-ID: 18014-28c81cb79d44295d@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18014
Logged by: Alexander Lakhin
Email address: exclusion(at)gmail(dot)com
PostgreSQL version: 16beta2
Operating system: Ubuntu 22.04
Description:

Yesterday's test failure on prion:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2023-07-03%2010%3A13%3A03
made me wonder, what's going on there and whether it's yet another issue
with invalidating relcache (bug #17994).
(
SELECT schema_to_xmlschema('testxmlschema', false, true, '');
ERROR: relation with OID 29598 does not exist
CONTEXT: SQL statement "SELECT oid FROM pg_catalog.pg_class WHERE
relnamespace = 29597 AND relkind IN ('r','m','v') AND
pg_catalog.has_table_privilege (oid, 'SELECT') ORDER BY relname;"

Other failures of that kind:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prion&dt=2023-06-20%2001%3A56%3A04&stg=check
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prion&dt=2023-04-15%2017%3A17%3A09&stg=check
)

I managed to construct a simple reproducer for the error:
for ((n=1;n<=30;n++)); do
echo "ITERATION $n"

numclients=30
for ((c=1;c<=$numclients;c++)); do
cat << EOF | psql >psql_$c.log &
CREATE SCHEMA testxmlschema_$c;

SELECT format('CREATE TABLE testxmlschema_$c.test_%s (a int);', g) FROM
generate_series(1, 30) g
\\gexec

SET parallel_setup_cost = 1;
SET min_parallel_table_scan_size = '1kB';
SELECT schema_to_xmlschema('testxmlschema_$c', true, false, '');

SELECT format('DROP TABLE testxmlschema_$c.test_%s', g) FROM
generate_series(1, 30) g
\\gexec

DROP SCHEMA testxmlschema_$c;
EOF
done
wait
grep 'ERROR:' server.log && break;
done

With a server compiled as follows:
CPPFLAGS="-O0 -DCATCACHE_FORCE_RELEASE" ./configure -q --enable-debug
--enable-cassert --enable-tap-tests --with-libxml && make ...
(More precisely, "#ifndef CATCACHE_FORCE_RELEASE" in ReleaseCatCache()
does matter here.)

I get errors as in the test in question:
...
ITERATION 9
ITERATION 10
ERROR: relation with OID 59777 does not exist
CONTEXT: parallel worker
SQL statement "SELECT oid FROM pg_catalog.pg_class WHERE relnamespace =
57162 AND relkind IN ('r','m','v') AND pg_catalog.has_table_privilege (oid,
'SELECT') ORDER BY relname;"
2023-07-04 12:48:14.205 MSK [3111661] ERROR: relation with OID 59777 does
not exist
2023-07-04 12:48:14.206 MSK [3111598] ERROR: relation with OID 59777 does
not exist

With a debug logging added in src/backend/utils/adt/acl.c, I see that
SearchSysCacheExists1(RELOID, ObjectIdGetDatum(tableoid) returns true in
has_table_privilege_id(), but later, in
pg_class_aclcheck()/pg_class_aclmask_ext(),
SearchSysCache1(RELOID, ObjectIdGetDatum(table_oid)) returns NULL.

This is reproduced on REL_10_STABLE .. master.
The first commit that demonstrates the issue is 61c2e1a95 (it improved
access to parallelism for SPI users, one of which is
schema_to_xmlschema_internal() (see also schema_get_xml_visible_tables())).

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tomas Vondra 2023-07-04 11:30:21 Re: Backend handling replication slot stuck using 100% cpu, unkillable
Previous Message Sandeep Thakkar 2023-07-04 10:54:47 Re: BUG #18012: Installer fails to run .bat files when they are registered to Notepad++