| From: | PG Bug reporting form <noreply(at)postgresql(dot)org> | 
|---|---|
| To: | pgsql-bugs(at)lists(dot)postgresql(dot)org | 
| Cc: | exclusion(at)gmail(dot)com | 
| Subject: | BUG #18014: Releasing catcache entries makes schema_to_xmlschema() fail when parallel workers are used | 
| Date: | 2023-07-04 11:00:01 | 
| Message-ID: | 18014-28c81cb79d44295d@postgresql.org | 
| Lists: | pgsql-bugs | 
The following bug has been logged on the website:
Bug reference:      18014
Logged by:          Alexander Lakhin
Email address:      exclusion(at)gmail(dot)com
PostgreSQL version: 16beta2
Operating system:   Ubuntu 22.04
Description:        
Yesterday's test failure on prion:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2023-07-03%2010%3A13%3A03
made me wonder what is going on there and whether it is yet another issue
with relcache invalidation (bug #17994).
(
SELECT schema_to_xmlschema('testxmlschema', false, true, '');
ERROR:  relation with OID 29598 does not exist
CONTEXT:  SQL statement "SELECT oid FROM pg_catalog.pg_class WHERE
relnamespace = 29597 AND relkind IN ('r','m','v') AND
pg_catalog.has_table_privilege (oid, 'SELECT') ORDER BY relname;"
Other failures of that kind:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prion&dt=2023-06-20%2001%3A56%3A04&stg=check
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prion&dt=2023-04-15%2017%3A17%3A09&stg=check
)
I managed to construct a simple reproducer for the error:
for ((n=1;n<=30;n++)); do
  echo "ITERATION $n"
  numclients=30
  for ((c=1;c<=$numclients;c++)); do
    cat << EOF | psql >psql_$c.log &
CREATE SCHEMA testxmlschema_$c;
SELECT format('CREATE TABLE testxmlschema_$c.test_%s (a int);', g) FROM
generate_series(1, 30) g
\\gexec
SET parallel_setup_cost = 1;
SET min_parallel_table_scan_size = '1kB';
SELECT schema_to_xmlschema('testxmlschema_$c', true, false, '');
SELECT format('DROP TABLE testxmlschema_$c.test_%s', g) FROM
generate_series(1, 30) g
\\gexec
DROP SCHEMA testxmlschema_$c;
EOF
  done
  wait
  grep 'ERROR:' server.log && break;
done
With a server compiled as follows:
CPPFLAGS="-O0 -DCATCACHE_FORCE_RELEASE" ./configure -q --enable-debug \
  --enable-cassert --enable-tap-tests --with-libxml && make ...
(More precisely, the "#ifndef CATCACHE_FORCE_RELEASE" conditional in
ReleaseCatCache() is what matters here.)
I get errors as in the test in question:
...
ITERATION 9
ITERATION 10
ERROR:  relation with OID 59777 does not exist
CONTEXT:  parallel worker
SQL statement "SELECT oid FROM pg_catalog.pg_class WHERE relnamespace =
57162 AND relkind IN ('r','m','v') AND pg_catalog.has_table_privilege (oid,
'SELECT') ORDER BY relname;"
2023-07-04 12:48:14.205 MSK [3111661] ERROR:  relation with OID 59777 does
not exist
2023-07-04 12:48:14.206 MSK [3111598] ERROR:  relation with OID 59777 does
not exist
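To see which OIDs are affected across iterations, the error lines can be reduced to a list of distinct OIDs. This is a minimal sketch, assuming the server log is plain text and uses the message format shown above; the function name extract_missing_oids is hypothetical and not part of the original reproducer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original reproducer): print each
# distinct OID that appears in a "relation with OID N does not exist"
# message, one per line, in ascending numeric order.
# The log text is read from stdin.
extract_missing_oids() {
    grep -o 'relation with OID [0-9][0-9]*' |   # keep only the matching fragment
        awk '{ print $4 }' |                    # fourth word is the OID
        sort -un                                # numeric sort, drop duplicates
}
```

It could be run as `extract_missing_oids < server.log` after the loop above stops.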
With debug logging added in src/backend/utils/adt/acl.c, I see that
SearchSysCacheExists1(RELOID, ObjectIdGetDatum(tableoid)) returns true in
has_table_privilege_id(), but later, in
pg_class_aclcheck()/pg_class_aclmask_ext(),
SearchSysCache1(RELOID, ObjectIdGetDatum(table_oid)) returns NULL.
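The shape of that symptom (an existence check that succeeds, followed by a lookup of the same key that finds nothing) can be illustrated with a toy analogy. This is not PostgreSQL code and says nothing about the root cause; a file stands in for the cache entry, and the function name check_then_use and its second argument are hypothetical.

```shell
#!/usr/bin/env bash
# Toy analogy only, not PostgreSQL code: a file stands in for a cached
# catalog entry. check_then_use tests for existence, runs an optional
# command (standing in for whatever happens between the two lookups),
# then looks the entry up again.
check_then_use() {
    local entry=$1 between=${2:-true}
    if ! [ -e "$entry" ]; then
        echo "no such entry"
        return 0
    fi
    # The existence check has passed; run the intervening command.
    $between "$entry"
    if [ -e "$entry" ]; then
        echo "found"
    else
        echo "ERROR: entry vanished after the existence check"
        return 1
    fi
}
```

Calling `check_then_use "$f"` on an existing file reports the entry as found, while `check_then_use "$f" "rm -f"` removes it inside the window and takes the error path.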
This reproduces on REL_10_STABLE .. master.
The first commit that demonstrates the issue is 61c2e1a95; it improved
access to parallelism for SPI users, one of which is
schema_to_xmlschema_internal() (see also schema_get_xml_visible_tables()).