| From: | Noah Misch <noah(at)leadboat(dot)com> |
|---|---|
| To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: race condition in pg_class |
| Date: | 2026-02-16 17:03:26 |
| Message-ID: | 20260216170326.af@rfd.leadboat.com |
| Lists: | pgsql-hackers |
On Mon, Feb 16, 2026 at 08:00:00AM +0200, Alexander Lakhin wrote:
> 20.07.2024 11:00, Alexander Lakhin wrote:
> > 28.06.2024 08:13, Noah Misch wrote:
> > > Pushed.
> Could you please look at one more interesting failure produced by
> 001_pgbench_with_server.pl [1]?
> regress_log_001_pgbench_with_server:
> [13:11:27.325](0.001s) ok 3 - concurrent OID generation stderr /(?^:^$)/
> # Running: pgbench ...
> [13:11:29.481](2.156s) not ok 4 - concurrent GRANT/VACUUM status (got 2 vs expected 0) # TODO PROC_IN_VACUUM scan breakage
> [13:11:29.483](0.002s) # Failed (TODO) test 'concurrent GRANT/VACUUM status (got 2 vs expected 0)'
> # at C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 77.
> [13:11:29.484](0.001s) not ok 5 - concurrent GRANT/VACUUM stdout
> /(?^:processed: 250/250)/ # TODO PROC_IN_VACUUM scan breakage
> [13:11:29.485](0.001s) # Failed (TODO) test 'concurrent GRANT/VACUUM stdout /(?^:processed: 250/250)/'
> ...
> [13:11:29.486](0.001s) not ok 6 - concurrent GRANT/VACUUM stderr /(?^:^$)/ # TODO PROC_IN_VACUUM scan breakage
> [13:11:29.486](0.000s) # Failed (TODO) test 'concurrent GRANT/VACUUM stderr /(?^:^$)/'
> # at C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 77.
> [13:11:29.487](0.001s) # 'pgbench: error: client 1 script
> 1 aborted in command 0 query 0: ERROR: relation 266643 deleted while still
> in use
> # pgbench: error: Run was aborted; the above results are incomplete.
> # '
> # doesn't match '(?^:^$)'
>
> 001_pgbench_with_server_main.log contains:
> 2026-02-12 13:11:28.603 UTC [6012:36] 001_pgbench_with_server.pl ERROR: relation 266643 deleted while still in use
> 2026-02-12 13:11:28.603 UTC [6012:37] 001_pgbench_with_server.pl STATEMENT: VACUUM ddl_target;
>
> I'm able to reproduce this error with:
> numcouples=40
> for ((j=1;j<=numcouples;j++)); do
>   createdb db$j
>   echo "CREATE TABLE t(i int);" | psql -d db$j
> done
>
> for ((i=1;i<=1000;i++)); do
>   echo "iteration $i"
>   for ((j=1;j<=numcouples;j++)); do
>     for ((k=1;k<=100;k++)); do echo "GRANT SELECT ON t TO public /* $k */;"; done | psql -d db$j >psql-grant-$j.log 2>&1 &
>     for ((k=1;k<=10;k++)); do echo "VACUUM t /* $k */;"; done | psql -d db$j >psql-vacuum-$j.log 2>&1 &
>   done
>   wait
>   grep -E 'ERROR: ' server.log && break;
> done
>
> This fails for me as below:
> ...
> iteration 47
> 2026-02-16 07:13:14.855 EET|law|db13|6992a76a.a6983|ERROR: relation 16434 deleted while still in use
>
> ...
> iteration 6
> 2026-02-16 07:13:42.537 EET|law|db20|6992a786.ab3bc|ERROR: pg_class entry for relid 16462 vanished during vacuuming
>
> ...
> iteration 7
> 2026-02-16 07:14:01.182 EET|law|db2|6992a799.ad54f|ERROR: could not open relation with OID 16390
>
> ...
> iteration 9
> 2026-02-16 07:14:26.160 EET|law|db12|6992a7b2.aedf8|ERROR: relation 16430 deleted while still in use
> ...

These symptoms are consistent with the "PROC_IN_VACUUM scan breakage" bug.
It's good to have an additional recipe for reproducing that bug, so I've
linked to your message from the PROC_IN_VACUUM entry at
https://wiki.postgresql.org/wiki/User:Nmisch/Wanted

Since the PROC_IN_VACUUM failures are in tests marked TODO, they don't make
the test fail. I think the failure at this particular buildfarm link is instead a case of:
https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#Miscellaneous_tests_fail_on_Windows_due_to_a_connection_closed_before_receiving_a_final_error_message

Here's the non-TODO failure in that log:
[13:11:30.050](0.002s) not ok 10 - no such database stderr /(?^:FATAL: database "no-such-database" does not exist)/
[13:11:30.051](0.001s) # Failed test 'no such database stderr /(?^:FATAL: database "no-such-database" does not exist)/'
# at C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql/src/bin/pgbench/t/001_pgbench_with_server.pl line 100.
[13:11:30.051](0.000s) # 'pgbench: error: connection to server on socket "C:/tools/xmsys64/tmp/P5YfRmVxhI/.s.PGSQL.10766" failed: server closed the connection unexpectedly
# This probably means the server terminated abnormally
# before or while processing the request.
# pgbench: error: could not create connection for setup
# '
# doesn't match '(?^:FATAL: database "no-such-database" does not exist)'
The server log has the expected message that pgbench didn't receive:
2026-02-12 13:11:29.662 UTC [8948:1] [unknown] LOG: connection received: host=[local]
2026-02-12 13:11:29.664 UTC [8948:2] [unknown] LOG: connection authenticated: user="pgrunner" method=trust (C:/tools/xmsys64/home/pgrunner/bf/root/REL_18_STABLE/pgsql.build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/pgdata/pg_hba.conf:117)
2026-02-12 13:11:29.664 UTC [8948:3] [unknown] LOG: connection authorized: user=pgrunner database=no-such-database application_name=001_pgbench_with_server.pl
2026-02-12 13:11:29.664 UTC [8948:4] [unknown] FATAL: database "no-such-database" does not exist
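
To illustrate the TODO distinction above, here is a rough shell sketch for listing only the failures that actually count against a test run. The function name non_todo_failures is hypothetical. It assumes each TAP result occupies a single line in the regress_log_* file (unlike the email-wrapped excerpts quoted in this message), optionally prefixed by a "[time](elapsed) " stamp, and that TODO tests carry a "# TODO" directive as in the lines quoted earlier.

```shell
#!/usr/bin/env bash
# Sketch only: print "not ok" TAP lines that lack a "# TODO" directive.
# Assumption: one result per line, with an optional "[time](elapsed) " prefix.
non_todo_failures() {
    # First grep: keep "not ok" results, with or without the timestamp prefix.
    # Second grep: drop results whose directive marks them as TODO.
    grep -E '^(\[[^]]*\]\([^)]*\) )?not ok ' "$1" | grep -Ev '# TODO( |$)'
}

# Usage (the file name is taken from the quoted report):
#   non_todo_failures regress_log_001_pgbench_with_server
```

Run against the log discussed here, a filter like this would be expected to skip tests 4 through 6 and report test 10, though I have not run it on that file.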