Why is lorikeet so unstable in v14 branch only?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Why is lorikeet so unstable in v14 branch only?
Date: 2022-03-26 18:47:07
Message-ID: 136102.1648320427@sss.pgh.pa.us
Lists: pgsql-hackers

I chanced to notice that buildfarm member lorikeet has been
failing an awful lot lately in the v14 branch, but hardly
at all in other branches. Here's a log extract from its
latest run [1]:

2022-03-26 06:31:47.245 EDT [623eeb93.d202:131] pg_regress/inherit LOG: statement: create table mlparted_tab (a int, b char, c text) partition by list (a);
2022-03-26 06:31:47.247 EDT [623eeb93.d202:132] pg_regress/inherit LOG: statement: create table mlparted_tab_part1 partition of mlparted_tab for values in (1);
2022-03-26 06:31:47.254 EDT [623eeb93.d203:60] pg_regress/vacuum LOG: statement: VACUUM FULL pg_class;
2022-03-26 06:31:47.258 EDT [623eeb92.d201:90] pg_regress/typed_table LOG: statement: SELECT a.attname,
pg_catalog.format_type(a.atttypid, a.atttypmod),
(SELECT pg_catalog.pg_get_expr(d.adbin, d.adrelid, true)
FROM pg_catalog.pg_attrdef d
WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
a.attnotnull,
(SELECT c.collname FROM pg_catalog.pg_collation c, pg_catalog.pg_type t
WHERE c.oid = a.attcollation AND t.oid = a.atttypid AND a.attcollation <> t.typcollation) AS attcollation,
a.attidentity,
a.attgenerated
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = '21770' AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum;
*** starting debugger for pid 53762, tid 10536
2022-03-26 06:32:02.158 EDT [623eeb6c.d0c2:4] LOG: server process (PID 53762) exited with exit code 127
2022-03-26 06:32:02.158 EDT [623eeb6c.d0c2:5] DETAIL: Failed process was running: create table mlparted_tab_part1 partition of mlparted_tab for values in (1);
2022-03-26 06:32:02.158 EDT [623eeb6c.d0c2:6] LOG: terminating any other active server processes

The failures are not all exactly like this one, but they're mostly in
CREATE TABLE operations near this one. I speculate that what is happening
is that the "VACUUM FULL pg_class" is triggering some misbehavior in
concurrent partitioned-table creation. The lack of failures in other
branches could be due to changes in the relative timing of the "vacuum"
and "inherit" test scripts.
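
As a cross-check on which script crashed: assuming the bracketed field in
these log lines comes from a log_line_prefix of the form [%c:%l], the part
after the dot in %c is the backend's PID in hexadecimal. The sketch below
(Python; the helper name session_pid is made up for illustration) decodes
that field and suggests that PID 53762 is session d202, that is, the
pg_regress/inherit backend.

```python
import re

# Assumption: the bracketed field is "%c:%l", where %c is
# <start time in hex>.<PID in hex> and %l is the per-session line number.
SESSION_RE = re.compile(r"\[([0-9a-f]+)\.([0-9a-f]+):(\d+)\]")


def session_pid(log_line):
    """Return the backend PID encoded in the session ID, or None if absent."""
    m = SESSION_RE.search(log_line)
    if m is None:
        return None
    return int(m.group(2), 16)


# First lines of three entries from the log extract above.
lines = [
    "2022-03-26 06:31:47.247 EDT [623eeb93.d202:132] pg_regress/inherit LOG: "
    "statement: create table mlparted_tab_part1 partition of mlparted_tab "
    "for values in (1);",
    "2022-03-26 06:31:47.254 EDT [623eeb93.d203:60] pg_regress/vacuum LOG: "
    "statement: VACUUM FULL pg_class;",
    "2022-03-26 06:31:47.258 EDT [623eeb92.d201:90] pg_regress/typed_table LOG: "
    "statement: SELECT a.attname,",
]

crashed_pid = 53762  # from "server process (PID 53762) exited with exit code 127"
matches = [line for line in lines if session_pid(line) == crashed_pid]

for line in matches:
    print(line)
```

Only the pg_regress/inherit line is printed, which agrees with the
"Failed process was running" DETAIL message.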

Any chance we could get a stack trace from one of these crashes?

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2022-03-26%2010%3A17%3A22
