From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net> |
Subject: | Re: recovering from "found xmin ... from before relfrozenxid ..." |
Date: | 2020-09-12 22:00:07 |
Message-ID: | 665524.1599948007@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I have committed this version.
This failure says that the test case is not entirely stable:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-12%2005%3A13%3A12
diff -U3 /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/expected/heap_surgery.out /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/results/heap_surgery.out
--- /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/expected/heap_surgery.out 2020-09-11 06:31:36.000000000 +0000
+++ /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/results/heap_surgery.out 2020-09-12 11:40:26.000000000 +0000
@@ -116,7 +116,6 @@
vacuum freeze htab2;
-- unused TIDs should be skipped
select heap_force_kill('htab2'::regclass, ARRAY['(0, 2)']::tid[]);
- NOTICE: skipping tid (0, 2) for relation "htab2" because it is marked unused
heap_force_kill
-----------------
sungazer's first run after pg_surgery went in was successful, so it's
not a hard failure. I'm guessing that it's timing dependent.
The most obvious theory for the cause is that what VACUUM does with
a tuple depends on whether the tuple's xmin is below global xmin,
and a concurrent autovacuum could very easily be holding back global
xmin. While I can't easily get autovac to run at just the right
time, I did verify that a concurrent regular session holding back
global xmin produces the symptom seen above. (To replicate, insert
"select pg_sleep(...)" in heap_surgery.sql before "-- now create an unused
line pointer"; run make installcheck; and use the delay to connect
to the database manually, start a serializable transaction, and do
any query to acquire a snapshot.)
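As a rough sketch of the manual half of that recipe, the second session might look like the following. The query against pg_class is only an arbitrary choice for illustration (the recipe says any query will do), and the session has to be opened by hand while the test script is sitting in its pg_sleep call.

```sql
-- Second session, connected manually to the regression database
-- while the test script is sleeping.
begin isolation level serializable;

-- Any query will do (pg_class is an arbitrary choice); the first query
-- in a serializable transaction makes it take its snapshot, which then
-- holds back global xmin for as long as the transaction stays open.
select count(*) from pg_class;

-- Leave the transaction open until the test's "vacuum freeze htab2"
-- has run, then end it:
rollback;
```

With the snapshot held open across the test's vacuum, the expected NOTICE for tid (0, 2) should go missing, as in the buildfarm diff above.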
I suggest that the easiest way to make this test reliable is to
make the test tables be temp tables (which allows dropping the
autovacuum_enabled = off property, too). In the wake of commit
a7212be8b, that should guarantee that vacuum has stable tuple-level
behavior regardless of what is happening concurrently.
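A minimal sketch of what that change could look like in heap_surgery.sql. The column list "(a int)" is an assumption made for illustration, not copied from the committed test; only the table name htab2 and the autovacuum_enabled = off setting come from the text above.

```sql
-- Before: a regular table, with autovacuum switched off by hand.
-- (The column list is assumed, not taken from the real test.)
create table htab2 (a int) with (autovacuum_enabled = off);

-- After: a temp table. Autovacuum never processes temp tables, so the
-- storage parameter is no longer needed.
create temp table htab2 (a int);
```

The two statements are alternatives, not a script to run in sequence. The rest of the test would presumably stay as it is, since the later statements only refer to the table by name.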
regards, tom lane