Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, MBeena Emerson <mbeena(dot)emerson(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-09-14 10:26:07
Message-ID: CAE9k0P=9Lu6GWFsWmBDEz4H6sfn=cfRmHCBgxzeihpKn4CNYpA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Sep 13, 2020 at 3:30 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I have committed this version.
>
> This failure says that the test case is not entirely stable:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-12%2005%3A13%3A12
>
> diff -U3 /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/expected/heap_surgery.out /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/results/heap_surgery.out
> --- /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/expected/heap_surgery.out 2020-09-11 06:31:36.000000000 +0000
> +++ /home/nm/farm/gcc64/HEAD/pgsql.build/contrib/pg_surgery/results/heap_surgery.out 2020-09-12 11:40:26.000000000 +0000
> @@ -116,7 +116,6 @@
> vacuum freeze htab2;
> -- unused TIDs should be skipped
> select heap_force_kill('htab2'::regclass, ARRAY['(0, 2)']::tid[]);
> - NOTICE: skipping tid (0, 2) for relation "htab2" because it is marked unused
> heap_force_kill
> -----------------
>
>
> sungazer's first run after pg_surgery went in was successful, so it's
> not a hard failure. I'm guessing that it's timing dependent.
>
> The most obvious theory for the cause is that what VACUUM does with
> a tuple depends on whether the tuple's xmin is below global xmin,
> and a concurrent autovacuum could very easily be holding back global
> xmin. While I can't easily get autovac to run at just the right
> time, I did verify that a concurrent regular session holding back
> global xmin produces the symptom seen above. (To replicate, insert
> "select pg_sleep(...)" in heap_surgery.sql before "-- now create an unused
> line pointer"; run make installcheck; and use the delay to connect
> to the database manually, start a serializable transaction, and do
> any query to acquire a snapshot.)
>

Thanks for reporting. I can reproduce the issue by adding a delay just
before "-- now create an unused line pointer" and using that delay to
start a new session at either the repeatable read or serializable
isolation level and run a query on the test table. To fix this, as you
suggested, I've converted the test table to a temporary table. Attached
is a patch with the change. Please have a look and let me know if you
have any concerns.
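
A minimal sketch of the kind of second session that reproduces the
symptom, assuming the test table is htab2 as in the diff quoted above.
The query is only illustrative; any statement that acquires a snapshot
should have the same effect:

```sql
-- Second session, opened by hand during the delay in heap_surgery.sql.
-- The SELECT below is an illustrative choice, not taken from the test.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- The first query takes the snapshot, which holds back global xmin.
SELECT count(*) FROM htab2;

-- Leave the transaction open until the test has run its VACUUM,
-- then release the snapshot.
COMMIT;
```

While that snapshot is held, VACUUM cannot remove the dead tuple, so the
line pointer is never marked unused and the expected NOTICE from
heap_force_kill does not appear.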

Thanks,

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

Attachment: fix_regression_pg_surgey.patch (text/x-patch, 1.6 KB)
