Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-07-16 16:14:35
Message-ID: CA+TgmoZka+OZU2wQm1u+xba8176+jvpgH5PhD4ec6xd9Du+B0w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 16, 2020 at 10:00 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I see your point, though: the tuple has to be able to survive
> HOT-pruning in order to cause a problem when we check whether it needs
> freezing.

Here's an example where the new sanity checks fail on an invisible
tuple without any concurrent transactions:

$ initdb
$ pg_ctl start -l ~/logfile
$ createdb
$ psql

create table simpsons (a int, b text);
vacuum freeze;
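
(The vacuum freeze establishes a known relfrozenxid baseline for the
new table, which we'll deliberately corrupt below.)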

$ cat > txid.sql
select txid_current();
$ pgbench -t 131072 -c 8 -j 8 -n -f txid.sql
$ psql

insert into simpsons values (1, 'homer');
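
(The pgbench run consumes 8 * 131072 = 1048576 XIDs, one per
txid_current() call, so this row's xmin ends up a bit over one million.)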

$ pg_ctl stop
$ pg_resetwal -x 1000 $PGDATA
$ pg_ctl start -l ~/logfile
$ psql

update pg_class set relfrozenxid = (relfrozenxid::text::integer +
2000000)::text::xid where relname = 'simpsons';
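
(The casts go through text because there is no direct cast between xid
and integer.)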

rhaas=# select * from simpsons;
a | b
---+---
(0 rows)
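
(The row is invisible because pg_resetwal set the next XID back to
1000, so the tuple's xmin of 1049082 now looks like a transaction from
the future, which no snapshot can see.)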

rhaas=# vacuum simpsons;
ERROR: found xmin 1049082 from before relfrozenxid 2000506
CONTEXT: while scanning block 0 of relation "public.simpsons"

This is a fairly insane situation, because we should have relfrozenxid
< tuple xid < xid counter, but instead we have xid counter < tuple xid
< relfrozenxid. Still, it demonstrates that it's possible to have a
database which is sufficiently corrupt that you can't escape from the
new sanity checks using only INSERT, UPDATE, and DELETE.

Now, an even easier way to create a table with a tuple that prevents
vacuuming and also can't just be deleted is to simply remove a
required pg_clog file (and maybe restart the server to clear out any
cached data in the SLRUs). What we typically do with customers who
need to recover from that situation today is give them a script to
fabricate a bogus CLOG file that shows all transactions as committed
(or, perhaps, aborted). But I think that the tools proposed on this
thread might be a better approach in certain cases. If the problem is
that a pg_clog file vanished, then recreating it with whatever content
you think is closest to what was probably there before is likely the
best you can do. But if you've got some individual tuples with crazy
xmin values, you don't really want to drop matching files in pg_clog;
it's better to fix the tuples.
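
For the archives, here's a minimal sketch of that sort of script. This
assumes a pre-v10 cluster (pg_clog was renamed to pg_xact in v10) and a
hypothetical missing segment 0007; substitute whichever file actually
vanished. Each transaction's status occupies 2 bits in CLOG, with 0x01
meaning committed, so a 256kB segment filled with 0x55 bytes (octal
125, i.e. four committed transactions per byte) marks every transaction
in its range as committed:

$ dd if=/dev/zero bs=256k count=1 | tr '\0' '\125' > $PGDATA/pg_clog/0007

Filling it with 0xAA (octal 252) instead would mark them all as
aborted.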

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
