Re: recovering from "found xmin ... from before relfrozenxid ..."

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: recovering from "found xmin ... from before relfrozenxid ..."
Date: 2020-07-24 09:05:08
Message-ID: CAE9k0PnRhymKiyaxUE8u=6_4=K_1ktKjOVb51K_H2eGmL81f1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All,

Attached is the patch that adds heap_force_kill(regclass, tid[]) and
heap_force_freeze(regclass, tid[]) functions which Robert mentioned in the
first email in this thread. The patch basically adds an extension named
pg_surgery that contains these functions. Please have a look and let me
know your feedback. Thank you.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Thu, Jul 16, 2020 at 9:44 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Jul 16, 2020 at 10:00 AM Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> > I see your point, though: the tuple has to be able to survive
> > HOT-pruning in order to cause a problem when we check whether it needs
> > freezing.
>
> Here's an example where the new sanity checks fail on an invisible
> tuple without any concurrent transactions:
>
> $ initdb
> $ pg_ctl start -l ~/logfile
> $ createdb
> $ psql
>
> create table simpsons (a int, b text);
> vacuum freeze;
>
> $ cat > txid.sql
> select txid_current();
> $ pgbench -t 131072 -c 8 -j 8 -n -f txid.sql
> $ psql
>
> insert into simpsons values (1, 'homer');
>
> $ pg_ctl stop
> $ pg_resetwal -x 1000 $PGDATA
> $ pg_ctl start -l ~/logfile
> $ psql
>
> update pg_class set relfrozenxid = (relfrozenxid::text::integer +
> 2000000)::text::xid where relname = 'simpsons';
>
> rhaas=# select * from simpsons;
> a | b
> ---+---
> (0 rows)
>
> rhaas=# vacuum simpsons;
> ERROR: found xmin 1049082 from before relfrozenxid 2000506
> CONTEXT: while scanning block 0 of relation "public.simpsons"
>
> This is a fairly insane situation, because we should have relfrozenxid
> < tuple xid < xid counter, but instead we have xid counter < tuple xid
> < relfrozenxid, but it demonstrates that it's possible to have a
> database which is sufficiently corrupt that you can't escape from the
> new sanity checks using only INSERT, UPDATE, and DELETE.
>
> Now, an even easier way to create a table with a tuple that prevents
> vacuuming and also can't just be deleted is to simply remove a
> required pg_clog file (and maybe restart the server to clear out any
> cached data in the SLRUs). What we typically do with customers who
> need to recover from that situation today is give them a script to
> fabricate a bogus CLOG file that shows all transactions as committed
> (or, perhaps, aborted). But I think that the tools proposed on this
> thread might be a better approach in certain cases. If the problem is
> that a pg_clog file vanished, then recreating it with whatever content
> you think is closest to what was probably there before is likely the
> best you can do. But if you've got some individual tuples with crazy
> xmin values, you don't really want to drop matching files in pg_clog;
> it's better to fix the tuples.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>

Attachment Content-Type Size
0001-Add-contrib-pg_surgery-to-perform-surgery-on-the-dam.patch text/x-patch 19.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-07-24 09:13:28 Re: renaming configure.in to configure.ac
Previous Message Tomas Vondra 2020-07-24 08:40:47 Re: Default setting for enable_hashagg_disk