Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

From: Jeremy Finzel <finzelj(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
Date: 2018-03-19 19:37:24
Message-ID: CAMa1XUhRYK1TO8TJoXagU-AM+evgOWWRnOEXC_sWeMX1i8NhCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Mon, Mar 19, 2018 at 1:17 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi Jeremy, Alvaro,
>
> On 2018-03-19 13:00:13 -0500, Jeremy Finzel wrote:
> > On Mon, Mar 19, 2018 at 12:46 PM, Alvaro Herrera <
> alvherre(at)alvh(dot)no-ip(dot)org>
> > wrote:
> >
> > > Jeremy Finzel wrote:
> > > > Getting some concerning errors in one of our databases that is on
> 9.5.11,
> > > > on autovacuum from template0 database pg_authid and
> pg_auth_members. I
> > > > only saw some notes on the list about this error related to
> materialized
> > > > views. FWIW, we did use pg_upgrade to upgrade this database from
> 9.4 to
> > > > 9.5. Here is an example:
> > > >
> > > > 2018-03-19 12:08:33.946 CDT,,,14892,,5aafee91.3a2c,1,,2018-03-19
> > > 12:08:33
> > > > CDT,59/340951,0,ERROR,XX001,"found xmin 2906288382 from before
> > > relfrozenxid
> > > > 740087784",,,,,"automatic vacuum of table
> > > > ""template0.pg_catalog.pg_authid""",,,,""
> > >
> > > Can you please supply output of pg_controldata?
>
> > Latest checkpoint's NextXID: 16/3132524419
> > Latest checkpoint's NextMultiXactId: 2142
> > Latest checkpoint's NextMultiOffset: 5235
> > Latest checkpoint's oldestXID: 1829964553
> > Latest checkpoint's oldestXID's DB: 12376
> > Latest checkpoint's oldestActiveXID: 3131774441
> > Latest checkpoint's oldestMultiXid: 1
> > Latest checkpoint's oldestMulti's DB: 16400
>
> Hm, based on these it doesn't look like multixacts were involved (based
> on oldestMultiXid it's highly unlikley there've multi wraparound, and
> there's not much multixact usage on system tables anyway). Which
> suggests that there might have been actual corrpution here.
>
> Jeremy:
> - which version of 9.4 and 9.5 ran on this? Do you know?
>

We upgraded to 9.5.5, and today we are running 9.5.11. And actually we
upgraded from 9.3, not 9.4. We are still trying to figure out which point
release we were on at 9.3.

> - Can you install the pageinspect extension? If so, it might be a
> CREATE EXTENSION pageinspect;
> CREATE OR REPLACE FUNCTION check_rel(rel regclass, OUT blockno int8, OUT
> lp int2, OUT xmin xid)
> RETURNS SETOF RECORD
> LANGUAGE SQL
> AS $$
> SELECT blockno, lp, t_xmin
> FROM
> generate_series(0, pg_relation_size($1::text) / 8192 - 1) blockno,
> -- every block in the relation
> heap_page_items(get_raw_page($1::text, blockno::int4)) -- every
> item on the page
> WHERE
> t_xmin IS NOT NULL -- filter out empty items
> AND t_xmin != 1 -- filter out bootstrap
> AND t_xmin != 2 -- filter out frozen transaction id
> AND (t_infomask & ((x'0100' | x'0200')::int)) != ((x'0100' |
> x'0200')::int) -- filter out frozen rows with xid present
> AND age(t_xmin) > age((SELECT relfrozenxid FROM pg_class WHERE oid
> = $1)) -- xid cutoff filter
> $$;
> SELECT * FROM check_rel('pg_authid') LIMIT 100;
>

Small note - Needs to be this because != is not supported for xid:

AND NOT t_xmin = 1 -- filter out bootstrap
AND NOT t_xmin = 2 -- filter out frozen transaction id

>
> and then display all items for one of the affected pages like
> SELECT * FROM heap_page_items(get_raw_page('pg_authid', 34343));
>
>
> Alvaro:
> - Hm, we talked about code adding context for these kind of errors,
> right? Is that just skipped for csvlog?
> - Alvaro, does the above check_rel() function make sense?
>
> Greetings,
>
> Andres Freund
>

The function does NOT show any issue with either of those tables.

One very interesting thing that is puzzling us - we have taken several san
snapshots of the system real time that are running on the exact same
version 9.5.11, and they do NOT show the same error when we vacuum these
tables. It makes us wonder if simply a db restart would solve the issue.

We will continue to investigate but interested in your feedback about what
we have seen thus far.

Thanks,
Jeremy

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andres Freund 2018-03-19 19:41:08 Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
Previous Message Adrian Klaver 2018-03-19 18:32:15 Re: PostgreSQL 9.6 Temporary files

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-03-19 19:41:08 Re: found xmin from before relfrozenxid on pg_catalog.pg_authid
Previous Message Tom Lane 2018-03-19 19:29:06 Re: Compile error while building postgresql 10.3