Bug: Unreferenced temp tables disables vacuum to update xid

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-06 08:40:56
Message-ID: 20080106004056.7e7e2a2d@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

I spent the better part of this evening tracking down a problem with a
high velocity database. The database had entered the point of no return
by invoking xidStopLimit.

This by itself isn't a problem because you just vacuum right? Well we
vacuumed... and the problem didn't resolve itself. It continued to
throw the warning:

FATAL: database is not accepting commands to avoid wraparound
data loss in database "foo"
HINT: Stop the postmaster and use a standalone backend to vacuum
database "foo".

And when in --single with postgres we would get:

2008-01-06 02:04:45 EST WARNING: database "foo" must be vacuumed
within 993712 transactions
2008-01-06 02:04:45 EST HINT: To avoid a database shutdown,
execute a full-database VACUUM in "foo".

We performed all the requisite queries to determine where the problem
was:

SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';

Everything returned ~ 50 mil

But:

SELECT datname, age(datfrozenxid) FROM pg_database;

Always returned ~ 2bil.

Even after two vacuums (one a vacuum and the other a vacuum analyze).

Anyway.. we tried a lot of different things, including adjusting
xidStopLimit so we could get back into interactive mode and have a
reasonable interface to work with...

The end result was that by chance we checked relkind = 't' instead of
'r' (Shout out to AndrewSN). And sure enough:

pg_toast_49013869 | 2146491285

And yes:

SELECT oid::regclass FROM pg_class WHERE
reltoastrelid='pg_toast.pg_toast_49013869'::regclass;

oid | pg_temp_24.tmp_isp_blk_chk

The hack to get this cleaned up was to connect about 2 dozen times (to
get to slot 24) with psql via different sessions and create temp
tables. Once we hit slot 24, the probably instantly went away and the
database returned to normal state.

May I humbly suggest that a:

* We need to check clean up unreferenced temp relations on startup and
remove them

* We need to change the docs for the following query:

SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';

To:

SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' OR
relkind = 't';

I apologize if this doesn't quite make sense. I am very tired but I
wanted to make sure to get this out on the list.

Sincerely,

Joshua D. Drake

- --
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHgJQaATb/zqfZUUQRAnKAAJ0fs0OahvGjlJq6fWrFZ67h1tY6qwCfcHmR
K0xOKL+JMAcPTQGbqR3qy1M=
=te9S
-----END PGP SIGNATURE-----

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sokolov Yura 2008-01-06 08:57:03 BUG #3852: Could not create complex aggregate
Previous Message Gokulakannan Somasundaram 2008-01-06 08:24:02 Re: Dynamic Partitioning using Segment Visibility Maps