Autovacuum to prevent wraparound tries to consume xid

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Autovacuum to prevent wraparound tries to consume xid
Date: 2016-03-28 11:05:12
Message-ID: CAPpHfdspOkmiQsxh-UZw2chM6dRMwXAJGEmmbmqYR=yvM7-s6A@mail.gmail.com
Lists: pgsql-hackers

Hackers,

One of our customers ran into a near-wraparound situation: the xid counter
reached the xidStopLimit value, so no transactions could be executed in
normal mode. What I noticed is strange behaviour of the autovacuum to
prevent wraparound. It vacuums tables and updates pg_class and pg_database,
but then fails with the "database is not accepting commands to avoid wraparound
data loss in database" message. We ended up in a situation where, according to
pg_database, the maximum database age was less than 200 million, yet
transactions couldn't be executed because ShmemVariableCache hadn't been
updated (checked with gdb).

I've reproduced this situation on my laptop as follows:

1) Connect gdb, do "set ShmemVariableCache->nextXid =
ShmemVariableCache->xidStopLimit"
2) Stop postgres
3) Make some fake clog: "dd bs=1m if=/dev/zero
of=/usr/local/pgsql/data/pg_clog/07FF count=1024"
4) Start postgres
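
A note on step 3: the post does not say why the segment is named 07FF. A plausible reading is that, on a fresh cluster, the stop limit sits a little below 2^31, so the xids reached in step 1 fall into that clog segment. The sketch below shows the mapping from an xid to a segment file name, assuming the default build constants (8192-byte blocks, 4 transaction statuses per byte, 32 pages per SLRU segment). The function name clog_segment_name is hypothetical and is not part of PostgreSQL.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Default-build constants (assumption: BLCKSZ was not changed at configure time). */
#define BLCKSZ                 8192
#define CLOG_XACTS_PER_BYTE    4    /* 2 status bits per transaction */
#define CLOG_XACTS_PER_PAGE    (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define SLRU_PAGES_PER_SEGMENT 32

/*
 * Hypothetical helper: write the four-hex-digit pg_clog segment file name
 * that holds the commit status of `xid` into `buf`.
 */
void clog_segment_name(uint32_t xid, char *buf, size_t buflen)
{
    uint32_t page = xid / CLOG_XACTS_PER_PAGE;        /* 32768 xids per page */
    uint32_t segment = page / SLRU_PAGES_PER_SEGMENT; /* 1048576 xids per segment */

    snprintf(buf, buflen, "%04X", (unsigned) segment);
}
```

With these constants every xid from 0x7FF00000 through 0x7FFFFFFF maps to the name 07FF, which matches the file created with dd above.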

Then I saw the same situation as in the customer's database. The autovacuum
to prevent wraparound regularly produced the following messages in the log:

ERROR: database is not accepting commands to avoid wraparound data loss in
database "template1"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.

Finally, all databases were frozen

# SELECT datname, age(datfrozenxid) FROM pg_database;
  datname  │   age
───────────┼──────────
 template1 │        0
 template0 │        0
 postgres  │ 50000000
(3 rows)

but no transactions could be executed (ShmemVariableCache wasn't updated).

After some debugging I found that vac_truncate_clog consumes an xid just to
produce a warning. I wrote a simple patch which replaces
GetCurrentTransactionId() with ShmemVariableCache->nextXid. That
completely fixes the situation for me: ShmemVariableCache was successfully
updated.
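
To illustrate why the choice between consuming and merely reading an xid matters here, the following is a toy model in plain C. It is not PostgreSQL code and not the attached patch: the struct, the function names, and the plain integer comparison (which ignores xid wraparound arithmetic) are all assumptions made for the sketch. It assumes the limit in shared memory is only advanced at the end of the truncation step, after the xid has been obtained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the shared xid state; NOT PostgreSQL's actual struct. */
typedef struct ToyXidState
{
    uint32_t next_xid;   /* next xid that would be handed out */
    uint32_t stop_limit; /* assignment is refused at or beyond this value */
} ToyXidState;

/* Models assigning a new xid: returns false once the stop limit is reached. */
bool toy_assign_xid(ToyXidState *s, uint32_t *out)
{
    if (s->next_xid >= s->stop_limit)
        return false; /* "database is not accepting commands ..." */
    *out = s->next_xid++;
    return true;
}

/* Models reading the counter: never fails and consumes nothing. */
uint32_t toy_read_next_xid(const ToyXidState *s)
{
    return s->next_xid;
}

/*
 * Models the end of a wraparound vacuum: an xid is needed only as a
 * reference point for a sanity warning, then the stop limit is advanced.
 * Returns false if the step errors out before the limit is advanced.
 */
bool toy_truncate_and_advance(ToyXidState *s, uint32_t new_stop_limit,
                              bool consume_xid)
{
    uint32_t my_xid;

    if (consume_xid)
    {
        if (!toy_assign_xid(s, &my_xid))
            return false; /* fails, so the limit below is never updated */
    }
    else
        my_xid = toy_read_next_xid(s);

    (void) my_xid; /* real code would use it to decide whether to warn */
    s->stop_limit = new_stop_limit;
    return true;
}
```

Starting from a state where next_xid equals stop_limit, calling toy_truncate_and_advance with consume_xid set to true returns false and leaves stop_limit unchanged, which mirrors the stuck state described above. With consume_xid set to false, the call succeeds and the limit moves forward.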

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
fix_vac_truncate_clog_xid_consume.patch application/octet-stream 1.3 KB
