Re: Autovacuum to prevent wraparound tries to consume xid

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum to prevent wraparound tries to consume xid
Date: 2016-05-22 09:39:05
Message-ID: CAA4eK1JoYcGrsTCh53PPOpvneeUga=kD8gx-VnAvDJeWrOYV6A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 28, 2016 at 4:35 PM, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> wrote:

> Hackers,
>
> One of our customers ran into a near-xid-wraparound situation: the xid
> counter reached the xidStopLimit value, so no transactions could be
> executed in normal mode. What I noticed is the strange behaviour of the
> autovacuum to prevent wraparound. It vacuums tables, updates pg_class and
> pg_database, but then fails with the "database is not accepting commands
> to avoid wraparound data loss in database" message. We ended up in a
> situation where, according to pg_database, the maximum age of a database
> was less than 200 million, but transactions couldn't be executed because
> ShmemVariableCache wasn't updated (checked with gdb).
>
> I've reproduced this situation on my laptop as follows:
>
> 1) Connect gdb, do "set ShmemVariableCache->nextXid =
> ShmemVariableCache->xidStopLimit"
> 2) Stop postgres
> 3) Make some fake clog: "dd bs=1m if=/dev/zero
> of=/usr/local/pgsql/data/pg_clog/07FF count=1024"
> 4) Start postgres
>
> Then I saw the same situation as in the customer's database. The
> autovacuum to prevent wraparound regularly produced the following
> messages in the log:
>
> ERROR: database is not accepting commands to avoid wraparound data loss
> in database "template1"
> HINT: Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.
>
> Finally, all databases were frozen:
>
> # SELECT datname, age(datfrozenxid) FROM pg_database;
>   datname  │   age
> ───────────┼──────────
>  template1 │        0
>  template0 │        0
>  postgres  │ 50000000
> (3 rows)
>
> but no transactions could be executed (ShmemVariableCache wasn't updated).
>
> After some debugging I found that vac_truncate_clog consumes an xid just
> to produce a warning. I wrote a simple patch which replaces
> GetCurrentTransactionId() with ShmemVariableCache->nextXid. That
> completely fixes the situation for me: ShmemVariableCache was
> successfully updated.
>

As per your latest patch, you are using ReadNewTransactionId() to get
nextXid, which is then used to check whether any database's frozenxid has
already wrapped. Now, isn't the value of nextXid in your patch the same as
lastSaneFrozenXid in most cases? (I mean, there is only a small window in
which some new transaction might have started and advanced
ShmemVariableCache->nextXid.) If so, isn't relying on the
lastSaneFrozenXid check sufficient?
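
To make the question concrete, here is a minimal sketch of the circular
(modulo 2^32) xid comparison that both checks depend on. It is a simplified
stand-alone model, not the code from the patch or from vac_truncate_clog;
the names Xid, xid_precedes and frozen_xid_looks_wrapped are made up for
illustration, and the xid values used with them are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;                /* illustrative stand-in for TransactionId */

#define FIRST_NORMAL_XID ((Xid) 3)   /* smaller values are reserved, special xids */

/*
 * Circular comparison of two normal xids: returns true if "a" is logically
 * older than "b", even when the 32-bit counter has wrapped between them.
 */
static bool
xid_precedes(Xid a, Xid b)
{
    assert(a >= FIRST_NORMAL_XID && b >= FIRST_NORMAL_XID);
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical helper: a database's datfrozenxid looks wrapped (or bogus)
 * if the reference xid logically precedes it, i.e. the frozen xid appears
 * to lie in the future.  The reference could be either a freshly read
 * nextXid or a slightly older value such as lastSaneFrozenXid.
 */
static bool
frozen_xid_looks_wrapped(Xid reference_xid, Xid datfrozenxid)
{
    return xid_precedes(reference_xid, datfrozenxid);
}
```

In this model, the two reference values give different answers only for a
datfrozenxid that falls between them, which is the small window mentioned
above; for any older datfrozenxid the verdict is the same.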

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
