Could not finish anti-wraparound VACUUM when stop limit is reached

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Could not finish anti-wraparound VACUUM when stop limit is reached
Date: 2014-05-25 22:37:03
Message-ID: CAMkU=1z=z8+0-bHbQuhV62H+AoMABzJbdiofqj=zaiRkAd3VLw@mail.gmail.com
Lists: pgsql-hackers

On Sunday, May 25, 2014, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
wrote:

> While debugging the B-tree bug that Jeff Janes reported (
> http://www.postgresql.org/message-id/CAMkU=1y=VwF07Ay+Cpqk_7FpiHRctmssV9y99SBGhitkXPbf8g(at)mail(dot)gmail(dot)com),
> a new issue came up:
>
> If you reach the xidStopLimit, and try to run VACUUM, it fails with error:
>
> jjanes=# vacuum;
> ERROR: database is not accepting commands to avoid wraparound data loss
> in database "jjanes"
> HINT: Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.
>

This problem also afflicted me in 9.3 and 9.2 (and probably existed further
back too). I figured it was mostly a barrier to more effective testing,
but it would be nice to have it fixed.

But I don't understand how you encountered this. I only ran into it when
the vacuum had already been started, but not yet completed, by the time the
limit was reached. Once it is already reached, how do you even get the
vacuum to start? Doesn't it error out right at the beginning?
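
For readers following along, here is a rough sketch of how the stop limit
relates to the oldest datfrozenxid. It is written in Python for brevity and is
not the server's C code. The constants (a wrap limit about 2^31 XIDs past the
oldest datfrozenxid, a stop limit 1,000,000 XIDs before that) and the function
names are assumptions based on my understanding of the 9.x behaviour.

```python
# Illustrative model only; names and constants are assumptions, not server code.
MAX_XID = 0xFFFFFFFF        # largest 32-bit transaction ID
FIRST_NORMAL_XID = 3        # XIDs 0-2 are reserved for special purposes
STOP_MARGIN = 1_000_000     # safety margin mentioned in this thread


def xid_limits(oldest_datfrozenxid):
    """Return (wrap_limit, stop_limit) for the given oldest datfrozenxid."""
    # The wrap limit is halfway around the 32-bit circle from the oldest XID.
    wrap = (oldest_datfrozenxid + (MAX_XID >> 1)) & MAX_XID
    if wrap < FIRST_NORMAL_XID:
        wrap += FIRST_NORMAL_XID
    # The stop limit sits a fixed margin before the wrap limit.
    stop = (wrap - STOP_MARGIN) & MAX_XID
    if stop < FIRST_NORMAL_XID:
        stop = (stop - FIRST_NORMAL_XID) & MAX_XID
    return wrap, stop


def follows_or_equals(a, b):
    """Circular comparison: True if a is at or after b modulo 2^32."""
    return ((a - b) & MAX_XID) < 0x80000000


def can_assign_xid(next_xid, oldest_datfrozenxid):
    """False once next_xid has reached the stop limit."""
    _, stop = xid_limits(oldest_datfrozenxid)
    return not follows_or_equals(next_xid, stop)
```

Under this model, every request for a new XID at or past the stop limit is
refused, which is consistent with a command erroring out as soon as it needs
one.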

> Jeff's database seems to have wrapped around already, because after fixing
> the above, I get this:
>

Do you have the patch to fix this?

>
> jjanes=# vacuum;
> WARNING: some databases have not been vacuumed in over 2 billion
> transactions
> DETAIL: You might have already suffered transaction-wraparound data loss.
> VACUUM
>

This is odd. When I apply your patch from the other thread to fix the
vacuum, and then start up in single-user mode, I can run vacuum to
completion and re-open the database. When I first start it up, it says it
needs to be vacuumed within 999,935 transactions. There is no indication
that it has already suffered a wraparound, just that it was about to do so.
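
As a quick check of that number, here is a small Python sketch. It assumes the
startup message counts the XIDs left before the wrap limit, and that the stop
limit sits 1,000,000 XIDs before the wrap limit. Both are assumptions of this
sketch, and the function name is made up.

```python
# Illustrative arithmetic only; the margin and the meaning of the reported
# count are assumptions, not values read from the server.
STOP_MARGIN = 1_000_000


def xids_past_stop_limit(remaining_before_wrap):
    """How far past the stop limit the counter is (negative if not yet there)."""
    return STOP_MARGIN - remaining_before_wrap


print(xids_past_stop_limit(999_935))  # 65
```

If those assumptions hold, the counter was only 65 XIDs past the stop limit,
which fits "about to wrap" much better than "already wrapped".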

> We do not truncate clog when wraparound has already happened, so we never
> get past that point. Jeff advanced XID counter aggressively with some
> custom C code, so hitting the actual wrap-around is a case of "don't do
> that". Still, the case is quite peculiar: pg_controldata says that nextXid
> is 4/1593661139. The oldest datfrozenxid is equal to that, 1593661139. So
> ISTM he managed to not just wrap around, but execute 2 billion more
> transactions after the wraparound and reach datfrozenxid again. I'm not
> sure how that happened.
>
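
To make the quoted numbers concrete, here is a small Python sketch that splits
the "epoch/xid" value reported by pg_controldata and computes the apparent age
of datfrozenxid modulo 2^32. The function names are invented for illustration.

```python
# Illustrative arithmetic only; helper names are hypothetical.
XID_MODULUS = 1 << 32


def parse_next_xid(text):
    """Split an 'epoch/xid' string into two integers."""
    epoch, xid = text.split("/")
    return int(epoch), int(xid)


def full_xid(epoch, xid):
    """Combine epoch and 32-bit XID into one 64-bit counter value."""
    return epoch * XID_MODULUS + xid


def apparent_age(next_xid, datfrozenxid):
    """Distance from datfrozenxid to next_xid on the 32-bit circle."""
    return (next_xid - datfrozenxid) % XID_MODULUS


epoch, xid = parse_next_xid("4/1593661139")
print(full_xid(epoch, xid))           # 18773530323
print(apparent_age(xid, 1593661139))  # 0
```

An apparent age of 0 cannot by itself distinguish "datfrozenxid was just
advanced to nextXid" from "a whole multiple of 2^32 XIDs was consumed since",
so the arithmetic alone does not settle which of the two happened here.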

If it had actually undergone an undetected wraparound, wouldn't data be
disappearing and appearing inappropriately? I think the testing harness
should have detected that inconsistency.

(Also, the max setting for JJ_xid during the test run was 40, so I don't
think it could have blown right past the 1,000,000 safety margin and out
the other side without triggering a shutdown).

Cheers,

Jeff
