Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Timothy Garnett <tgarnett(at)panjiva(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date: 2015-05-08 23:41:38
Message-ID: CAEepm=1XGJVijxqG2EE=3Tb2bbrQRTvnXA6vZN1FkOZNtH=Lqw@mail.gmail.com
Lists: pgsql-bugs

On Fri, May 8, 2015 at 6:25 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> 1. The members SLRU is full all the way up to offsetStopLimit.
> 2. A checkpoint occurs, reaching MultiXactSetSafeTruncate(), which
> sets lastCheckpointedOldest.
> 3. Vacuum runs, calling SetMultiXactIdLimit(), calling
> DetermineSafeOldestOffset(), advancing
> MultiXactState->offsetStopLimit.
> 4. Since offsetStopLimit > lastCheckpointedOffset, it's now possible
> for someone to consume an MXID greater than offsetStopLimit, making
> MultiXactState->nextOffset > lastCheckpointedOffset
> 5. The checkpoint from step 1, continuing on its merry way, now calls
> TruncateMultiXact(), which sets rangeEnd > rangeStart and blows away
> nearly every file in the SLRU.

I am still working on reproducing this race scenario in several ways,
including the way you described, but I kept getting stuck at step 4,
unable to create new multixacts despite having vacuum-frozen all
databases (including template0) and advanced the cluster minimum
mxid.

I think I see why, and I think it's a bug: if you vacuum freeze all
your databases, MultiXactState->oldestMultiXactId finishes up equal to
MultiXactState->nextMXact. But that's not actually a multixact that
exists yet, so when DetermineSafeOldestOffset calls
find_multixact_start, it reads a garbage offset (all zeros in practice,
since pages start out zeroed) and produces a garbage value for
offsetStopLimit. Depending on where your nextOffset happens to be at
the time, that value might incorrectly stop you from creating any more
multixacts even though member space is entirely empty. I think
the fix is something like "if nextMXact == oldestMultiXactId, then
there are no active multixacts, so the offsetStopLimit should be set
to nextOffset - (a segment's worth)".
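
To make the proposed rule concrete, here is a rough standalone sketch
of the idea. It is not the actual multixact.c code. The function
sketch_offset_stop_limit, the callback type, the zeroed_page_lookup
stand-in for find_multixact_start, and the MEMBERS_PER_SEGMENT value
are all assumptions made up for illustration. It also skips rounding
to a segment boundary, which a real fix might need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Assumed placeholder for "a segment's worth" of member entries. */
#define MEMBERS_PER_SEGMENT 52352u

/* Hypothetical stand-in for find_multixact_start(). */
typedef MultiXactOffset (*offset_lookup_fn) (MultiXactId multi);

/*
 * Simulates reading the offset of a multixact whose page has never been
 * written: pages start out zeroed, so the "offset" comes back as 0.
 */
static MultiXactOffset
zeroed_page_lookup(MultiXactId multi)
{
    (void) multi;
    return 0;
}

/*
 * Sketch of the proposed rule: if oldestMulti == nextMulti, no multixact
 * exists yet, so do not look up its offset; base the stop limit on
 * nextOffset instead.  Arithmetic is unsigned 32-bit, so it wraps.
 */
static MultiXactOffset
sketch_offset_stop_limit(MultiXactId oldestMulti,
                         MultiXactId nextMulti,
                         MultiXactOffset nextOffset,
                         offset_lookup_fn lookup)
{
    MultiXactOffset oldestOffset;

    assert(lookup != NULL);

    if (oldestMulti == nextMulti)
        oldestOffset = nextOffset;          /* nothing to look up */
    else
        oldestOffset = lookup(oldestMulti); /* a real multixact */

    return oldestOffset - MEMBERS_PER_SEGMENT;
}
```

With the special case in place, the stop limit follows nextOffset
instead of whatever a zeroed page happens to contain. Without it, the
lookup result of 0 would put the limit one segment below offset zero
(after wraparound), regardless of where nextOffset actually is.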

--
Thomas Munro
http://www.enterprisedb.com
