Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Timothy Garnett <tgarnett(at)panjiva(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date: 2015-05-09 12:00:49
Message-ID: CAEepm=3C32VPJLOo45y0c3-3KWXNV2xM4jaPTSVjCRD2VG0Qgg@mail.gmail.com
Lists: pgsql-bugs

On Sat, May 9, 2015 at 2:46 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, May 8, 2015 at 9:55 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
>> Thomas Munro wrote:
>>> I think the fix is something like "if nextMXact == oldestMultiXactId,
>>> then there are no active multixacts, so the offsetStopLimit should be
>>> set to nextOffset - (a segment's worth)".
>>
>> Makes sense.
>
> Here's a patch that attempts to implement this.

Thanks. I think I have managed to reproduce something like the data
loss race that we were speculating about.

0. initdb, autovacuum = off, set up explode_mxact_members.c as
described elsewhere in the thread.
1. Fill up the members SLRU completely (i.e. reach the state where you
can no longer create a new multixact of any size). pg_multixact/members
contains 82040 files and the last one is named 14077.
2. Issue CHECKPOINT, but use a debugger to stop inside
TruncateMultiXact after it has read
MultiXactState->lastCheckpointedOldest and released the lock, but
before it calls SlruScanDirectory to delete files...
3. Run VACUUM FREEZE in all databases (including template0). datminmxid moves.
4. Create lots of new multixacts. pg_multixact/members now contains
82041 files and the last one is named 14078 (i.e. one extra segment,
with the highest possible segment number, which couldn't be created
before vacuuming because of the one-segment gap enforced by
DetermineSafeOldestOffset). Segments 0000-0016 have new modification
times.
5. ... allow the checkpoint started in step 2 to continue. It
deletes segments, keeping only 0000-0016. Segment 14078, which
contained active member data, has been incorrectly deleted.
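
For anyone checking the file counts and names in the steps above, here is a
rough sketch of the arithmetic. The constants are assumptions on my part,
based on a default build with 8 kB blocks (20-byte member groups holding 4
members each, 32 pages per SLRU segment, file names formatted as uppercase
hex padded to at least four digits); the Python names are made up for
illustration and are not PostgreSQL identifiers.

```python
# Assumed layout for a default build; adjust if your build differs.
BLCKSZ = 8192              # assumed block size in bytes
MEMBERGROUP_SIZE = 20      # assumed: 4 flag bytes + 4 member xids of 4 bytes
MEMBERS_PER_GROUP = 4      # assumed members per group
PAGES_PER_SEGMENT = 32     # assumed SLRU pages per segment file

MEMBERS_PER_PAGE = (BLCKSZ // MEMBERGROUP_SIZE) * MEMBERS_PER_GROUP
MEMBERS_PER_SEGMENT = MEMBERS_PER_PAGE * PAGES_PER_SEGMENT

MAX_OFFSET = 2**32 - 1     # member offsets are 32-bit


def segment_of(offset):
    """Segment number that holds the given member offset."""
    return offset // MEMBERS_PER_SEGMENT


def segment_name(segno):
    """File name for a segment number (hex, at least four digits)."""
    return "%04X" % segno


if __name__ == "__main__":
    highest = segment_of(MAX_OFFSET)
    # Step 1: 82040 files numbered from 0, so the last is number 82039.
    print("last file of 82040:", segment_name(82040 - 1))
    # Step 4: one more file, which is also the highest possible segment.
    print("last file of 82041:", segment_name(82041 - 1))
    print("highest possible segment:", highest, segment_name(highest))
    # Segments 0000-0016 are the low-numbered files rewritten after wrapping.
    print("files in 0000-0016:", int("0016", 16) + 1)
```

Under those assumptions a segment holds 52352 members, so the 32-bit offset
space ends partway through segment 82040, whose file name is 14078. That
matches the "highest possible segment number" in step 4.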

--
Thomas Munro
http://www.enterprisedb.com
