Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Timothy Garnett <tgarnett(at)panjiva(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date: 2015-05-08 06:25:39
Message-ID: CA+TgmoZiHwybETx8NZzPtoSjprg2Kcr-NaWGajkzcLcbVJ1pKQ@mail.gmail.com
Lists: pgsql-bugs

On Thu, May 7, 2015 at 7:58 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Now I
>> understand the suggestion that the checkpoint code could be in charge
>> of advancing the oldest multixact + offset.
>
> Yeah. I think we need to pursue that angle unless somebody has a better idea.

So here's a patch that does that. It turns out to be pretty simple: I
just moved the DetermineSafeOldestOffset() calls around. As far as I
can see, and that may not be far enough at this hour of the morning,
we just need two: one in StartupMultiXact, so that we initialize the
values correctly after reading the control file; and then another at
the very end of TruncateMultiXact, so we update it after each
checkpoint or restartpoint. This leaves the residual problem that
autovacuum doesn't directly advance the stop point - the following
checkpoint does. We could handle that by requesting a checkpoint if
oldestMultiXactId is ahead of lastCheckpointedOldest by enough that a
checkpoint would free up some space, although one might hope that
people won't be living on the edge to quite that degree.
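
A minimal sketch of that "request a checkpoint" test, expressed in member offsets rather than MXIDs. The function name, its parameters, and the segment-size constant are assumptions made for illustration; none of them come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed number of members per SLRU segment; illustrative value only. */
#define MEMBERS_PER_SEGMENT 52352u

/*
 * Hypothetical helper: would a checkpoint taken now be able to free at
 * least one whole members segment?
 *
 * checkpointed_oldest_offset: members offset of the oldest multixact as of
 *     the last checkpoint (the offset belonging to lastCheckpointedOldest).
 * vacuumed_oldest_offset: members offset of the oldest multixact after
 *     vacuum has advanced oldestMultiXactId.
 */
static bool
checkpoint_would_free_space(uint32_t checkpointed_oldest_offset,
                            uint32_t vacuumed_oldest_offset)
{
    /* Unsigned subtraction is modulo 2^32, so offset wraparound is handled. */
    uint32_t reclaimable = vacuumed_oldest_offset - checkpointed_oldest_offset;

    /* A gap of one full segment guarantees a segment boundary was crossed. */
    return reclaimable >= MEMBERS_PER_SEGMENT;
}
```

The test is conservative: a smaller gap can still cross a segment boundary, but this form never requests a checkpoint that frees nothing.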

As things are, I believe this sequence is possible:

1. The members SLRU is full all the way up to offsetStopLimit.
2. A checkpoint occurs, reaching MultiXactSetSafeTruncate(), which
sets lastCheckpointedOldest.
3. Vacuum runs, calling SetMultiXactIdLimit(), calling
DetermineSafeOldestOffset(), advancing
MultiXactState->offsetStopLimit.
4. Since offsetStopLimit > lastCheckpointedOffset, it's now possible
for someone to consume an MXID greater than offsetStopLimit, making
MultiXactState->nextOffset > lastCheckpointedOffset.
5. The checkpoint from step 2, continuing on its merry way, now calls
TruncateMultiXact(), which sets rangeEnd > rangeStart and blows away
nearly every file in the SLRU.
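
To make the last step concrete, here is a toy model of the members SLRU as a ring of segments. It is a sketch of the suspected failure mode, not PostgreSQL's truncation code: the ring size, the function names, and the inclusive keep-range rule are all assumptions.

```c
#include <stdbool.h>

/* Toy ring size; the real members SLRU has far more segments. */
#define NSEGMENTS 16

/*
 * True if seg lies in the circular range that runs forward from
 * oldest_seg to next_seg, both inclusive.
 */
static bool
segment_is_kept(int seg, int oldest_seg, int next_seg)
{
    if (oldest_seg <= next_seg)
        return seg >= oldest_seg && seg <= next_seg;

    /* The range wraps past the end of the ring. */
    return seg >= oldest_seg || seg <= next_seg;
}

/* Count the segments a truncation pass would remove for these endpoints. */
static int
segments_removed(int oldest_seg, int next_seg)
{
    int removed = 0;

    for (int seg = 0; seg < NSEGMENTS; seg++)
    {
        if (!segment_is_kept(seg, oldest_seg, next_seg))
            removed++;
    }
    return removed;
}
```

With the ring full, say oldest segment 3 and next segment 2, the keep range wraps around and segments_removed(3, 2) is 0. If the next offset then moves on to segment 4 while the in-flight checkpoint still believes the oldest segment is 3, the keep range collapses to segments 3 and 4, and segments_removed(3, 4) is 14 out of 16.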

I haven't confirmed this yet, so I might still be all wet, especially
since it is late here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
multixact-truncate-race.patch binary/octet-stream 3.1 KB
