Re: ERROR: multixact X from before cutoff Y found to be still running

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Jeremy Schneider <schnjere(at)amazon(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>
Subject: Re: ERROR: multixact X from before cutoff Y found to be still running
Date: 2019-09-05 04:01:55
Message-ID: CA+hUKGLXna28c_skfJFtCMc3BDrVbSn+EK_kLgFE=ACPKdGnTA@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Sep 5, 2019 at 1:01 PM Jeremy Schneider <schnjere(at)amazon(dot)com> wrote:
> On 9/4/19 17:37, Nathan Bossart wrote:
> > Currently, if you hold a multixact open long enough to generate an
> > "oldest multixact is far in the past" message during VACUUM, you may
> > see the following ERROR:
> >
> > WARNING: oldest multixact is far in the past
> > HINT: Close open transactions with multixacts soon to avoid wraparound problems.
> > ERROR: multixact X from before cutoff Y found to be still running
> >
> > Upon further inspection, I found that this is because the multixact
> > limit used in this case is the threshold for which we emit the "oldest
> > multixact" message. Instead, I think the multixact limit should be
> > set to the result of GetOldestMultiXactId(), effectively forcing a
> > minimum freeze age of zero. The ERROR itself is emitted by
> > FreezeMultiXactId() and appears to be a safeguard against problems
> > like this.
> >
> > I've attached a patch to set the limit to the oldest multixact instead
> > of the "safeMxactLimit" in this case. I'd like to credit Jeremy
> > Schneider as the original reporter.
>
> This was fun (sort of) - and a good part of the afternoon for Nathan, Nasby and myself today. A rather large PostgreSQL database with default autovacuum settings had a large table that started getting behind on Sunday. The server has a fairly large number of CPUs and a respectable workload. We realized today that with their XID generation they would go read-only to prevent wraparound tomorrow. (And perfectly healthy XID age on Sunday - that's wraparound in four days! Did I mention that I'm excited for the default limit GUC change in pg12?) To make matters more interesting, whenever we attempted to run a VACUUM command we encountered the ERROR message that Nate quoted on every single attempt! There was a momentary mild panic based on the "ERRCODE_DATA_CORRUPTED" message parameter in heapam.c FreezeMultiXactId() ... but as we looked closer we're now thinking there might just be an obscure bug in the code that sets vacuum limits.
>
> Nathan and Nasby and myself have been chatting about this for quite a while but the vacuum code isn't exactly the simplest thing in the world to reason about. :) Anyway, it looks to me like MultiXactMemberFreezeThreshold() is intended to progressively reduce the vacuum multixact limits across multiple vacuum runs on the same table, as pressure on the members space increases. I'm thinking there was just a small oversight in writing the formula where, under the most aggressive circumstances, vacuum could actually be instructed to delete multixacts that are still in use by active transactions and trigger the failure we observed.
>
> Nate put together an initial patch (attached to the previous email, which was sent only to the bugs list). We couldn't quite come to a consensus on the best approach, but we decided that he'd kick off the thread and I'd throw out an alternative version of the patch that might be worth discussion. [Attached to this email.] Curious what others think!

Hi Jeremy, Nathan, Jim,

Ok, so to recap... since commit 801c2dc7 in 2014, if the computed
limit was older than the 'safe' limit, then vacuum would log the
warning and start using the safe limit, even if that was newer than a
multixact that is *still running*. It's not immediately clear to me
if the limits on the relevant GUCs or anything else ever prevented
that.

Then commit 53bb309d2d5 came along in 2015 (to fix a bug: member's
head could overwrite its tail) and created a way for the safe limit to
be more aggressive. When member space is low, we start lowering the
effective max freeze age, and as we do so the likelihood of crossing
into still-running-multixact territory increases.
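
To make the scaling concrete, here is a rough standalone model of how
the effective max freeze age could shrink as member space fills up.
It is not the PostgreSQL source: the function name, the 2^32 member
space size, and the 50% "safe" / 75% "danger" thresholds are all
assumptions chosen for illustration.

```c
#include <stdint.h>

/* Assumed values for illustration only, not taken from the server. */
#define MEMBER_SPACE      4294967296.0            /* assume 2^32 member slots */
#define SAFE_THRESHOLD    (MEMBER_SPACE * 0.50)   /* below this: no pressure */
#define DANGER_THRESHOLD  (MEMBER_SPACE * 0.75)   /* at or above this: age 0 */

/*
 * Hypothetical model of the effective multixact freeze max age.
 *
 * multixacts          number of multixacts currently in existence
 * members             number of member slots currently in use
 * configured_max_age  the configured autovacuum_multixact_freeze_max_age
 */
static uint32_t
effective_freeze_max_age(uint32_t multixacts, double members,
                         uint32_t configured_max_age)
{
    double fraction;
    double victims;

    /* Plenty of member space left: use the configured setting as is. */
    if (members <= SAFE_THRESHOLD)
        return configured_max_age;

    /* Scale linearly from "safe" (0.0) to "danger" (1.0). */
    fraction = (members - SAFE_THRESHOLD) /
               (DANGER_THRESHOLD - SAFE_THRESHOLD);

    /* The share of existing multixacts we would like vacuum to remove. */
    victims = multixacts * fraction;

    /* fraction can exceed 1.0; the lowest possible freeze age is zero. */
    if (victims > multixacts)
        return 0;

    return multixacts - (uint32_t) victims;
}
```

In this model the returned age reaches zero once member usage hits
the danger threshold, which is the point where the safe limit lands
on the next multixact to be assigned and can therefore pass one that
is still running.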

I suppose this requires you to run out of member space (for example
many backends key-sharing the same FK) or maybe just set
autovacuum_multixact_freeze_max_age quite low, and then keep a
multixact alive for a long time. Does the problem fix itself once you
close the transaction that's in the oldest multixact, ie holding back
GetOldestMultiXactId() from advancing? Since VACUUM errors out, we
don't corrupt data, right? Everyone else is still going to see the
multixact as running and do the right thing because vacuum never
manages to (bogusly) freeze the tuple.

Both patches prevent mxactLimit from being newer than the oldest
running multixact. The v1 patch uses the most aggressive setting
possible: the oldest running multi; the v2 uses the least aggressive
of the 'safe' and oldest running multi. At first glance it seems like
the second one is better: it only does something different if we're in
the dangerous scenario you identified, but otherwise it sticks to the
safe limit, which generates less IO.
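
Here is how I read the three behaviours, as a small standalone model
rather than the actual patches. The names (mxid_t, mxid_precedes,
limit_unpatched, limit_v1, limit_v2) are made up for this sketch, and
it assumes the caller computed the limit as the oldest running
multixact minus the freeze min age, so it never starts out newer than
the oldest running multixact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t mxid_t;        /* stand-in for MultiXactId */

/* Wraparound-aware "a is older than b", using modulo-2^32 arithmetic. */
static bool
mxid_precedes(mxid_t a, mxid_t b)
{
    return (int32_t) (a - b) < 0;
}

/* Unpatched: fall back to the safe limit, even past a running multixact. */
static mxid_t
limit_unpatched(mxid_t limit, mxid_t safe_limit, mxid_t oldest_running)
{
    (void) oldest_running;      /* never consulted: that is the bug */
    if (mxid_precedes(limit, safe_limit))
        return safe_limit;
    return limit;
}

/* v1 as I understand it: jump straight to the oldest running multixact. */
static mxid_t
limit_v1(mxid_t limit, mxid_t safe_limit, mxid_t oldest_running)
{
    if (mxid_precedes(limit, safe_limit))
        return oldest_running;
    return limit;
}

/* v2 as I understand it: the older of the safe limit and oldest running. */
static mxid_t
limit_v2(mxid_t limit, mxid_t safe_limit, mxid_t oldest_running)
{
    mxid_t result = limit;

    if (mxid_precedes(limit, safe_limit))
        result = mxid_precedes(oldest_running, safe_limit)
            ? oldest_running
            : safe_limit;

    /* The chosen limit must never be newer than a running multixact. */
    assert(!mxid_precedes(oldest_running, result));
    return result;
}
```

With oldest running = 1000, limit = 900 and safe limit = 1500, the
unpatched logic picks 1500 (newer than a running multixact), while
both v1 and v2 pick 1000. With safe limit = 950, v1 still picks 1000
but v2 stays at 950, which is where the two patches differ.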

--
Thomas Munro
https://enterprisedb.com
