Re: ERROR: multixact X from before cutoff Y found to be still running

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: "Schneider, Jeremy" <schnjere(at)amazon(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Nasby, Jim" <nasbyj(at)amazon(dot)com>
Subject: Re: ERROR: multixact X from before cutoff Y found to be still running
Date: 2019-09-06 17:25:36
Message-ID: CA+TgmoYdT9uxd1wF1Vvz1wHFfirXju9pC9GspzpRUa=1shqC7w@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Sep 5, 2019 at 4:08 PM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
> Right, the v2 patch will effectively ramp down the freezemin as your
> freeze_max_age gets smaller, while the v1 patch will set the effective
> freezemin to zero as soon as your multixact age passes the threshold.
> I think what is unclear to me is whether this ramp-down behavior is
> the intended functionality or we should be doing something similar to
> what we do for regular transaction IDs (i.e. force freezemin to zero
> right after it hits the "oldest xmin is far in the past" threshold).
> The comment above MultiXactMemberFreezeThreshold() explains things
> pretty well, but AFAICT it is more geared towards influencing
> autovacuum scheduling. I agree that v2 is safer from the standpoint
> that it changes as little as possible, though.

I don't presently have a view on fixing the actual bug here, but I can
certainly confirm that I intended MultiXactMemberFreezeThreshold() to
ratchet up the pressure gradually rather than all at once, and my
suspicion is that this behavior may be good to retain, but I'm not
sure.
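To make the contrast concrete, here is a rough Python sketch of the two policies being discussed. It is hypothetical: the function names, parameters, and numbers are invented for illustration, and the "ramp" version assumes the gradual behavior comes from capping freezemin at half of the effective freeze_max_age. It is not either patch from this thread.

```python
# Hypothetical sketch contrasting two ways to pick an effective multixact
# freeze_min_age. Names and numbers are illustrative, not the v1/v2 patches.


def step_freezemin(configured_min_age, mxid_age, threshold):
    """'All at once': keep the configured freeze_min_age until the oldest
    multixact's age passes `threshold`, then force it to zero."""
    return 0 if mxid_age > threshold else configured_min_age


def ramp_freezemin(configured_min_age, effective_max_age):
    """'Gradual' (assumed mechanism): never exceed half the effective
    freeze_max_age, so freeze_min_age shrinks as the max age shrinks."""
    return min(configured_min_age, effective_max_age // 2)


if __name__ == "__main__":
    configured = 5_000_000
    for max_age in (400_000_000, 8_000_000, 2_000_000, 0):
        print(max_age, ramp_freezemin(configured, max_age))
```

With a configured freezemin of 5,000,000, the ramp version returns 5,000,000, 4,000,000, 1,000,000 and 0 as the effective max age falls through 400M, 8M, 2M and 0, whereas the step version jumps straight from the configured value to zero once its threshold is crossed.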

One difference between regular XIDs and MultiXacts is that there's
only one reason why we can need to vacuum XIDs, but there are two
reasons why we can need to vacuum MultiXacts. We can either be
running short of members space or we can be running short of offset
space, and running out of either one is bad. Regular XIDs have no
analogue of this problem: there's only one thing that you can exhaust.
At the time I wrote MultiXactMemberFreezeThreshold(), only the
'offsets' array had any sort of wraparound protection, and it was
space in 'offsets' that was measured by relminmxid, datminmxid, etc.
You could imagine having separate catalog state to track space in the
'members' SLRU, e.g. relminmxidmembers, datminmxidmembers, etc., but
that wasn't really an option for fixing the bug at hand, because it
wouldn't have been back-patchable.
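A toy model of the two exhaustible resources may help. It assumes that each MultiXact takes one slot of 'offsets' space plus one slot of 'members' space per member, and that both spaces are 2**32 slots wide; the 50% "running short" cutoff is arbitrary. The real bookkeeping in PostgreSQL is more involved than this.

```python
# Hypothetical toy model of the two MultiXact resources. Assumptions: each
# MultiXact uses one 'offsets' slot plus one 'members' slot per member, both
# spaces are 2**32 slots wide, and "short" means at least half used.
SPACE = 2 ** 32


def usage_fractions(multixacts, avg_members):
    """Return (offsets_used, members_used) as fractions of each space."""
    offsets_used = multixacts / SPACE
    members_used = multixacts * avg_members / SPACE
    return offsets_used, members_used


def reasons_to_vacuum(multixacts, avg_members, short_fraction=0.5):
    """List which resource(s) are running short; either one alone is enough
    to require vacuuming, unlike regular XIDs where only one can run out."""
    offsets_used, members_used = usage_fractions(multixacts, avg_members)
    reasons = []
    if offsets_used >= short_fraction:
        reasons.append("offsets")
    if members_used >= short_fraction:
        reasons.append("members")
    return reasons
```

For example, 100 million multixacts averaging 30 members each use only about 2% of offsets space but roughly 70% of members space, so in this model only the members side flags trouble, and relminmxid/datminmxid (which track offsets) would not see it.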

So the challenge was to find some way of using the existing catalog
state to try to provide wraparound protection for a new kind of thing
for which wraparound protection had not been previously contemplated.
And so MultiXactMemberFreezeThreshold() was born.
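The general shape of that idea can be sketched as follows: below a "safe" level of member usage, the configured freeze max age is used unchanged; above it, the effective max age is pulled down in proportion to how far usage has moved toward a "danger" level, reaching zero at or past that level. This is a hypothetical Python sketch of the idea, not the C function in the tree; the safe and danger thresholds are plain parameters here, and the final min() clamp is an assumption of this sketch.

```python
def member_freeze_threshold(multixacts, members, freeze_max_age, safe, danger):
    """Hypothetical sketch: derive an effective multixact freeze max age
    from members-space usage, using only multixact-based catalog state.

    multixacts     -- multixacts currently in use (age of the oldest mxid)
    members        -- members-space slots currently in use
    freeze_max_age -- configured multixact freeze max age
    safe, danger   -- illustrative member-usage thresholds (safe < danger)
    """
    if danger <= safe:
        raise ValueError("danger threshold must exceed safe threshold")
    # Low member usage: no extra pressure, use the configured value.
    if members <= safe:
        return freeze_max_age
    # Fraction of the way from 'safe' to 'danger'; may exceed 1.0.
    fraction = (members - safe) / (danger - safe)
    # How many of the oldest multixacts we would like vacuum to remove.
    victims = int(multixacts * fraction)
    if victims >= multixacts:
        return 0
    # Freeze anything older than this age; never looser than configured.
    return min(freeze_max_age, multixacts - victims)
```

With safe=100, danger=200, 300 multixacts in use, and a configured max age of 400, member usage of 110, 150 and 200 yields 270, 150 and 0: the pressure ratchets up gradually rather than all at once.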

(I apologize if any of the above sounds like I'm taking credit for
work actually done by Thomas, who I see is listed as the primary
author of the commit in question. I feel like I invented
MultiXactMemberFreezeThreshold and the big comment at the top of it
looks to me like something I wrote, but this was a long time ago
and I don't really remember who did what. My intent here is to provide
some context that may be useful based on what I remember about that
patch, not to steal anybody's thunder.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company