Re: MultiXactId error after upgrade to 9.3.4

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXactId error after upgrade to 9.3.4
Date: 2016-06-15 23:34:27
Message-ID: 20160615233427.GA18976@alvherre.pgsql
Lists: pgsql-hackers

Stephen Frost wrote:
> Greetings,
>
> Looks like we might not be entirely out of the woods yet regarding
> MultiXactId's. After doing an upgrade from 9.2.6 to 9.3.4, we saw the
> following:
>
> ERROR: MultiXactId 6849409 has not been created yet -- apparent wraparound
>
> The table contents can be select'd out and match the pre-upgrade
> backup, but any attempt to VACUUM / VACUUM FULL / CLUSTER fails,
> unsurprisingly.

I finally figured out what is going on here, though I don't yet have a
patch.

This has been reported a number of times:

https://www.postgresql.org/message-id/20140330040029.GY4582%40tamriel.snowman.net
https://www.postgresql.org/message-id/538F3D70.6080902%40publicrelay.com
https://www.postgresql.org/message-id/556439CF.7070109%40pscs.co.uk
https://www.postgresql.org/message-id/20160614173150.GA443784@alvherre.pgsql
https://www.postgresql.org/message-id/20160615203829.5798.4594@wrigleys.postgresql.org

We theorised that we were missing some place that was failing to pass
the "allow_old" flag to GetMultiXactIdMembers; and since we couldn't
find any and the problem was worked around simply (by doing SELECT FOR
UPDATE or equivalent on the affected tuples), there was no further
research. (The allow_old flag is passed for tuples that match an
infomask bit pattern that can only come from tuples locked in 9.2 and
prior, i.e. one that is never set by 9.3 and later.)

Yesterday I had to deal with it and quickly found what is going wrong:
the problem is that in 9.2 and earlier it was acceptable (and common) to
leave tuples with very old multixacts in xmax, even after multixact
counter wraparound. When one such value was found in a live tuple,
GetMultiXactIdMembers() would notice that it was out of range and simply
return "no members", at which point heap_update and siblings would
consider the tuple as not locked and move on.

When pg_upgrading a database containing tuples marked like that, the new
code would error out, because during the 9.3 multixact work we considered
that it was dangerous to silently allow tuples to be marked by values we
didn't keep track of, so we made it an error instead, per
https://www.postgresql.org/message-id/20111204122027.GA10035%40tornado.leadboat.com
Some cases are allowed to be downgraded to DEBUG, when allow_old is
true.

I think that was a good choice in general so that possibly-data-eating
bugs could be reported, but there's a problem in the specific case of
tuples carried over by pg_upgrade whose Multixact is "further in the
future" compared to the nextMultiXactId counter. I think it's pretty
clear that we should let that error be downgraded to DEBUG too, like the
other checks.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
