Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date: 2015-06-01 04:46:11
Message-ID: 20150601044611.GA23587@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Incomplete review, done in a relative rush:

On Fri, May 29, 2015 at 03:08:11PM -0400, Robert Haas wrote:
> OK, here's a patch. Actually two patches, differing only in
> whitespace, for 9.3 and for master (ha!). I now think that the root
> of the problem here is that DetermineSafeOldestOffset() and
> SetMultiXactIdLimit() were largely ignorant of the possibility that
> they might be called at points in time when the cluster was
> inconsistent.

A cause perhaps closer to the root is commit f741300 moving truncation from
VACUUM to checkpoints. CLOG has given us deep experience with VACUUM-time
truncation. Commit f6a6c46d and this patch are about bringing CHECKPOINT-time
truncation up to the same level.

Speaking of commit f6a6c46d, it seems logical that updating allocation stop
limits should happen proximate to truncation. That's currently the case for
CLOG (vac_truncate_clog() does both) and pg_multixact/members (checkpoint's
TruncateMultiXact() call does both). However, pg_multixact/offsets is
truncated from TruncateMultiXact(), but vac_truncate_clog() updates its limit.
I did not distill an errant test case, but this is fishy.

> SetMultiXactIdLimit() bracketed certain parts of its
> logic with if (!InRecovery), but those guards were ineffective because
> it gets called before InRecovery is set in the first place.

SetTransactionIdLimit() checks InRecovery for the same things, and it is
called at nearly the same moments as SetMultiXactIdLimit(). Do you have a
sense of whether it is subject to similar problems as a result?

> 1. Moves the call to DetermineSafeOldestOffset() that appears in
> StartupMultiXact() to TrimMultiXact(), so that we don't try to do this
> until we're consistent. Also, instead of passing
> MultiXactState->oldestMultiXactId, pass the newer of that value and
> the earliest offset that exists on disk. That way, it won't try to
> read data that's not there.

Perhaps emit a LOG message when we do that, since it's our last opportunity to
point to the potential data loss?

> + * PostgreSQL 9.3.0 through 9.3.6 and PostgreSQL 9.4.0 through 9.4.1
> + * had bugs that could allow users who reached those release through

s/release/releases/

> @@ -2859,6 +2947,14 @@ TruncateMultiXact(void)
> SimpleLruTruncate(MultiXactOffsetCtl,
> MultiXactIdToOffsetPage(oldestMXact));
>
> + /* Update oldest-on-disk value in shared memory. */
> + earliest = range.rangeStart * MULTIXACT_OFFSETS_PER_PAGE;
> + if (earliest < FirstMultiXactId)
> + earliest = FirstMultiXactId;
> + LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
> + Assert(MultiXactState->oldestMultiXactOnDiskValid);
> + MultiXactState->oldestMultiXactOnDiskValid = earliest;

That last line needs s/Valid//, I presume. Is it okay that
oldestMultiXactOnDisk becomes too-old during TruncateMultiXact(), despite its
Valid indicator remaining true?

> +static MultiXactOffset
> +GetOldestMultiXactOnDisk(void)
> +{

> + SlruScanDirectory(MultiXactOffsetCtl, SlruScanDirCbFindEarliest, &trunc);
> + earliest = trunc.earliestExistingPage * MULTIXACT_OFFSETS_PER_PAGE;
> + if (earliest < FirstMultiXactId)
> + earliest = FirstMultiXactId;

SlruScanDirCbFindEarliest() is only meaningful if the files on disk do not
represent a wrapped state. When the files do represent a wrapped state,
MultiXactIdPrecedes() is not transitive, and the SlruScanDirCbFindEarliest()
result is sensitive to readdir() order. This code exists specifically to help
us make do in the face of wrong catalog and pg_control entries. We may have
wrapped as a result of those of same catalog/pg_control entries, so I think
this function ought to account for wrap. I haven't given enough thought to
what exactly it should do.

Thanks,
nm

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Noah Misch 2015-06-01 04:55:34 Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Previous Message Michael Paquier 2015-06-01 04:45:13 Re: JSONB matching element count

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2015-06-01 04:55:34 Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Previous Message Abhijit Menon-Sen 2015-06-01 03:50:59 Re: pg_xlog -> pg_xjournal?