Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date: 2015-06-16 21:41:32
Message-ID: CAEepm=2mNtgnVGbOFqXVJ8UdOGG7BmuE1HWWkz_GjQAFuSg52Q@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Wed, Jun 17, 2015 at 6:58 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Thomas Munro wrote:
>
>> Thanks. As mentioned elsewhere in the thread, I discovered that the
>> same problem exists for page boundaries, with a different error
>> message. I've tried the attached repro scripts on 9.3.0, 9.3.5, 9.4.1
>> and master with the same results:
>>
>> FATAL: could not access status of transaction 2048
>> DETAIL: Could not read from file "pg_multixact/offsets/0000" at
>> offset 8192: Undefined error: 0.
>>
>> FATAL: could not access status of transaction 131072
>> DETAIL: Could not open file "pg_multixact/offsets/0002": No such file
>> or directory.
>
> So I checked this bug against current master, because it's claimed to be
> closed. The first script doesn't emit a message at all; the second
> script does emit a message:
>
> LOG: could not truncate directory "pg_multixact/offsets": apparent wraparound
>
> If you start and stop again, there's no more noise in the logs. That's
> pretty innocuous -- great.

Right, I included a fix for this in
https://commitfest.postgresql.org/5/265/, which handles both
pg_subtrans and pg_multixact, since it was getting lost in the noise
in this thread... Hopefully someone can review that.
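For anyone following along, the two FATAL messages in the quoted repro output line up with a simple mapping from multixact ID to offsets segment file. Here is a rough sketch of that arithmetic, assuming a default build (8192-byte pages holding 4-byte offsets, so 2048 entries per page, and 32 pages per SLRU segment). The helper names are made up for illustration and are not the server's own functions.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed default-build values: 8192-byte pages holding 4-byte offsets
 * (2048 entries per page), and 32 pages per SLRU segment file. */
#define BLCKSZ 8192
#define OFFSETS_PER_PAGE (BLCKSZ / 4)
#define PAGES_PER_SEGMENT 32

typedef uint32_t MultiXactId;

/* Where the offsets entry for one multixact lives on disk. */
typedef struct
{
    int  pageno;      /* SLRU page number */
    int  segno;       /* segment file number */
    long file_offset; /* byte offset of that page within the segment file */
} OffsetLocation;

/* Hypothetical helper: map a multixact ID to page, segment and offset. */
static OffsetLocation
locate_offset_entry(MultiXactId multi)
{
    OffsetLocation loc;

    loc.pageno = (int) (multi / OFFSETS_PER_PAGE);
    loc.segno = loc.pageno / PAGES_PER_SEGMENT;
    loc.file_offset = (long) (loc.pageno % PAGES_PER_SEGMENT) * BLCKSZ;
    return loc;
}

/* Segment file name in the style of the error messages, e.g. "0002". */
static void
segment_file_name(int segno, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%04X", (unsigned) segno);
}
```

Under these assumptions multixact 2048 falls on page 1, i.e. file 0000 at offset 8192, and 131072 falls on page 64, i.e. the start of file 0002: exactly the file and offset named in the two messages.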

> But then I modified your script to do two segments instead of one. Then
> after the second cycle is done, start the server and stop it again. The
> end result is a bit surprising: you end up with no files in
> pg_multixact/offsets at all!

Ouch. I see why: latest_page_number gets initialised to a different
value when you restart (it is computed from the oldest multixact ID,
whereas during normal running it remembers the last page actually
created). In this case (next == oldest, next % 2048 == 0), restarting
the server moves latest_page_number forwards by one page, so
SimpleLruTruncate no longer bails out with the above error message and
it happily deletes all the files. That is conceptually OK (there are no
multixacts, so having no files should be OK), but see below...
Applying the patch linked above prevents this problem: it always keeps
at least one multixact, and therefore at least one page and at least
one segment, because it steps back one multixact to avoid boundary
problems when oldest == next.
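To make the off-by-one concrete, here is a simplified model of the truncation guard, assuming default-build constants (2048 offsets per page, 32 pages per segment). The names offset_page and truncate_offsets are hypothetical, plain < stands in for the real wraparound-aware page comparison, and a restart is modelled as recomputing latest_page_number from next (== oldest). The function returns -1 where the "apparent wraparound" check would bail out, otherwise the first segment number that would survive.

```c
#include <stdint.h>

/* Assumed default-build constants. */
#define OFFSETS_PER_PAGE 2048
#define PAGES_PER_SEGMENT 32

/* Page holding the offsets entry for a multixact ID (hypothetical helper). */
static int
offset_page(uint32_t multi)
{
    return (int) (multi / OFFSETS_PER_PAGE);
}

/*
 * Simplified model of the truncation guard: round the cutoff page down to a
 * segment boundary, refuse if the latest page precedes it, otherwise report
 * the first segment that survives (all lower segments would be unlinked).
 * Plain < replaces the wraparound-aware comparison used by the real code.
 */
static int
truncate_offsets(int latest_page_number, uint32_t oldest)
{
    int cutoff_page = offset_page(oldest);

    cutoff_page -= cutoff_page % PAGES_PER_SEGMENT;
    if (latest_page_number < cutoff_page)
        return -1;              /* "apparent wraparound": bail out */
    return cutoff_page / PAGES_PER_SEGMENT;
}
```

With next == oldest == 131072 (two full segments), the page remembered during normal running is 63, which precedes the cutoff page 64, so truncation is refused. After a restart the latest page becomes 64, the check passes, segments 0000 and 0001 are removed, and in this model nothing has ever written page 64, so no file is left.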

As for whether it's actually OK to have no files in
pg_multixact/offsets, it seems that if you restart *twice* after
running checkpoint-segment-boundary.sh, you end up with earliest =
4294965248 in TruncateMultiXact, because this code assumes that at
least one file was found. When none was, earliestExistingPage is still
-1, so it assigns (-1 * 2048) to earliest, which is unsigned and
therefore wraps around to 2^32 - 2048:

trunc.earliestExistingPage = -1;
SlruScanDirectory(MultiXactOffsetCtl,
                  SlruScanDirCbFindEarliest, &trunc);
earliest = trunc.earliestExistingPage * MULTIXACT_OFFSETS_PER_PAGE;
if (earliest < FirstMultiXactId)
    earliest = FirstMultiXactId;
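The wraparound can be reproduced in isolation. This is a stand-alone sketch of the same arithmetic, assuming 2048 offsets per page; earliest_from_page and the two macros are illustrative stand-ins, not the server's definitions. It shows that the FirstMultiXactId clamp only catches a result of 0, while the wrapped value from a page number of -1 passes straight through.

```c
#include <stdint.h>

#define OFFSETS_PER_PAGE 2048   /* assumed: 8192-byte pages, 4-byte entries */
#define FIRST_MULTIXACT_ID 1    /* stands in for FirstMultiXactId */

typedef uint32_t MultiXactId;

/* Same arithmetic as the quoted snippet, with the page number passed in. */
static MultiXactId
earliest_from_page(int earliest_existing_page)
{
    /* -1 * 2048 == -2048, which converts to 2^32 - 2048 as uint32 */
    MultiXactId earliest =
        (MultiXactId) (earliest_existing_page * OFFSETS_PER_PAGE);

    /* The clamp only catches 0; the wrapped value sails past it. */
    if (earliest < FIRST_MULTIXACT_ID)
        earliest = FIRST_MULTIXACT_ID;
    return earliest;
}
```

Conversion of a negative int to an unsigned type is well defined in C (reduction modulo 2^32 here), so this is not undefined behaviour, just a silently wrong answer.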

I think this should bail out if earliestExistingPage is still -1 after
the call to SlruScanDirectory.
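One possible shape for that bail-out, sketched under stated assumptions: the directory scan is simulated by an array of first-page numbers for the segment files found, plain < replaces the wraparound-aware page comparison, and find_earliest_page and compute_earliest are hypothetical names. This is only an illustration of the idea, not the patch itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSETS_PER_PAGE 2048   /* assumed default build */
#define FIRST_MULTIXACT_ID 1    /* stands in for FirstMultiXactId */

typedef uint32_t MultiXactId;

/* Hypothetical stand-in for the directory scan: smallest page found, or -1
 * if there were no segment files at all. */
static int
find_earliest_page(const int *segment_pages, size_t nsegments)
{
    int earliest = -1;

    for (size_t i = 0; i < nsegments; i++)
        if (earliest == -1 || segment_pages[i] < earliest)
            earliest = segment_pages[i];
    return earliest;
}

/* Returns false (nothing to truncate) instead of computing a bogus ID. */
static bool
compute_earliest(const int *segment_pages, size_t nsegments,
                 MultiXactId *earliest)
{
    int page = find_earliest_page(segment_pages, nsegments);

    if (page == -1)
        return false;           /* no files found: bail out */
    *earliest = (MultiXactId) page * OFFSETS_PER_PAGE;
    if (*earliest < FIRST_MULTIXACT_ID)
        *earliest = FIRST_MULTIXACT_ID;
    return true;
}
```

With no files the caller simply skips truncation; with segments starting at pages 64 and 32 it gets 32 * 2048 = 65536, and with only page 0 the clamp yields 1.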

--
Thomas Munro
http://www.enterprisedb.com
