Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Serge Negodyuck <petr(at)petrovich(dot)kiev(dot)ua>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
Date: 2014-01-03 01:46:17
Message-ID: 20140103014616.GB7035@eldon.alvh.no-ip.org
Lists: pgsql-bugs pgsql-hackers

Andres Freund wrote:

> On 2013-12-09 17:49:34 +0200, Serge Negodyuck wrote:
> > On master there are files from 0000 to 14078
> >
> > On slave there were absent files from A1xx to FFFF
> > They were the oldest ones. (October, November)
>
> Some analysis later, I am pretty sure that the origin is a longstanding
> problem and not connected to 9.3.[01] vs 9.3.2.
>
> The above referenced 14078 file is exactly the last page before a
> members wraparound:
> (gdb) p/x (1L<<32)/(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT)
> $10 = 0x14078
>
> So, what happened is that enough multixacts were created that the
> members slru wrapped around. It's not unreasonable for the members slru
> to wrap around faster than the offsets one - after all we create at
> least two entries into members for every offset entry. Also in 9.3+
> more xids fit on an offsets page than on a members page.
> When truncating, we first read the offset, to know where we currently
> are in members, and then truncate both from their respective
> point. Since we've wrapped around in members we very well might remove
> content we actually need.
>
> I've recently remarked that I find it dangerous that we only do
> anti-wraparound stuff for pg_multixact/offsets, not for /members. So,
> here we have the proof that that's bad.
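
For reference, the gdb expression quoted above can be reproduced outside
a debugger. This is only a sketch of the arithmetic, not PostgreSQL
code: the constant values (1636 members per page, 32 pages per segment
file) are the ones given later in this message, and the variable names
other than the two macro names are made up for illustration.

```python
# Values as stated in this message for a 9.3 build with 8kB pages.
MULTIXACT_MEMBERS_PER_PAGE = 1636
SLRU_PAGES_PER_SEGMENT = 32

# Member offsets are 32 bits wide, so the members space wraps at 2^32.
MEMBER_SPACE = 1 << 32

members_per_segment = MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT
last_segment = MEMBER_SPACE // members_per_segment

# Segment files are named in hexadecimal.
print(f"{last_segment:X}")  # prints 14078
```

The integer division mirrors what gdb does with the same expression, so
the result is the number of the last segment file before the members
space wraps back to 0000.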

I have applied three patches to deal with some of the problems reported
here, and others discovered while investigating them. One of them fixes
the failure to truncate files beyond FFFF. That works now, which on its
own would mean losing more data, were it not for the second fix: files
corresponding to data still in use are no longer truncated.

The third fix (to enable the system to wrap around sanely from file
14078 to 0000) was necessary so that I could reproduce the issues. In
systems with assertions enabled, there is a crash at the point of
overflow. I didn't try, but since your system appears to have wrapped
around I imagine it sort-of works in systems compiled without
assertions (which is the recommended setting for production systems.)

One thing not yet patched is overrun of members' SLRU: if you have
enough live multixacts with enough members, creating a new one might
overwrite the members area used by an older multixact. Freezing multis
earlier would help with that. With the default settings, where multis
are frozen when they are 50 million multis old and pages are 8kB long,
there is room for 85 members per multi on average without such
overrun[*]. I was able to observe this overrun by running Andres'
pg_multixact_burn with each multixact having 100 members. I doubt it's
common to have that many members in each multixact on average, but it's
certainly a possibility.

[*] There are 82040 files, having 32 pages each; each page has room for
1636 members. (82040 * 32 * 1636) / 50000000 =~ 85.
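
The estimate can be checked with a few lines of arithmetic: the total
number of member slots is the product of files, pages per file, and
members per page, and that total is shared among the multis that are
live under the default freeze age. A rough sketch; the variable names
are illustrative and not taken from the PostgreSQL source.

```python
# Figures from the footnote above.
SEGMENT_FILES = 82040
PAGES_PER_SEGMENT = 32
MEMBERS_PER_PAGE = 1636
FREEZE_AGE_MULTIS = 50_000_000  # default age at which multis are frozen

# Total member slots available before the members SLRU wraps around.
total_member_slots = SEGMENT_FILES * PAGES_PER_SEGMENT * MEMBERS_PER_PAGE

# Average members each live multi may have without overrunning.
avg_members_per_multi = total_member_slots / FREEZE_AGE_MULTIS
print(int(avg_members_per_multi))  # prints 85

# The pg_multixact_burn run used 100 members per multi, which needs
# more slots than exist, hence the observed overrun.
slots_needed_at_100 = 100 * FREEZE_AGE_MULTIS
print(slots_needed_at_100 > total_member_slots)  # prints True
```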

One complaint somebody might rightly have about this is the space
consumption by pg_multixact/ files. Perhaps instead of using Xid's
freezing horizon verbatim, we should use ceil(min_freeze_age^0.8) or
something like that; so for the default 50000000 Xid freezing age, we
would freeze multis over 1.44 million multis old. (We could determine
the ratio of xid to multi usage so that they would both freeze when the
same time has elapsed, but this seems unnecessarily complicated.)
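
To show what that scaling would give for a few settings, here is a
small sketch. The function name multi_freeze_age is hypothetical, and
the 0.8 exponent is just the example value floated above, not a setting
that exists in PostgreSQL.

```python
import math


def multi_freeze_age(xid_freeze_age: int, exponent: float = 0.8) -> int:
    """Hypothetical multixact freeze age derived from the Xid freeze age."""
    return math.ceil(xid_freeze_age ** exponent)


# The default Xid freezing age of 50 million maps to roughly
# 1.44 million multis.
for xid_age in (1_000_000, 50_000_000, 200_000_000):
    print(xid_age, multi_freeze_age(xid_age))
```

Because the exponent is below 1, the derived age grows more slowly than
the Xid freeze age, which is what keeps pg_multixact/ smaller.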

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
