Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Timothy Garnett <tgarnett(at)panjiva(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Re: BUG #12990: Missing pg_multixact/members files (appears to have wrapped, then truncated)
Date: 2015-05-07 23:58:11
Message-ID: D8A99261-9E17-436B-B08C-BDCDBB07239D@gmail.com
Lists: pgsql-bugs

On May 7, 2015, at 6:21 PM, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> I may have the details wrong there, but in general I can't see how you
> can delete just the right segment files while the head and tail
> pointers of this circular buffer are both moving, unless you hold the
> lock to stop that while you make the decision and unlink the file
> (which I assume is out of the question), or say 'well if we keep the
> head and tail pointers at least 20 pages apart, that's unlikely to
> happen' (which I assume is also out of the question).

That is exactly the sort of thing I was worried about, but I couldn't put my finger on it. I think your analysis is right.

> Now I
> understand the suggestion that the checkpoint code could be in charge
> of advancing the oldest multixact + offset.

Yeah. I think we need to pursue that angle unless somebody has a better idea. It would also address another issue I am concerned about: the current system seems to advance the oldest multixact information in shared memory differently on the primary and the standby. I'm not sure the logic actually works on the standby at all at this point, but even if it does, it seems unlikely to be right to rely on redo to do on the standby what is being done by a completely different, not-WAL-logged operation on the master. Making the checkpoint do it in both cases would fix that.

> But if we did that, our new autovacuum code would get confused and
> keep trying to vacuum stuff that it's already dealt with, until the
> next checkpoint... so maybe we'd need to track both (1) the offset of
> the oldest multixact: the one that we use to trigger autovacuums, even
> though there might be even older segments still on disk, and (2)
> offset_of_start_of_oldest_segment_on_disk (suitably compressed): the
> one that we use to block wraparound in GetNewMultiXactId, so that we
> never write into a file that checkpoint might decide to delete.

That sounds plausible.

...Robert
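
Editorial sketch (not part of the original message): the two-value scheme Thomas describes could look roughly like the following. This is a toy model under stated assumptions, not PostgreSQL code. The names segment_start, members_in_use, and can_allocate are hypothetical, MEMBERS_PER_SEGMENT is only an illustrative constant, and the member offset space is modelled as a 32-bit circular counter.

```python
# Toy model of the two tracked values discussed above (all names hypothetical).
OFFSET_SPACE = 2**32          # member offsets wrap around like a 32-bit counter
MEMBERS_PER_SEGMENT = 52352   # illustrative segment size, assumed for this sketch


def segment_start(offset):
    """Offset of the first member in the segment file that contains `offset`."""
    return offset - (offset % MEMBERS_PER_SEGMENT)


def members_in_use(next_offset, oldest_offset):
    """Value (1): distance from the oldest live multixact's offset to the head.

    This is the number that would drive autovacuum, even though older
    segments may still exist on disk until the next checkpoint.
    """
    return (next_offset - oldest_offset) % OFFSET_SPACE


def can_allocate(next_offset, nmembers, oldest_segment_start_on_disk):
    """Value (2): refuse to write into a segment a checkpoint might delete.

    `free` is the room between the head and the start of the oldest segment
    still on disk. A result of 0 is treated as "full", which is conservative
    when the space is actually empty and the head sits on a segment boundary.
    """
    free = (oldest_segment_start_on_disk - next_offset) % OFFSET_SPACE
    return nmembers < free
```

For example, with the head ten members short of the wrap point and the oldest on-disk segment starting at offset 0, can_allocate(OFFSET_SPACE - 10, 20, 0) is False while can_allocate(OFFSET_SPACE - 10, 5, 0) is True. Only the checkpoint would advance oldest_segment_start_on_disk in this model, so the allocator never races with file deletion.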
