Re: Incomplete freezing when truncating a relation during vacuum

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Incomplete freezing when truncating a relation during vacuum
Date: 2013-11-30 16:00:58
Message-ID: 20131130160058.GB31100@awork2.anarazel.de
Lists: pgsql-hackers

Hi Noah,

On 2013-11-30 00:40:06 -0500, Noah Misch wrote:
> > > On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:
> > > > With regard to fixing things up, ISTM the best bet is heap_prune_chain()
> > > > so far. That's executed by vacuum and by opportunistic pruning and we
> > > > know we have the appropriate locks there. Looks relatively easy to fix
> > > > up things there. Not sure if there are any possible routes to WAL log
> > > > this but using log_newpage()?
> > > > I am really not sure what the best course of action is :(
>
> Based on subsequent thread discussion, the plan you outline sounds reasonable.
> Here is a sketch of the specific semantics of that fixup. If a HEAPTUPLE_LIVE
> tuple has XIDs older than the current relfrozenxid/relminmxid of its relation
> or newer than ReadNewTransactionId()/ReadNextMultiXactId(), freeze those XIDs.
> Do likewise for HEAPTUPLE_DELETE_IN_PROGRESS, ensuring a proper xmin if the
> in-progress deleter aborts. Using log_newpage_buffer() seems fine; there's no
> need to optimize performance there.

We'd need to decide what to do with xmax values; they'd likely need to
be treated differently.
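
For xmin at least the check itself seems straightforward. Here's a very
rough sketch of what the heap_prune_chain() fixup could test, following
your outline (the function name is made up for illustration, and xmax /
multixact handling is deliberately left out):

#include "access/transam.h"       /* TransactionId comparison helpers */
#include "access/htup_details.h"  /* HeapTupleHeaderGetXmin */

/*
 * Sketch only, not a patch: an xmin older than the relation's
 * relfrozenxid, or not yet assigned according to ReadNewTransactionId(),
 * can only be there because a freeze was missed, so pruning should
 * force-freeze it.
 */
static bool
xmin_needs_forced_freeze(HeapTupleHeader tuple, TransactionId relfrozenxid)
{
	TransactionId xmin = HeapTupleHeaderGetXmin(tuple);

	if (!TransactionIdIsNormal(xmin))
		return false;		/* frozen, bootstrap or invalid: nothing to do */

	if (TransactionIdPrecedes(xmin, relfrozenxid) ||
		TransactionIdFollowsOrEquals(xmin, ReadNewTransactionId()))
		return true;

	return false;
}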

The problem with log_newpage_buffer() is that we'd quite possibly issue
one such call per item on a page, and that might become quite
expensive: logging ~1.5MB of WAL per 8kB page in the worst case sounds
a bit scary.
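
(For scale, under the worst-case assumption that every line pointer on
an 8kB page triggers its own ~8kB full-page image: a couple of hundred
items times ~8kB of WAL is roughly where that ~1.5MB per page comes
from.)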

> (All the better if we can, with minimal
> hacks, convince heap_freeze_tuple() itself to log the right changes.)

That likely comes too late - we've already pruned the page and might have
made wrong decisions there. Also, heap_freeze_tuple() is run on both the
primary and standbys.
I think our xl_heap_freeze format, which relies on running
heap_freeze_tuple() during recovery, is a terrible idea, but we can't
change that right now.

> Time is tight to finalize this, but it would be best to get this into next
> week's release. That way, the announcement, fix, and mitigating code
> pertaining to this data loss bug all land in the same release. If necessary,
> I think it would be worth delaying the release, or issuing a new release a
> week or two later, to closely align those events. That being said, I'm
> prepared to review a patch in this area over the weekend.

I don't think I currently have the energy/brainpower/time to develop
such a fix of suitable quality by Monday. I've worked pretty hard on
trying to fix the host of multixact data corruption bugs over the last
few days, and developing a solution that I'd be happy to put into such
critical paths is certainly several days' worth of work.

I am not sure it's a good idea to delay the release because of this;
there are so many other critical issues that delaying seems like a bad
tradeoff.

That said, if somebody else takes the lead, I am certainly willing to
help with detailed review and testing.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
