Re: Reviewing freeze map code

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing freeze map code
Date: 2016-06-20 19:33:02
Message-ID: 20160620193302.s4dbul6japgar5eg@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-06-15 08:56:52 -0400, Robert Haas wrote:
> Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2,
> and CTID without logging anything or clearing the all-visible flag and
> then releases the lock on the heap page to go do some more work that
> might even ERROR out. Only if that other work all goes OK do we
> relock the page and perform the WAL-logged actions.
>
> That doesn't seem like a good idea even in existing releases, because
> you've taken a tuple on an all-visible page and made it not
> all-visible, and you've made a page modification that is not
> necessarily atomic without logging it.

Right, that's broken.

> I'm not sure what to do about this: this part of the heap_update()
> logic has been like this forever, and I assume that if it were easy to
> refactor this away, somebody would have done it by now.

Well, I think generally nobody seriously looked at actually refactoring
heap_update(), even though that'd be a good idea. But in this instance,
the problem seems relatively fundamental:

We need to lock the origin page, to do visibility checks, etc. Then we
need to figure out the target page. Even disregarding toasting - which
we could be doing earlier with some refactoring - we're going to have to
release the page level lock, to lock them in ascending order. Otherwise
we'll risk kinda likely deadlocks. We also certainly don't want to nest
the lwlocks for the toast stuff, inside a content lock for the old
tupe's page.

So far the best idea I have - and it's really not a good one - is to
invent a new hint-bit that tells concurrent updates to acquire a
heavyweight tuple lock, while releasing the page-level lock. If that
hint bit does not require any other modifications - and we don't need an
xid in xmax for this use case - that'll avoid doing all the other
`already_marked` stuff early, which should address the correctness
issue. It's kinda invasive though, and probably has performance
implications.

Does anybody have a better idea?

Regards,

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2016-06-20 19:35:38 Re: 10.0
Previous Message David G. Johnston 2016-06-20 19:29:12 Re: parallel.c is not marked as test covered