Re: [PATCH] Assert that the correct locks are held when calling PageGetLSN()

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Jacob Champion <pchampion(at)pivotal(dot)io>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Date: 2017-09-04 06:14:00
Message-ID: CAB7nPqQq7FHyfAM3dARint5vaPnb5ZXo=OvHexnWivRC5m1cjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 1, 2017 at 3:53 AM, Jacob Champion <pchampion(at)pivotal(dot)io> wrote:
> The patch is really two pieces: add the assertion, and fix the callers
> that would trip it. The latter part is still in progress, because I'm
> running into some places where I'm not sure what the correct way
> forward is.

I looked at your patch.

> While working on checksum support for GPDB, we noticed that several
> callers of PageGetLSN() didn't follow the correct locking procedure.

Well, PageGetLSN can be used in some hot code paths, xloginsert.c
being one, so it does not seem wise to me to switch it to something
more complicated than a macro, and also it is not bounded to any
locking contracts now. For example, pageinspect works on a copy of the
buffer, so that's one case where we just don't care. So we should it
be tightened in Postgres? That's the portion of the proposal I don't
get. You could introduce a custom API for this purpose, isn't the
origin of your problems with GPDB for checksums that data is
replicated across many nodes so things need to be locked uniquely?

> - This change introduces a subtle source incompatibility: whereas
> previously both Pages and PageHeaders could be passed to PageGetLSN(),
> now callers must explicitly cast to Page if they don't already have
> one.

That would produce warnings for extensions, so people would likely
catch that when compiling with a newer version, if they use -Werror...

> - Is my use of FRONTEND here correct? The documentation made it seem
> like some compilers could try to link against the
> AssertPageIsLockedForLSN function, even if PageGetLSN was never
> called.

+#if defined(USE_ASSERT_CHECKING) && !defined(FRONTEND)
+void
+AssertPageIsLockedForLSN(Page page)
+{
From a design point of view, bufmgr.c is an API interface for buffer
management on the backend-size, so just using FRONTEND there looks
wrong to me (bufmgr.h is already bad enough IMO now).

> - Use of PageGetLSN() in heapam.c seems to be incorrect, but I don't
> really know what the best way is to fix it. It explicitly unlocks the
> buffer, with the comment that visibilitymap_set() needs to handle
> locking itself, but then calls PageGetLSN() to determine whether
> visibilitymap_set() needs to be called.

This bit in src/backend/access/transam/README may interest you:
Note that we must only use PageSetLSN/PageGetLSN() when we know the action
is serialised. Only Startup process may modify data blocks during recovery,
so Startup process may execute PageGetLSN() without fear of serialisation
problems. All other processes must only call PageSet/GetLSN when holding
either an exclusive buffer lock or a shared lock plus buffer header lock,
or be writing the data block directly rather than through shared buffers
while holding AccessExclusiveLock on the relation.

So I see no problem with the exiting caller.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-09-04 06:14:39 Re: POC: Sharing record typmods between backends
Previous Message Tatsuro Yamada 2017-09-04 05:53:30 Re: CLUSTER command progress monitor