Re: [PATCH] Assert that the correct locks are held when calling PageGetLSN()

From: Jacob Champion <pchampion(at)pivotal(dot)io>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Date: 2017-09-05 16:15:02
Message-ID: CABAq_6GvDL=V52kK9h2JujY_xmb3iKDHq7CUN7WrwbU4mYWD9A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael, thanks for the review!

On Sun, Sep 3, 2017 at 11:14 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>> While working on checksum support for GPDB, we noticed that several
>> callers of PageGetLSN() didn't follow the correct locking procedure.
>
> Well, PageGetLSN can be used in some hot code paths, xloginsert.c
> being one, so it does not seem wise to me to switch it to something
> more complicated than a macro,

If raw perf is an issue -- now that I've refactored the assertion into
its own function, I could probably push the implementation back into a
macro. Something like

#define PageGetLSN(page) (AssertPageIsLockedForLSN((Page) page),
PageXLogRecPtrGet(((PageHeader) page)->pd_lsn))

Callers would need to watch out for the double-evaluation, but that's
already documented with this group of macros.

> and also it is not bounded to any
> locking contracts now. For example, pageinspect works on a copy of the
> buffer, so that's one case where we just don't care. So we should it
> be tightened in Postgres?

The proposed patch doesn't tighten the contract in this case -- if the
buffer that's passed to PageGetLSN() isn't a shared buffer (i.e. if
it's not part of the BufferBlocks array), the assertion passes
unconditionally.

> That's the portion of the proposal I don't
> get. You could introduce a custom API for this purpose, isn't the
> origin of your problems with GPDB for checksums that data is
> replicated across many nodes so things need to be locked uniquely?

GPDB introduces additional problems, but I think this issue is
independent of anything GPDB-specific. One way to look at it: are the
changes in the proposed patch, from PageGetLSN to BufferGetLSNAtomic,
correct? If not, well, that's a good discussion to have. But if they
are correct, then why were they missed? and is there a way to help
future contributors out in the future? That's the goal of this patch;
I don't think it's something that a custom API solves.

>> - This change introduces a subtle source incompatibility: whereas
>> previously both Pages and PageHeaders could be passed to PageGetLSN(),
>> now callers must explicitly cast to Page if they don't already have
>> one.
>
> That would produce warnings for extensions, so people would likely
> catch that when compiling with a newer version, if they use -Werror...
>
>> - Is my use of FRONTEND here correct? The documentation made it seem
>> like some compilers could try to link against the
>> AssertPageIsLockedForLSN function, even if PageGetLSN was never
>> called.
>
> +#if defined(USE_ASSERT_CHECKING) && !defined(FRONTEND)
> +void
> +AssertPageIsLockedForLSN(Page page)
> +{
> From a design point of view, bufmgr.c is an API interface for buffer
> management on the backend-size, so just using FRONTEND there looks
> wrong to me (bufmgr.h is already bad enough IMO now).

The comments suggested that bufmgr.h could be incidentally pulled into
frontend code through other headers. Or do you mean that I don't need
to check for FRONTEND in the C source file (since, I assume, it's only
ever being compiled for the backend)? I'm not sure what you mean by
"bufmgr.h is already bad enough".

>> - Use of PageGetLSN() in heapam.c seems to be incorrect, but I don't
>> really know what the best way is to fix it. It explicitly unlocks the
>> buffer, with the comment that visibilitymap_set() needs to handle
>> locking itself, but then calls PageGetLSN() to determine whether
>> visibilitymap_set() needs to be called.
>
> This bit in src/backend/access/transam/README may interest you:
> Note that we must only use PageSetLSN/PageGetLSN() when we know the action
> is serialised. Only Startup process may modify data blocks during recovery,
> so Startup process may execute PageGetLSN() without fear of serialisation
> problems. All other processes must only call PageSet/GetLSN when holding
> either an exclusive buffer lock or a shared lock plus buffer header lock,
> or be writing the data block directly rather than through shared buffers
> while holding AccessExclusiveLock on the relation.
>
> So I see no problem with the exiting caller.

Is heap_xlog_visible only called during startup? If so, is there a
general rule for which locks are required during startup and which
aren't?

Currently there is no exception in the assertion made for startup
processes, so that's something that would need to be changed. Is there
a way to determine whether the current process considers itself to be
a startup process?

Thanks again!
--Jacob

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-09-05 16:16:12 Re: [COMMITTERS] pgsql: Add psql variables showing server version and psql version.
Previous Message Pavel Stehule 2017-09-05 16:12:48 Re: proposal: psql: check env variable PSQL_PAGER