Re: [PATCH] Assert that the correct locks are held when calling PageGetLSN()

From: Jacob Champion <pchampion(at)pivotal(dot)io>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Asim Praveen <apraveen(at)pivotal(dot)io>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: [PATCH] Assert that the correct locks are held when calling PageGetLSN()
Date: 2017-11-07 17:26:58
Message-ID: CABAq_6G3TGnBLFRrMXsO8Z_p8eF1atjTj0J4QVG+TQ9c0_S-Tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael,

On Mon, Nov 6, 2017 at 6:18 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Did you really test WAL replay?

Is there a way to test this other than installcheck-world? The only
failure we've run into at the moment is in the snapshot-too-old tests.
Maybe we're not configuring with all the tests enabled?

>> The assertion added caught at least one code path where TestForOldSnapshot
>> calls PageGetLSN without holding any lock. The snapshot_too_old test in
>> "check-world" failed due to the assertion failure. This needs to be fixed,
>> see the open question in the opening mail on this thread.
>
> Good point. I am looping Kevin Grittner here for his input, as the
> author and committer of old_snapshot_threshold. Things can be
> addressed with a separate patch by roughly moving the check on
> PageGetLSN() to TestForOldSnapshot_impl() and use BufferGetLSNAtomic()
> instead.

It still doesn't pass the tests, as not all code paths ensure that a
content lock is held before calling TestForOldSnapshot.
BufferGetLSNAtomic only adds the spinlock.

> The commit fest has lost view of this entry, and so did I. So I have
> added a new entry:
> https://commitfest.postgresql.org/16/1371/

Thank you!

> BufferGetLSNAtomic() could really use LWLockHeldByMe(). Could you
> consider it with an extra patch on top of 0001?

The LWLockHeldByMe() calls are added to BufferGetLSNAtomic in patch
0002 (because it does all its work through PageGetLSN).

> It seems to me that 0001 is good for a committer lookup, that will get
> rid of all existing bugs. For 0002, what you are proposing is still
> not a good idea for anything using page copies.

I think there is still significant confusion here. To be clear: this
patch is intended to add no changes for local page copies. As I've
tried to say (three or four times now): to our understanding, local
page copies are not allocated in the shared BufferBlocks region and
are therefore not subject to the new assertions. Am I missing a corner
case, or completely misunderstanding your point? I never got a direct
response to my earlier questions on this.

--Jacob

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-11-07 19:32:00 Re: [POC] Faster processing at Gather node
Previous Message Simon Riggs 2017-11-07 17:23:49 Re: Remove secondary checkpoint