Re: ReadRecentBuffer() doesn't scale well

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ReadRecentBuffer() doesn't scale well
Date: 2023-06-29 15:39:48
Message-ID: 20230629153948.rk7re27qz3qresdt@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-06-29 19:35:30 +1200, Thomas Munro wrote:
> I (re)discovered why I used the lock-then-pin approach. In the
> comments I mentioned InvalidBuffer(), but the main problem is in its
> caller GetVictimBuffer() which has various sanity checks about
> reference counts that can occasionally fail if you have code randomly
> pinning any old buffer.

You're right. Specifically non-valid buffers are the issue.

> New idea: use the standard PinBuffer() function, but add a mode that
> doesn't pin invalid buffers (with caveat that you can perhaps get a
> false negative due to unlocked read, but never a false positive; see
> commit message). Otherwise we'd have to duplicate all the same logic
> to use cmpxchg for ReadRecentBuffer(), or rethink the assumptions in
> that other code.

It might be worth using lock free code in more places before long, but I agree
with the solution here.

> As for the lack of usage bump in the back-branches, I think the
> options are: teach PinBuffer_Locked() to increment it optionally, or
> back-patch whatever we come up with for this.

Hm, or just leave it as is.

> For the root buffer optimisation, the obvious place for storage seems
> to be under rd_amcache. It was originally invented for the cached
> metapage (commit d2896a9ed14) but could accommodate a new struct
> holding whatever we want. Here is a patch to try that out.
> BTAMCacheData would also be a natural place to put future things
> including a copy of the root page itself, in later work on lock-free
> tricks.

I am wondering if we don't want something more generic than stashing this in
rd_amcache. Don't want to end up duplicating relevant code across the uses of
rd_amcache in every AM.

> @@ -663,38 +663,17 @@ ReadRecentBuffer(RelFileLocator rlocator, ForkNumber forkNum, BlockNumber blockN
> else
> {
> bufHdr = GetBufferDescriptor(recent_buffer - 1);
> - have_private_ref = GetPrivateRefCount(recent_buffer) > 0;
>
> - /*
> - * Do we already have this buffer pinned with a private reference? If
> - * so, it must be valid and it is safe to check the tag without
> - * locking. If not, we have to lock the header first and then check.
> - */
> - if (have_private_ref)
> - buf_state = pg_atomic_read_u32(&bufHdr->state);
> - else
> - buf_state = LockBufHdr(bufHdr);
> -
> - if ((buf_state & BM_VALID) && BufferTagsEqual(&tag, &bufHdr->tag))
> + /* Is it still valid and holding the right tag? */
> + if (PinBuffer(bufHdr, NULL, true))

I do wonder if we should have an unlocked pre-check for a) the buffer being
valid and b) BufferTagsEqual() matching. With such a pre-check the race for
increasing the usage count of the wrong buffer is quite small, without the
pre-check it doesn't seem that small anymore.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2023-06-29 16:05:59 Re: [PATCH] Honor PG_TEST_NOCLEAN for tempdirs
Previous Message Robert Haas 2023-06-29 15:19:38 Re: pgsql: Fix search_path to a safe value during maintenance operations.