Re: refactoring relation extension and BufferAlloc(), faster COPY

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Date: 2023-02-21 15:16:33
Message-ID: 8185d3a4-5123-10ca-db33-30b4a03fff7e@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/02/2023 23:36, Andres Freund wrote:
> On 2023-02-11 23:03:56 +0200, Heikki Linnakangas wrote:
>> * I don't understand this comment:
>>
>>> /*
>>> * Clear out the buffer's tag and flags and usagecount. We must do
>>> * this to ensure that linear scans of the buffer array don't think
>>> * the buffer is valid.
>>> *
>>> * XXX: This is a pre-existing comment I just moved, but isn't it
>>> * entirely bogus with regard to the tag? We can't do anything with
>>> * the buffer without taking BM_VALID / BM_TAG_VALID into
>>> * account. Likely doesn't matter because we're already dirtying the
>>> * cacheline, but still.
>>> *
>>> */
>>> ClearBufferTag(&buf_hdr->tag);
>>> buf_state &= ~(BUF_FLAG_MASK | BUF_USAGECOUNT_MASK);
>>> UnlockBufHdr(buf_hdr, buf_state);
>>
>> What exactly is wrong with clearing the tag? What does dirtying the
>> cacheline have to do with the correctness here?
>
> There's nothing wrong with clearing out the tag, but I don't think it's a hard
> requirement today, and certainly not for the reason stated above.
>
> Validity of the buffer isn't determined by the tag, it's determined by
> BM_VALID (or, if you interpret valid more widely, BM_TAG_VALID).
>
> Without either having pinned the buffer, or holding the buffer header
> spinlock, the tag can change at any time. And code like DropDatabaseBuffers()
> knows that, and re-checks the the tag after locking the buffer header
> spinlock.
>
> Afaict, there'd be no correctness issue with removing the
> ClearBufferTag(). There would be an efficiency issue though, because when
> encountering an invalid buffer, we'd unnecessarily enter InvalidateBuffer(),
> which'd find that BM_[TAG_]VALID isn't set, and not to anything.

Okay, gotcha.

> Even though it's not a correctness issue, it seems to me that
> DropRelationsAllBuffers() etc ought to check if the buffer is BM_TAG_VALID,
> before doing anything further. Particularly in DropRelationsAllBuffers(), the
> check we do for each buffer isn't cheap. Doing it for buffers that don't even
> have a tag seems .. not smart.

Depends on what percentage of buffers are valid, I guess. If all buffers
are valid, checking BM_TAG_VALID first would lose. In practice, I doubt
it makes any measurable difference either way.

Since we're micro-optimizing, I noticed that
BufTagMatchesRelFileLocator() compares the fields in order "spcOid,
dbOid, relNumber". Before commit 82ac34db20, we used
RelFileLocatorEqual(), which has this comment:

/*
* Note: RelFileLocatorEquals and RelFileLocatorBackendEquals compare
relNumber
* first since that is most likely to be different in two unequal
* RelFileLocators. It is probably redundant to compare spcOid if the
other
* fields are found equal, but do it anyway to be sure. Likewise for
checking
* the backend ID in RelFileLocatorBackendEquals.
*/

So we lost that micro-optimization. Should we reorder the checks in
BufTagMatchesRelFileLocator()?

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2023-02-21 15:40:31 Re: refactoring relation extension and BufferAlloc(), faster COPY
Previous Message Heikki Linnakangas 2023-02-21 14:50:53 Missing llvm_leave_fatal_on_oom() call