Re: refactoring relation extension and BufferAlloc(), faster COPY

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Date: 2023-02-11 21:36:51
Message-ID: 20230211213651.pu4ns2764fxpyit5@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-02-11 23:03:56 +0200, Heikki Linnakangas wrote:
> > v2-0005-bufmgr-Acquire-and-clean-victim-buffer-separately.patch
> This can be applied separately from the rest of the patches, which is nice.
> Some small comments on it:

Thanks for looking at these!

> * Needs a rebase, it conflicted slightly with commit f30d62c2fc.

Will work on that.

> * GetVictimBuffer needs a comment to explain what it does. In particular,
> mention that it returns a buffer that's pinned and known !BM_TAG_VALID.

Will add.

> * I suggest renaming 'cur_buf' and other such local variables in
> GetVictimBufffer to just 'buf'. 'cur' prefix suggests that there is some
> other buffer involved too, but there is no 'prev' or 'next' or 'other'
> buffer. The old code called it just 'buf' too, and before this patch it
> actually was a bit confusing because there were two buffers involved. But
> with this patch, GetVictimBuffer only deals with one buffer at a time.

Hm. Yea. I probably ended up with these names because initially
GetVictimBuffer() wasn't a separate function, and I indeed constantly got
confused by which buffer was referenced.

> * This FIXME:
>
> > /* OK, do the I/O */
> > /* FIXME: These used the wrong smgr before afaict? */
> > {
> > SMgrRelation smgr = smgropen(BufTagGetRelFileLocator(&buf_hdr->tag),
> > InvalidBackendId);
> >
> > TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(buf_hdr->tag.forkNum,
> > buf_hdr->tag.blockNum,
> > smgr->smgr_rlocator.locator.spcOid,
> > smgr->smgr_rlocator.locator.dbOid,
> > smgr->smgr_rlocator.locator.relNumber);
> >
> > FlushBuffer(buf_hdr, smgr, IOOBJECT_RELATION, io_context);
> > LWLockRelease(content_lock);
> >
> > ScheduleBufferTagForWriteback(&BackendWritebackContext,
> > &buf_hdr->tag);
> >
> > TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_DONE(buf_hdr->tag.forkNum,
> > buf_hdr->tag.blockNum,
> > smgr->smgr_rlocator.locator.spcOid,
> > smgr->smgr_rlocator.locator.dbOid,
> > smgr->smgr_rlocator.locator.relNumber);
> > }
>
> I believe that was intentional. The probes previously reported the block and
> relation whose read *caused* the eviction. It was not just the smgr but also
> the blockNum and forkNum that referred to the block that was being read.

You're probably right. It's certainly not understandable from our docs
though:

<row>
<entry><literal>buffer-write-dirty-start</literal></entry>
<entry><literal>(ForkNumber, BlockNumber, Oid, Oid, Oid)</literal></entry>
<entry>Probe that fires when a server process begins to write a dirty
buffer. (If this happens often, it implies that
<xref linkend="guc-shared-buffers"/> is too
small or the background writer control parameters need adjustment.)
arg0 and arg1 contain the fork and block numbers of the page.
arg2, arg3, and arg4 contain the tablespace, database, and relation OIDs
identifying the relation.</entry>
</row>

> I see that reporting the evicted page is more convenient with this patch,
> otherwise you'd need to pass the smgr and blocknum of the page that's being
> read to InvalidateVictimBuffer(). IMHO you can just remove these probe
> points. We don't need to bend over backwards to maintain specific probe
> points.

Agreed.

> * InvalidateVictimBuffer reads the buffer header with an atomic read op,
> just to check if BM_TAG_VALID is set.

It's not a real atomic op, in the sense of being special instruction. It does
force the compiler to actually read from memory, but that's it.

But you're right, even that is unnecessary. I think it ended up that way
because I also wanted the full buf_hdr, and it seemed somewhat error prone to
pass in both.

> * I don't understand this comment:
>
> > /*
> > * Clear out the buffer's tag and flags and usagecount. We must do
> > * this to ensure that linear scans of the buffer array don't think
> > * the buffer is valid.
> > *
> > * XXX: This is a pre-existing comment I just moved, but isn't it
> > * entirely bogus with regard to the tag? We can't do anything with
> > * the buffer without taking BM_VALID / BM_TAG_VALID into
> > * account. Likely doesn't matter because we're already dirtying the
> > * cacheline, but still.
> > *
> > */
> > ClearBufferTag(&buf_hdr->tag);
> > buf_state &= ~(BUF_FLAG_MASK | BUF_USAGECOUNT_MASK);
> > UnlockBufHdr(buf_hdr, buf_state);
>
> What exactly is wrong with clearing the tag? What does dirtying the
> cacheline have to do with the correctness here?

There's nothing wrong with clearing out the tag, but I don't think it's a hard
requirement today, and certainly not for the reason stated above.

Validity of the buffer isn't determined by the tag, it's determined by
BM_VALID (or, if you interpret valid more widely, BM_TAG_VALID).

Without either having pinned the buffer, or holding the buffer header
spinlock, the tag can change at any time. And code like DropDatabaseBuffers()
knows that, and re-checks the the tag after locking the buffer header
spinlock.

Afaict, there'd be no correctness issue with removing the
ClearBufferTag(). There would be an efficiency issue though, because when
encountering an invalid buffer, we'd unnecessarily enter InvalidateBuffer(),
which'd find that BM_[TAG_]VALID isn't set, and not to anything.

Even though it's not a correctness issue, it seems to me that
DropRelationsAllBuffers() etc ought to check if the buffer is BM_TAG_VALID,
before doing anything further. Particularly in DropRelationsAllBuffers(), the
check we do for each buffer isn't cheap. Doing it for buffers that don't even
have a tag seems .. not smart.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2023-02-11 21:51:21 Re: Improving inferred query column names
Previous Message Andres Freund 2023-02-11 21:05:03 Re: proposal: psql: psql variable BACKEND_PID