RE: [Patch] Optimize dropping of relation buffers using dlist

From: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, 'Andres Freund' <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "tomas(dot)vondra(at)2ndquadrant(dot)com" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-11-26 03:04:10
Message-ID: OSBPR01MB2341A11815C459DCE53F7F38EFF90@OSBPR01MB2341.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, November 19, 2020 4:08 PM, Tsunakawa, Takayuki wrote:
> From: Andres Freund <andres(at)anarazel(dot)de>
> > DropRelFileNodeBuffers() in recovery? The most common path is
> > DropRelationFiles()->smgrdounlinkall()->DropRelFileNodesAllBuffers(),
> > which 3/4 doesn't address and 4/4 doesn't mention.
> >
> > 4/4 seems to address DropRelationFiles(), but only talks about
> TRUNCATE?
>
> Yes. DropRelationFiles() is used in the following two paths:
>
> [Replay of TRUNCATE during recovery]
> xact_redo_commit/abort() -> DropRelationFiles() -> smgrdounlinkall() ->
> DropRelFileNodesAllBuffers()
>
> [COMMIT/ROLLBACK PREPARED]
> FinishPreparedTransaction() -> DropRelationFiles() -> smgrdounlinkall()
> -> DropRelFileNodesAllBuffers()

Yes. The concern is that it was not clear in the function descriptions and commit logs
what the optimizations for the DropRelFileNodeBuffers() and DropRelFileNodesAllBuffers()
are for. So I revised only the function description of DropRelFileNodeBuffers() and the
commit logs of the 0003-0004 patches. Please check if the brief descriptions would suffice.

> > I also don't get why 4/4 would be a good idea on its own. It uses
> > BUF_DROP_FULL_SCAN_THRESHOLD to guard
> > FindAndDropRelFileNodeBuffers() on a per relation basis. But since
> > DropRelFileNodesAllBuffers() can be used for many relations at once,
> > this could end up doing BUF_DROP_FULL_SCAN_THRESHOLD - 1
> lookups a lot
> > of times, once for each of nnodes relations?
>
> So, the threshold value should be compared with the total number of blocks
> of all target relations, not each relation. You seem to be right, got it.

Fixed this in 0004 patch. Now we compare the total number of buffers-to-be-invalidated
For ALL relations to the BUF_DROP_FULL_SCAN_THRESHOLD.

> > Also, how is 4/4 safe - this is outside of recovery too?
>
> It seems that DropRelFileNodesAllBuffers() should trigger the new
> optimization path only when InRecovery == true, because it intentionally
> doesn't check the "accurate" value returned from smgrnblocks().

Fixed it in 0004 patch. Now we ensure that we only enter the optimization path
Iff during recovery.

> From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> On Wed, Nov 18, 2020 at 11:43 PM Andres Freund <andres(at)anarazel(dot)de>
> > I'm also worried about the cases where this could cause buffers left
> > in the buffer pool, without a crosscheck like Thomas' patch would
> > allow to add. Obviously other processes can dirty buffers in
> > hot_standby, so any leftover buffer could have bad consequences.
> >
>
> The problem can only arise if other processes extend the relation. The idea
> was that in recovery it extends relation by one process which helps to
> maintain the cache. Kirk seems to have done testing to cross-verify it by using
> his first patch (Prevent-invalidating-blocks-in-smgrextend-during). Which
> other crosscheck you are referring here?
>
> I agree that we can do a better job by expanding comments to clearly state
> why it is safe.

Yes, basically what Amit-san also mentioned above. The first patch prevents that.
And in the description of DropRelFileNodeBuffers in the 0003 patch, please check
If that would suffice.

> > Smaller comment:
> >
> > +static void
> > +FindAndDropRelFileNodeBuffers(RelFileNode rnode, ForkNumber
> *forkNum,
> > int nforks,
> > + BlockNumber
> > *nForkBlocks, BlockNumber *firstDelBlock) ...
> > + /* Check that it is in the buffer pool. If not, do
> nothing.
> > */
> > + LWLockAcquire(bufPartitionLock, LW_SHARED);
> > + buf_id = BufTableLookup(&bufTag, bufHash);
> > ...
> > + bufHdr = GetBufferDescriptor(buf_id);
> > +
> > + buf_state = LockBufHdr(bufHdr);
> > +
> > + if (RelFileNodeEquals(bufHdr->tag.rnode, rnode)
> &&
> > + bufHdr->tag.forkNum == forkNum[i] &&
> > + bufHdr->tag.blockNum >= firstDelBlock[i])
> > + InvalidateBuffer(bufHdr); /* releases
> > spinlock */
> > + else
> > + UnlockBufHdr(bufHdr, buf_state);
> >
> > I'm a bit confused about the check here. We hold a buffer partition
> > lock, and have done a lookup in the mapping table. Why are we then
> > rechecking the relfilenode/fork/blocknum? And why are we doing so
> > holding the buffer header lock, which is essentially a spinlock, so
> > should only ever be held for very short portions?
> >
> > This looks like it's copying logic from DropRelFileNodeBuffers() etc,
> > but there the situation is different: We haven't done a buffer mapping
> > lookup, and we don't hold a partition lock!
>
> That's because the buffer partition lock is released immediately after the hash
> table has been looked up. As an aside, InvalidateBuffer() requires the caller
> to hold the buffer header spinlock and doesn't hold the buffer partition lock.

Yes. Holding the buffer header spinlock is necessary to invalidate the buffers.
As for buffer mapping partition lock, as mentioned by Tsunakawa-san, it is
released immediately after BufTableLookup, which is similar to lookup done in
PrefetchSharedBuffer. So I retained these changes.

I have attached the updated patches. Aside from descriptions, no other major
changes in the patch set except 0004. Feedbacks are welcome.

Regards,
Kirk Jamison

Attachment Content-Type Size
v32-0001-Prevent-invalidating-blocks-in-smgrextend-during.patch application/octet-stream 1.1 KB
v32-0002-Add-bool-param-in-smgrnblocks-for-cached-blocks.patch application/octet-stream 9.5 KB
v32-0003-Optimize-DropRelFileNodeBuffers-during-recovery.patch application/octet-stream 8.9 KB
v32-0004-Optimize-DropRelFileNodesAllBuffers-in-recovery.patch application/octet-stream 6.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-11-26 03:18:18 Re: Huge memory consumption on partitioned table with FKs
Previous Message Amit Kapila 2020-11-26 03:02:58 Re: POC: Cleaning up orphaned files using undo logs