RE: [Patch] Optimize dropping of relation buffers using dlist

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Andres Freund' <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "tomas(dot)vondra(at)2ndquadrant(dot)com" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-11-19 07:07:34
Message-ID: TYAPR01MB29907DEEB7D8F4D146295634FEE00@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Andres Freund <andres(at)anarazel(dot)de>
> DropRelFileNodeBuffers() in recovery? The most common path is
> DropRelationFiles()->smgrdounlinkall()->DropRelFileNodesAllBuffers(),
> which 3/4 doesn't address and 4/4 doesn't mention.
>
> 4/4 seems to address DropRelationFiles(), but only talks about TRUNCATE?

Yes. DropRelationFiles() is used in the following two paths:

[Replay of TRUNCATE during recovery]
xact_redo_commit/abort() -> DropRelationFiles()
-> smgrdounlinkall() -> DropRelFileNodesAllBuffers()

[COMMIT/ROLLBACK PREPARED]
FinishPreparedTransaction() -> DropRelationFiles()
-> smgrdounlinkall() -> DropRelFileNodesAllBuffers()

> I also don't get why 4/4 would be a good idea on its own. It uses
> BUF_DROP_FULL_SCAN_THRESHOLD to guard
> FindAndDropRelFileNodeBuffers() on a per relation basis. But since
> DropRelFileNodesAllBuffers() can be used for many relations at once, this
> could end up doing BUF_DROP_FULL_SCAN_THRESHOLD - 1 lookups a lot of
> times, once for each of nnodes relations?

So, the threshold value should be compared with the total number of blocks of all target relations, not each relation. You seem to be right, got it.

> Also, how is 4/4 safe - this is outside of recovery too?

It seems that DropRelFileNodesAllBuffers() should trigger the new optimization path only when InRecovery == true, because it intentionally doesn't check the "accurate" value returned from smgrnblocks().

> Smaller comment:
>
> +static void
> +FindAndDropRelFileNodeBuffers(RelFileNode rnode, ForkNumber *forkNum,
> int nforks,
> + BlockNumber
> *nForkBlocks, BlockNumber *firstDelBlock)
> ...
> + /* Check that it is in the buffer pool. If not, do nothing.
> */
> + LWLockAcquire(bufPartitionLock, LW_SHARED);
> + buf_id = BufTableLookup(&bufTag, bufHash);
> ...
> + bufHdr = GetBufferDescriptor(buf_id);
> +
> + buf_state = LockBufHdr(bufHdr);
> +
> + if (RelFileNodeEquals(bufHdr->tag.rnode, rnode) &&
> + bufHdr->tag.forkNum == forkNum[i] &&
> + bufHdr->tag.blockNum >= firstDelBlock[i])
> + InvalidateBuffer(bufHdr); /* releases
> spinlock */
> + else
> + UnlockBufHdr(bufHdr, buf_state);
>
> I'm a bit confused about the check here. We hold a buffer partition lock, and
> have done a lookup in the mapping table. Why are we then rechecking the
> relfilenode/fork/blocknum? And why are we doing so holding the buffer header
> lock, which is essentially a spinlock, so should only ever be held for very short
> portions?
>
> This looks like it's copying logic from DropRelFileNodeBuffers() etc, but there
> the situation is different: We haven't done a buffer mapping lookup, and we
> don't hold a partition lock!

That's because the buffer partition lock is released immediately after the hash table has been looked up. As an aside, InvalidateBuffer() requires the caller to hold the buffer header spinlock and doesn't hold the buffer partition lock.

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2020-11-19 07:22:08 Re: Is postgres ready for 2038?
Previous Message Tom Lane 2020-11-19 07:03:59 Re: Should we document IS [NOT] OF?