RE: [Patch] Optimize dropping of relation buffers using dlist

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
Cc: "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "tomas(dot)vondra(at)2ndquadrant(dot)com" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-10-15 06:55:22
Message-ID: TYAPR01MB299019435B9FF78C53DF2787FE020@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Jamison, Kirk/ジャミソン カーク <k(dot)jamison(at)fujitsu(dot)com>
> 2. Non-recovery Performance
> However, I still can't seem to find the cause of why the non-recovery
> performance does not change when compared to master. (1 min 15 s for the
> given test case below)
...
> 5. Measure VACUUM timing
> \timing
> VACUUM;

Oops, why are you using VACUUM? Aren't you trying to speed up TRUNCATE?

Even if you wanted to utilize the truncation at the end of VACUUM for measuring truncation speed, your way measures the whole VACUUM processing, which includes the garbage collection process. The garbage collection should dominate the time.

> 3. Recovery Performance (hot standby, failover) OTOH, when executing
> 2. [Master] Measure VACUUM timing. Then stop server.
> \timing
> VACUUM;
> \q
> pg_ctl stop -mi -w
>
> 3. [Standby] Use the attached script to promote standby and measure the
> performance.
> # test.sh recovery

You didn't DELETE the table data as opposed to the non-recovery case. Then, the replay of VACUUM should do nothing. That's why you got a good performance number.

TRUNCATE goes this path:

[non-recovery]
CommitTransaction
smgrdopendingdeletes
smgrdounlinkall
DropRelFileNodesAllBuffers

[recovery]
xact_redo_commit
DropRelationFiles
smgrdounlinkall
DropRelFileNodesAllBuffers

So, you need to modify DropRelFileNodesAllBuffers(). OTOH, DropRelFileNodeBuffers(), which you modified, is used in VACUUM's truncation and another case. The modification itself is useful because it can shorten the occasional hickup during autovacuum, so you don't remove the change.

(The existence of these two paths is tricky; anyone on this thread didn't notice, and I forgot about it. It would be good to refactor this, but it's a separate undertaking, I think.)

Below are my comments for the code:

(1)
@@ -572,6 +572,9 @@ smgrnblocks(SMgrRelation reln, ForkNumber forknum, bool *accurate)
+ if (accurate != NULL)
+ *accurate = false;
+

The above change should be in 002, right?

(2)
+ /* Get the total nblocks for a relation's fork */

total nblocks -> number of blocks

(3)
+ if (nForkBlocks[i] == InvalidBlockNumber ||
+ nBlocksToInvalidate >= BUF_DROP_FULL_SCAN_THRESHOLD)
+ break;

With this code, you haven't addressed what I commented previously. If the size of the first fork is accurate but that of the second one is not, the first fork is processed in an optimized way while the second fork is done in the traditional way. What you want to here is to only use the traditional way for all forks, right?

So, remove the above change and replace

+ if (!accurate)
+ {
+ nForkBlocks[i] = InvalidBlockNumber;
+ break;
+ }

with

+ if (!accurate)
+ break;

And after the first for loop, put

if (!accurate || nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
goto full_scan;

And remove the following code and instead put the "full_scan:" label there.

+ if (i >= nforks)
+ return;
+

Or, instead of using goto, you can write like this:

for (...)
calculate # of invalidated blocks

if (accurate && nBlocksToInvalidate >= BUF_DROP_FULL_SCAN_THRESHOLD)
{
do the optimized way;
return;
}

do the traditional way;

I prefer using goto here because the loop nesting gets shallow. But that's a matter of taste and you can choose either.

Regards
Takayuki Tsunakawa

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-10-15 06:56:21 Re: scram-sha-256 broken with FIPS and OpenSSL 1.0.2
Previous Message Brar Piening 2020-10-15 06:27:51 Aw: Re: Minor documentation error regarding streaming replication protocol