Re: [Patch] Optimize dropping of relation buffers using dlist

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, k(dot)jamison(at)fujitsu(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-12-22 12:11:01
Message-ID: CAA4eK1+hP-yoaJmxo0jmd8umJzipDfrOAGXd1MpimLyK1quVRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 22, 2020 at 2:55 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Apart from tests, do let me know if you are happy with the changes in
> the patch? Next, I'll look into DropRelFileNodesAllBuffers()
> optimization patch.
>

Review of v35-0004-Optimize-DropRelFileNodesAllBuffers-in-recovery [1]
========================================================
1.
DropRelFileNodesAllBuffers()
{
..
+buffer_full_scan:
+ pfree(block);
+ nodes = palloc(sizeof(RelFileNode) * n); /* non-local relations */
+ for (i = 0; i < n; i++)
+ nodes[i] = smgr_reln[i]->smgr_rnode.node;
+
..
}

How is it correct to assign nodes array directly from smgr_reln? There
is no one-to-one correspondence. If you see the code before patch, the
passed array can have mixed of temp and non-temp relation information.

2.
+ for (i = 0; i < n; i++)
{
- pfree(nodes);
+ for (j = 0; j <= MAX_FORKNUM; j++)
+ {
+ /*
+ * Assign InvalidblockNumber to a block if a relation
+ * fork does not exist, so that we can skip it later
+ * when dropping the relation buffers.
+ */
+ if (!smgrexists(smgr_reln[i], j))
+ {
+ block[i][j] = InvalidBlockNumber;
+ continue;
+ }
+
+ /* Get the number of blocks for a relation's fork */
+ block[i][j] = smgrnblocks(smgr_reln[i], j, &cached);

Similar to above, how can we assume smgr_reln array has all non-local
relations? Have we tried the case with mix of temp and non-temp
relations?

In this code, I am slightly worried about the additional cost of each
time checking smgrexists. Consider a case where there are many
relations and only one or few of them have not cached the information,
in such a case we will pay the cost of smgrexists for many relations
without even going to the optimized path. Can we avoid that in some
way or at least reduce its usage to only when it is required? One idea
could be that we first check if the nblocks information is cached and
if so then we don't need to call smgrnblocks, otherwise, check if it
exists. For this, we need an API like smgrnblocks_cahced, something we
discussed earlier but preferred the current API. Do you have any
better ideas?

[1] - https://www.postgresql.org/message-id/OSBPR01MB2341882F416A282C3F7D769DEFC70%40OSBPR01MB2341.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 陈佳昕 (步真) 2020-12-22 12:30:56 回复:Re: Cache relation sizes?
Previous Message Pavel Stehule 2020-12-22 11:49:35 Re: proposal: schema variables