RE: [Patch] Optimize dropping of relation buffers using dlist

From: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-12-07 07:02:35
Message-ID: OSBPR01MB2341D82E80D2425967B64BB5EFCE0@OSBPR01MB2341.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, December 4, 2020 8:27 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Nov 27, 2020 at 11:36 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Fri, 27 Nov 2020 02:19:57 +0000, "k(dot)jamison(at)fujitsu(dot)com"
> > <k(dot)jamison(at)fujitsu(dot)com> wrote in
> > > > From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> Hello, Kirk.
> > > > Thank you for the new version.
> > >
> > > Hi, Horiguchi-san. Thank you for your very helpful feedback.
> > > I'm updating the patches addressing those.
> > >
> > > > + if (!smgrexists(rels[i], j))
> > > > + continue;
> > > > +
> > > > + /* Get the number of blocks for a relation's fork */
> > > > + blocks[i][numForks] = smgrnblocks(rels[i], j,
> > > > NULL);
> > > >
> > > > If we see a fork which its size is not cached we must give up this
> > > > optimization for all target relations.
> > >
> > > I did not use the "cached" flag in DropRelFileNodesAllBuffers and
> > > use InRecovery when deciding for optimization because of the following
> reasons:
> > > XLogReadBufferExtended() calls smgrnblocks() to apply changes to
> > > relation page contents. So in DropRelFileNodeBuffers(),
> > > XLogReadBufferExtended() is called during VACUUM replay because
> VACUUM changes the page content.
> > > OTOH, TRUNCATE doesn't change the relation content, it just
> > > truncates relation pages without changing the page contents. So
> > > XLogReadBufferExtended() is not called, and the "cached" flag will
> > > always return false. I tested with "cached" flags before, and this
> >
> > A bit different from the point, but if some tuples have been inserted
> > to the truncated table, XLogReadBufferExtended() is called for the
> > table and the length is cached.
> >
> > > always return false, at least in DropRelFileNodesAllBuffers. Due to
> > > this, we cannot use the cached flag in DropRelFileNodesAllBuffers().
> > > However, I think we can still rely on smgrnblocks to get the file
> > > size as long as we're InRecovery. That cached nblocks is still guaranteed
> to be the maximum in the shared buffer.
> > > Thoughts?
> >
> > That means that we always think as if smgrnblocks returns "cached" (or
> > "safe") value during recovery, which is out of our current consensus.
> > If we go on that side, we don't need to consult the "cached" returned
> > from smgrnblocks at all and it's enough to see only InRecovery.
> >
> > I got confused..
> >
> > We are relying on the "fact" that the first lseek() call of a
> > (startup) process tells the truth. We added an assertion so that we
> > make sure that the cached value won't be cleared during recovery. A
> > possible remaining danger would be closing of an smgr object of a live
> > relation just after a file extension failure. I think we are thinking
> > that that doesn't happen during recovery. Although it seems to me
> > true, I'm not confident.
> >
>
> Yeah, I also think it might not be worth depending upon whether smgr close
> has been done before or not. I feel the current idea of using 'cached'
> parameter is relatively solid and we should rely on that.
> Also, which means that in DropRelFileNodesAllBuffers() we should rely on
> the same, I think doing things differently in this regard will lead to confusion. I
> agree in some cases we might not get benefits but it is more important to be
> correct and keep the code consistent to avoid introducing bugs now or in the
> future.
>
Hi,
I have reported before that it is not always the case that the "cached" flag of
srnblocks() return true. So when I checked the truncate test case used in my
patch, it does not enter the optimization path despite doing INSERT before
truncation of table.
The reason for that is because in TRUNCATE, a new RelFileNode is assigned
to the relation when creating a new file. In recovery, XLogReadBufferExtended()
always opens the RelFileNode and calls smgrnblocks() for that RelFileNode for the
first time. And for recovery processing, different RelFileNodes are used for the
INSERTs to the table and TRUNCATE to the same table.

As we cannot use "cached" flag for both DropRelFileNodeBuffers() and
DropRelFileNodesAllBuffers() based from above.
I am thinking that if we want consistency, correctness, and to still make use of
the optimization, we can completely drop the "cached" flag parameter in smgrnblocks,
and use InRecovery.
Tsunakawa-san mentioned in [1] that it is safe because smgrclose is not called
by the startup process in recovery. Shared-inval messages are not sent to startup
process.

Otherwise, we use the current patch form as it is: using "cached" in
DropRelFileNodeBuffers() and InRecovery for DropRelFileNodesAllBuffers().
However, that does not seem to be what is wanted in this thread.

Thoughts?

Regards,
Kirk Jamison

[1] https://www.postgresql.org/message-id/TYAPR01MB2990B42570A5FAC349EE983AFEF40%40TYAPR01MB2990.jpnprd01.prod.outlook.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-12-07 07:57:12 Re: Single transaction in the tablesync worker?
Previous Message Amit Langote 2020-12-07 06:53:37 Re: ModifyTable overheads in generic plans