Re: [Patch] Optimize dropping of relation buffers using dlist

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-12-07 11:48:31
Message-ID: CAA4eK1J5Zxnm7+dCJqqZy+z+SpV2HexAdfgkOJoFa9icj9R6Cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 7, 2020 at 12:32 PM k(dot)jamison(at)fujitsu(dot)com
<k(dot)jamison(at)fujitsu(dot)com> wrote:
>
> On Friday, December 4, 2020 8:27 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Fri, Nov 27, 2020 at 11:36 AM Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > >
> > > At Fri, 27 Nov 2020 02:19:57 +0000, "k(dot)jamison(at)fujitsu(dot)com"
> > > <k(dot)jamison(at)fujitsu(dot)com> wrote in
> > > > > From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> Hello, Kirk.
> > > > > Thank you for the new version.
> > > >
> > > > Hi, Horiguchi-san. Thank you for your very helpful feedback.
> > > > I'm updating the patches addressing those.
> > > >
> > > > > + if (!smgrexists(rels[i], j))
> > > > > + continue;
> > > > > +
> > > > > + /* Get the number of blocks for a relation's fork */
> > > > > + blocks[i][numForks] = smgrnblocks(rels[i], j,
> > > > > NULL);
> > > > >
> > > > > If we see a fork which its size is not cached we must give up this
> > > > > optimization for all target relations.
> > > >
> > > > I did not use the "cached" flag in DropRelFileNodesAllBuffers and
> > > > use InRecovery when deciding for optimization because of the following
> > reasons:
> > > > XLogReadBufferExtended() calls smgrnblocks() to apply changes to
> > > > relation page contents. So in DropRelFileNodeBuffers(),
> > > > XLogReadBufferExtended() is called during VACUUM replay because
> > VACUUM changes the page content.
> > > > OTOH, TRUNCATE doesn't change the relation content, it just
> > > > truncates relation pages without changing the page contents. So
> > > > XLogReadBufferExtended() is not called, and the "cached" flag will
> > > > always return false. I tested with "cached" flags before, and this
> > >
> > > A bit different from the point, but if some tuples have been inserted
> > > to the truncated table, XLogReadBufferExtended() is called for the
> > > table and the length is cached.
> > >
> > > > always return false, at least in DropRelFileNodesAllBuffers. Due to
> > > > this, we cannot use the cached flag in DropRelFileNodesAllBuffers().
> > > > However, I think we can still rely on smgrnblocks to get the file
> > > > size as long as we're InRecovery. That cached nblocks is still guaranteed
> > to be the maximum in the shared buffer.
> > > > Thoughts?
> > >
> > > That means that we always think as if smgrnblocks returns "cached" (or
> > > "safe") value during recovery, which is out of our current consensus.
> > > If we go on that side, we don't need to consult the "cached" returned
> > > from smgrnblocks at all and it's enough to see only InRecovery.
> > >
> > > I got confused..
> > >
> > > We are relying on the "fact" that the first lseek() call of a
> > > (startup) process tells the truth. We added an assertion so that we
> > > make sure that the cached value won't be cleared during recovery. A
> > > possible remaining danger would be closing of an smgr object of a live
> > > relation just after a file extension failure. I think we are thinking
> > > that that doesn't happen during recovery. Although it seems to me
> > > true, I'm not confident.
> > >
> >
> > Yeah, I also think it might not be worth depending upon whether smgr close
> > has been done before or not. I feel the current idea of using 'cached'
> > parameter is relatively solid and we should rely on that.
> > Also, which means that in DropRelFileNodesAllBuffers() we should rely on
> > the same, I think doing things differently in this regard will lead to confusion. I
> > agree in some cases we might not get benefits but it is more important to be
> > correct and keep the code consistent to avoid introducing bugs now or in the
> > future.
> >
> Hi,
> I have reported before that it is not always the case that the "cached" flag of
> srnblocks() return true. So when I checked the truncate test case used in my
> patch, it does not enter the optimization path despite doing INSERT before
> truncation of table.
> The reason for that is because in TRUNCATE, a new RelFileNode is assigned
> to the relation when creating a new file. In recovery, XLogReadBufferExtended()
> always opens the RelFileNode and calls smgrnblocks() for that RelFileNode for the
> first time. And for recovery processing, different RelFileNodes are used for the
> INSERTs to the table and TRUNCATE to the same table.
>

Hmm, how is it possible if Insert is done before Truncate? The insert
should happen in old RelFileNode only. I have verified by adding a
break-in (while (1), so that it stops there) heap_xlog_insert and
DropRelFileNodesAllBuffers(), and both get the same (old) RelFileNode.
How have you verified what you are saying?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-12-07 11:56:28 Re: Parallel Inserts in CREATE TABLE AS
Previous Message Bharath Rupireddy 2020-12-07 10:50:14 Re: Parallel Inserts in CREATE TABLE AS