RE: [Patch] Optimize dropping of relation buffers using dlist

From: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-09-15 01:40:30
Message-ID: OSBPR01MB2341EB559E240818E2BFA79FEF200@OSBPR01MB2341.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Tuesday, September 8, 2020 1:02 PM, Amit Kapila wrote:
Hello,
> On Mon, Sep 7, 2020 at 1:33 PM k(dot)jamison(at)fujitsu(dot)com
> <k(dot)jamison(at)fujitsu(dot)com> wrote:
> >
> > On Wednesday, September 2, 2020 5:49 PM, Amit Kapila wrote:
> > > On Wed, Sep 2, 2020 at 9:17 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > >
> > > > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > > > > Even if the relation is locked, background processes like
> > > > > checkpointer can still touch the relation which might cause
> > > > > problems. Consider a case where we extend the relation but
> > > > > didn't flush the newly added pages. Now during truncate
> > > > > operation, checkpointer can still flush those pages which can
> > > > > cause trouble for truncate. But, I think in the recovery path
> > > > > such cases won't cause a
> > > problem.
> > > >
> > > > I wouldn't count on that staying true ...
> > > >
> > > >
> > >
> > > > https://www.postgresql.org/message-id/CA+hUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q(at)mail(dot)gmail(dot)com
> > > >
> > >
> > > I don't think that proposal will matter after commit c5315f4f44
> > > because we are caching the size/blocks for recovery while doing
> > > extend (smgrextend). In the above scenario, we would have cached the
> > > blocks which will be used at later point of time.
> > >
> >
> > I'm guessing we can still pursue this idea of improving the recovery path
> first.
> >
>
> I think so.

Alright, I've updated the patch so that it now passes the regression and TAP tests.
It compiles and builds as intended.

> > I'm working on an updated patch version, because the CFBot's telling
> > that postgres fails to build (one of the recovery TAP tests fails).
> > I'm still working on refactoring my patch, but have yet to find a proper
> solution at the moment.
> > So I'm going to continue my investigation.
> >
> > Attached is an updated WIP patch.
> > I'd appreciate if you could take a look at the patch as well.
> >
>
> So, I see the below log as one of the problems:
> 2020-09-07 06:20:33.918 UTC [10914] LOG: redo starts at 0/15FFEC0
> 2020-09-07 06:20:33.919 UTC [10914] FATAL: unexpected data beyond EOF
> in block 1 of relation base/13743/24581
>
> This indicates that we missed invalidating some buffer which should have
> been invalidated. If you are able to reproduce this locally then I suggest to first
> write a simple patch without the check of the threshold, basically in recovery
> always try to use the new way to invalidate the buffer. That will reduce the
> scope of the code that can create a problem. Let us know if the problem still
> exists and share the logs. BTW, I think I see one problem in the code:
>
> if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
> + bufHdr->tag.forkNum == forkNum[j] && tag.blockNum >=
> + bufHdr->firstDelBlock[j])
>
> Here, I think you need to use 'i' not 'j' for forkNum and
> firstDelBlock as those are arrays w.r.t forks. That might fix the
> problem but I am not sure as I haven't tried to reproduce it.

Thanks for the advice. Right, that was the cause of the error,
and fixing it (indexing by the fork-loop variable) solved the case.
I also followed Tsunakawa-san's advice to use more meaningful iterator names
instead of "i" and "j", for readability.
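To spell out the fix, here is a rough sketch of the corrected check (only an
illustration of the indexing, not the exact code in the attached patch;
fork_idx stands in for the renamed fork-loop iterator, and forkNum[] /
firstDelBlock[] are the per-fork arrays from the snippet you quoted):

    /*
     * Illustrative sketch only: index the per-fork arrays with the
     * fork-loop variable, not the buffer-loop variable.
     */
    for (fork_idx = 0; fork_idx < nforks; fork_idx++)
    {
        if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
            bufHdr->tag.forkNum == forkNum[fork_idx] &&
            bufHdr->tag.blockNum >= firstDelBlock[fork_idx])
        {
            /* invalidate this buffer */
        }
    }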

I also added a new function, DropRelFileNodeBuffersOfFork, which is called when
a relation fork is bigger than the threshold, i.e. if (nblocks > BUF_DROP_FULLSCAN_THRESHOLD).
Perhaps there's a better name for that function.
However, as expected from the previous discussions, that path is a bit slower than the
standard buffer invalidation process, because the whole of shared buffers is scanned nfork times.
Currently, I set the threshold to (NBuffers / 500).
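For reference, the overall shape of the per-fork dispatch I have in mind is
roughly the following (again only a sketch, not the patch itself; nblocks[] is
assumed to hold the cached size of each fork, the argument list of
DropRelFileNodeBuffersOfFork is guessed, and DropRelFileNodeBufferOfBlock is a
hypothetical per-block helper named here only for illustration):

    for (i = 0; i < nforks; i++)
    {
        if (nblocks[i] > BUF_DROP_FULLSCAN_THRESHOLD)
        {
            /* Large fork: scan the whole of shared buffers for this fork. */
            DropRelFileNodeBuffersOfFork(rnode.node, forkNum[i], firstDelBlock[i]);
        }
        else
        {
            /*
             * Small fork: invalidate only the to-be-truncated blocks via the
             * buffer mapping table, avoiding a full scan of shared buffers.
             * (DropRelFileNodeBufferOfBlock is hypothetical.)
             */
            for (block = firstDelBlock[i]; block < nblocks[i]; block++)
                DropRelFileNodeBufferOfBlock(rnode.node, forkNum[i], block);
        }
    }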

Feedback on the patch and testing is very much welcome.

Best regards,
Kirk Jamison

Attachment Content-Type Size
v12-Speedup-dropping-of-relation-buffers-during-recovery.patch application/octet-stream 9.4 KB
