RE: [Patch] Optimize dropping of relation buffers using dlist

From: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-09-01 13:02:28
Message-ID: OSBPR01MB234105E5F1F0B1E5088C31DCEF2E0@OSBPR01MB2341.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, August 18, 2020 3:05 PM (GMT+9), Amit Kapila wrote:
> On Fri, Aug 7, 2020 at 9:33 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > > On Sat, Aug 1, 2020 at 1:53 AM Andres Freund <andres(at)anarazel(dot)de>
> wrote:
> > >> We could also just use pg_class.relpages. It'll probably mostly be
> > >> accurate enough?
> >
> > > Don't we need the accurate 'number of blocks' if we want to
> > > invalidate all the buffers? Basically, I think we need to perform
> > > BufTableLookup for all the blocks in the relation and then Invalidate all
> buffers.
> >
> > Yeah, there is no room for "good enough" here. If a dirty buffer
> > remains in the system, the checkpointer will eventually try to flush
> > it, and fail (because there's no file to write it to), and then
> > checkpointing will be stuck. So we cannot afford to risk missing any
> buffers.
> >
>
> Today, again thinking about this point it occurred to me that during recovery
> we can reliably find the relation size and after Thomas's recent commit
> c5315f4f44 (Cache smgrnblocks() results in recovery), we might not need to
> even incur the cost of lseek. Why don't we fix this first for 'recovery' (by
> following something on the lines of what Andres suggested) and then later
> once we have a generic mechanism for "caching the relation size" [1], we can
> do it for non-recovery paths.
> I think that will at least address the reported use case with some minimal
> changes.
>
> [1] -
> https://www.postgresql.org/message-id/CAEepm%3D3SSw-Ty1DFcK%3D1r
> U-K6GSzYzfdD4d%2BZwapdN7dTa6%3DnQ%40mail.gmail.com
>

Attached is an updated V9 version with minimal code changes only and
avoids the previous overhead in the BufferAlloc. This time, I only updated
the recovery path as suggested by Amit, and followed Andres' suggestion
of referring to the cached blocks in smgrnblocks.
The layering is kinda tricky so the logic may be wrong. But as of now,
it passes the regression tests. I'll follow up with the performance results.
It seems there's regression for smaller shared_buffers. I'll update if I find bugs.
But I'd also appreciate your reviews in case I missed something.

Regards,
Kirk Jamison

Attachment Content-Type Size
v9-Speedup-dropping-of-relation-buffers-during-recovery.patch application/octet-stream 7.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2020-09-01 13:13:58 clarify "rewritten" in pg_checksums docs
Previous Message Daniel Gustafsson 2020-09-01 12:43:58 Re: Support for NSS as a libpq TLS backend