Re: [Patch] Optimize dropping of relation buffers using dlist

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: amit(dot)kapila16(at)gmail(dot)com
Cc: k(dot)jamison(at)fujitsu(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, andres(at)anarazel(dot)de, robertmhaas(at)gmail(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-09-16 03:32:22
Message-ID: 20200916.123222.724387222397793701.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Wed, 16 Sep 2020 08:33:06 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in
> On Wed, Sep 16, 2020 at 7:46 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > Is this means lseek(SEEK_END) doesn't count blocks that are
> > write(2)'ed (by smgrextend) but not yet flushed? (I don't think so,
> > for clarity.) The nblocks cache is added just to reduce the number of
> > lseek()s and expected to always have the same value with what lseek()
> > is expected to return.
> >
>
> See comments in ReadBuffer_common() which indicates such a possibility
> ("Unfortunately, we have also seen this case occurring because of
> buggy Linux kernels that sometimes return an lseek(SEEK_END) result
> that doesn't account for a recent write."). Also, refer my previous
> email [1] on this and another email link in that email which has a
> discussion on this point.
>
> > The reason it is reliable only during recovery
> > is that the cache is not shared but the startup process is the only
> > process that changes the relation size during recovery.
> >
>
> Yes, that is why we are planning to do this optimization for recovery path.
>
> > If any other process can extend the relation while smgrtruncate is
> > running, the current DropRelFileNodeBuffers should have the chance
> > that a new buffer for extended area is allocated at a buffer location
> > where the function already have passed by, which is a disaster.
> >
>
> The relation might have extended before smgrtruncate but the newly
> added pages can be flushed by checkpointer during smgrtruncate.
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1LH2uQWznwtonD%2Bnch76kqzemdTQAnfB06z_LXa6NTFtQ%40mail.gmail.com

Ah, now I understand! The reason we can rely on the cache is that the
cached value is *not* what lseek() returned but how far we intended to
extend the relation. Thank you for the explanation.

By the way, I'm not sure whether this actually happens, but if a single
smgrextend call extended the relation by two or more blocks, the cache
would be invalidated and the subsequent smgrnblocks would return
lseek()'s result. Don't we need to guarantee that the cache stays valid
during recovery?
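
To make the concern concrete, here is a minimal standalone sketch of
the behavior I have in mind. It is a toy model, not the actual smgr
code: ToyRel, toy_extend, toy_nblocks and INVALID_BLOCK are
hypothetical names, kernel_nblocks merely stands in for what
lseek(SEEK_END) would report, and the update rule is my assumption
about how the cache is maintained.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define INVALID_BLOCK ((BlockNumber) 0xFFFFFFFF)

/* Toy relation: one fork, with a cached size and a "kernel" size. */
typedef struct ToyRel
{
    BlockNumber cached_nblocks; /* how far we intended to extend */
    BlockNumber kernel_nblocks; /* stand-in for lseek(SEEK_END) / BLCKSZ */
} ToyRel;

/* Write block 'blocknum', extending the relation if it lies past the end. */
static void
toy_extend(ToyRel *rel, BlockNumber blocknum)
{
    if (blocknum + 1 > rel->kernel_nblocks)
        rel->kernel_nblocks = blocknum + 1;

    /*
     * Assumed update rule: the cache advances only when the relation
     * grows by exactly one block; any other case invalidates it.
     */
    if (rel->cached_nblocks == blocknum)
        rel->cached_nblocks = blocknum + 1;
    else
        rel->cached_nblocks = INVALID_BLOCK;
}

/* Return the relation size, reporting whether the cache answered. */
static BlockNumber
toy_nblocks(ToyRel *rel, bool *from_cache)
{
    if (rel->cached_nblocks != INVALID_BLOCK)
    {
        *from_cache = true;
        return rel->cached_nblocks;
    }

    /* Cache miss: fall back to the kernel's answer and re-cache it. */
    *from_cache = false;
    rel->cached_nblocks = rel->kernel_nblocks;
    return rel->cached_nblocks;
}
```

In this model, extending a 10-block relation at block 10 keeps the
cache valid, but a following extension at block 12 (skipping block 11)
invalidates it, so the next size lookup has to trust the kernel's
value.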

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
