Re: [Patch] Optimize dropping of relation buffers using dlist

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: thomas(dot)munro(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, k(dot)jamison(at)fujitsu(dot)com, tsunakawa(dot)takay(at)fujitsu(dot)com, amit(dot)kapila16(at)gmail(dot)com, andres(at)anarazel(dot)de, robertmhaas(at)gmail(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-10-22 06:48:46
Message-ID: 20201022.154846.1787169636291470089.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Thu, 22 Oct 2020 18:54:43 +1300, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote in
> On Thu, Oct 22, 2020 at 5:52 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Per the referenced bug-reporting thread, it was ReiserFS and/or NFS on
> > SLES 9.3; so, dubious storage choices on an ancient-even-then Linux
> > kernel.
>
> Ohhhh. I can reproduce that on a modern Linux box by forcing
> writeback to a full NFS filesystem[1], approximately as the kernel
> does asynchronously when it feels like it, causing the size reported
> by SEEK_END to go down.

<test code>

> $ cc magic_shrinking_file.c
> $ ./a.out
> lseek(..., SEEK_END) = 9670656
> write(...) = 8192
> lseek(..., SEEK_END) = 9678848
> fsync(...) = -1
> lseek(..., SEEK_END) = 9670656

Interesting..

> > I think the takeaway point is not so much that that particular bug
> > might recur as that storage infrastructure does sometimes have bugs.
> > If you're wanting to introduce new assumptions about what the filesystem
> > will do, it's prudent to think about how badly will we break if the
> > assumptions fail.
>
> Yeah. My point was just that the caching trick doesn't seem to
> improve matters on this particular front, it can just cache a bogus
> value.
>
> [1] https://www.postgresql.org/message-id/CAEepm=1FGo=ACPKRmAxvb53mBwyVC=TDwTE0DMzkWjdbAYw7sw@mail.gmail.com

As I wrote in another branch of this thread, the requirement here is
to make sure that we don't have buffers for blocks beyond the file size
known to the process. Even if the cache gets a bogus value at the
first load, it's still true that we don't have buffers for blocks
beyond that size. There's no problem as long as DropRelFileNodeBuffers
doesn't get a smaller value from smgrnblocks than the size the server
thinks the file has.
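
To make the invariant concrete, here is a minimal toy sketch in C. It
is not PostgreSQL code: ToyRelation, toy_read_block, toy_drop_buffers
and toy_count_buffers_from are hypothetical names, and the linear scan
merely stands in for a per-block buffer lookup. It assumes a process
only ever loads blocks below its cached size, so a drop that probes
only blocks in [first_del_block, cached_nblocks) cannot leave a buffer
behind, whatever the filesystem would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

#define TOY_NBUFFERS 8

/* Hypothetical toy buffer slot for a single relation fork. */
typedef struct ToyBuffer
{
    bool        valid;
    BlockNumber blockno;
} ToyBuffer;

/* Hypothetical relation: cached size plus a tiny buffer pool. */
typedef struct ToyRelation
{
    BlockNumber cached_nblocks; /* size this process believes, possibly bogus */
    ToyBuffer   buffers[TOY_NBUFFERS];
} ToyRelation;

/* Load a block into the pool; blocks at or beyond the cached size are refused. */
static bool
toy_read_block(ToyRelation *rel, BlockNumber blockno)
{
    if (blockno >= rel->cached_nblocks)
        return false;           /* the invariant: never buffer such a block */

    for (int i = 0; i < TOY_NBUFFERS; i++)
        if (rel->buffers[i].valid && rel->buffers[i].blockno == blockno)
            return true;        /* already buffered */

    for (int i = 0; i < TOY_NBUFFERS; i++)
    {
        if (!rel->buffers[i].valid)
        {
            rel->buffers[i].valid = true;
            rel->buffers[i].blockno = blockno;
            return true;
        }
    }
    return false;               /* pool full; the toy has no eviction */
}

/* Count buffers holding blocks >= blockno (full scan, for checking only). */
static int
toy_count_buffers_from(const ToyRelation *rel, BlockNumber blockno)
{
    int         n = 0;

    for (int i = 0; i < TOY_NBUFFERS; i++)
        if (rel->buffers[i].valid && rel->buffers[i].blockno >= blockno)
            n++;
    return n;
}

/* Drop buffers for blocks >= first_del_block, probing only up to the cached size. */
static int
toy_drop_buffers(ToyRelation *rel, BlockNumber first_del_block)
{
    int         dropped = 0;

    for (BlockNumber b = first_del_block; b < rel->cached_nblocks; b++)
    {
        for (int i = 0; i < TOY_NBUFFERS; i++)
        {
            if (rel->buffers[i].valid && rel->buffers[i].blockno == b)
            {
                rel->buffers[i].valid = false;
                dropped++;
            }
        }
    }

    /* Nothing can remain past the truncation point if the invariant held. */
    assert(toy_count_buffers_from(rel, first_del_block) == 0);

    if (first_del_block < rel->cached_nblocks)
        rel->cached_nblocks = first_del_block;
    return dropped;
}
```

The sketch breaks down exactly in the case described above: if the
size used to bound the probe were smaller than the size under which
buffers were loaded, the bounded loop could miss a buffer and the
assertion would fire.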

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
