Re: [HACKERS] Clock with Adaptive Replacement

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Clock with Adaptive Replacement
Date: 2018-05-01 00:19:49
Message-ID: 20180501001949.u32mdyxc6xjnqqxs@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-05-01 12:15:21 +1200, Thomas Munro wrote:
> On Thu, Apr 26, 2018 at 1:31 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > ... I
> > suppose when you read a page in, you could tell the kernel that you
> > POSIX_FADV_DONTNEED it, and when you steal a clean PG buffer you could
> > tell the kernel that you POSIX_FADV_WILLNEED its former contents (in
> > advance somehow), on the theory that the coldest stuff in the PG cache
> > should now become the hottest stuff in the OS cache. Of course that
> > sucks, because the best the kernel can do then is go and read it from
> > disk, and the goal is to avoid IO. Given a hypothetical way to
> > "write" "clean" data to the kernel (so it wouldn't mark it dirty and
> > generate IO, but it would let you read it back without generating IO
> > if you're lucky), then perhaps you could actually achieve exclusive
> > caching at the two levels, and use all your physical RAM without
> > duplication.
>
> Craig said essentially the same thing, on the nearby fsync() reliability thread:
>
> On Sun, Apr 29, 2018 at 1:50 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> > ... I'd kind of hoped to go in
> > the other direction if anything, with some kind of pseudo-write op
> > that let us swap a dirty shared_buffers entry from our shared_buffers
> > into the OS dirty buffer cache (on Linux at least) and let it handle
> > writeback, so we reduce double-buffering. Ha! So much for that!
>
> I would like to reply to that on this thread which discusses double
> buffering and performance, to avoid distracting the fsync() thread
> from its main topic of reliability.

It's not going to happen. Robert and I talked to the kernel devs a
couple of years back, and I've brought it up again. The kernel has
absolutely no way to verify the content of that written data, meaning
that suddenly you'd get differing data depending on cache pressure. It
seems unsurprising that kernel devs aren't wild about that idea. The
likelihood of that opening up weird exploits (imagine a suid binary
reading such data later!) also seems pretty high.
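
For concreteness, the part of the quoted scheme that is possible today
(the fadvise hints, not the hypothetical "write clean data" operation)
could be sketched roughly as below. This is only an illustration under
assumptions: PAGE_SZ, read_page_exclusive() and evict_clean_page() are
made-up names, not PostgreSQL code, and both hints are purely advisory,
so the kernel is free to ignore them.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pread() and posix_fadvise() */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed page size for this sketch; 8192 matches PostgreSQL's default BLCKSZ. */
#define PAGE_SZ 8192

/*
 * Hypothetical helper: read one page into buf, then hint that the kernel
 * may drop its cached copy, since we now hold the page ourselves.
 * Returns 0 on success, -1 if the full page could not be read.
 */
int
read_page_exclusive(int fd, off_t pageno, char *buf)
{
    off_t   off = pageno * PAGE_SZ;
    ssize_t n = pread(fd, buf, PAGE_SZ, off);

    if (n != PAGE_SZ)
        return -1;

    /* Advisory only: a failure here does not invalidate the read. */
    (void) posix_fadvise(fd, off, PAGE_SZ, POSIX_FADV_DONTNEED);
    return 0;
}

/*
 * Hypothetical helper: called when a clean page is evicted from the
 * user-space cache. Asks the kernel to bring the page back into its own
 * cache. Note that this can only be satisfied by reading from disk, which
 * is exactly the drawback pointed out in the quoted mail.
 * Returns 0 on success or an error number, as posix_fadvise() does.
 */
int
evict_clean_page(int fd, off_t pageno)
{
    return posix_fadvise(fd, pageno * PAGE_SZ, PAGE_SZ, POSIX_FADV_WILLNEED);
}
```

Note there is no call in there that hands page contents back to the
kernel without marking them dirty; that is the missing piece.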

Greetings,

Andres Freund
