Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Dave Chinner <david(at)fromorbit(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, James Bottomley <James(dot)Bottomley(at)hansenpartnership(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Trond Myklebust <trondmy(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-14 14:39:57
Message-ID: 52D54C3D.5020700@2ndQuadrant.com
Lists: pgsql-hackers

On 01/14/2014 09:39 AM, Claudio Freire wrote:
> On Tue, Jan 14, 2014 at 5:08 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>> Again, as said above the linux file system is doing fine. What we
>> want is a few ways to interact with it to let it do even better when
>> working with postgresql by telling it some stuff it otherwise would
>> have to second guess and by sometimes giving it back some cache
>> pages which were copied away for potential modifying but ended
>> up clean in the end.
> You don't need new interfaces. Only a slight modification of what
> fadvise DONTNEED does.
>
> This insistence in injecting pages from postgres to kernel is just a
> bad idea.
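For reference, this is roughly how the existing call is issued today. It is only a sketch: drop_cached_pages(), fadvise_demo() and PG_PAGE_SIZE are names made up for illustration, not PostgreSQL code, and the 8 kB block size is assumed. On Linux, POSIX_FADV_DONTNEED is only a hint and leaves pages that have not yet been written out untouched, hence the fdatasync() first.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PG_PAGE_SIZE 8192   /* assumed PostgreSQL block size */

/* Hint to the kernel that the cached copies of n_pages pages, starting
 * at page_no, are no longer needed. Returns 0 or an errno value. */
static int drop_cached_pages(int fd, off_t page_no, off_t n_pages)
{
    return posix_fadvise(fd, page_no * (off_t)PG_PAGE_SIZE,
                         n_pages * (off_t)PG_PAGE_SIZE,
                         POSIX_FADV_DONTNEED);
}

/* Hypothetical demo: write one page, flush it, then give it back. */
static int fadvise_demo(void)
{
    char path[] = "/tmp/fadvise_demo_XXXXXX";
    char buf[PG_PAGE_SIZE];
    int rc;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* file vanishes when fd is closed */
    memset(buf, 0, sizeof buf);
    if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf ||
        fdatasync(fd) != 0) {           /* dirty pages are not dropped */
        close(fd);
        return -1;
    }
    rc = drop_cached_pages(fd, 0, 1);
    close(fd);
    return rc;
}
```

Calling fadvise_demo() from main() should return 0 on a filesystem that accepts the hint.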
Do you think it would be possible to map copy-on-write pages
from the Linux page cache into the PostgreSQL cache?

This would be a step in the direction of solving the double RAM usage
of pages which have not been read from syscache to the PostgreSQL
cache, without sacrificing Linux read-ahead (which I assume does
not happen when reads bypass the system cache).

And can we write the copy back at the point when it is safe (from
PostgreSQL's perspective) to let the system write it back?

Do you think it is possible to make it work with good performance
for a few million 8 kB pages?
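
The nearest thing available today is a private file mapping, so here is a minimal sketch of the idea under that assumption. mmap() with MAP_PRIVATE shares page-cache pages until the first store, which gives the process its own copy; the copy never reaches the file unless it is written back explicitly. map_page_cow(), flush_page(), cow_demo() and PG_PAGE_SIZE are hypothetical names, and nothing here is how PostgreSQL actually manages shared buffers.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PG_PAGE_SIZE 8192   /* assumed PostgreSQL block size */

/* Map one 8 kB page copy-on-write. Loads are served from the page
 * cache; the first store to a kernel page creates a private copy.
 * The file offset must be a multiple of the kernel page size. */
static void *map_page_cow(int fd, off_t page_no)
{
    void *p = mmap(NULL, PG_PAGE_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, page_no * (off_t)PG_PAGE_SIZE);
    return p == MAP_FAILED ? NULL : p;
}

/* Write the private copy back once the application says it is safe. */
static int flush_page(int fd, off_t page_no, const void *page)
{
    ssize_t n = pwrite(fd, page, PG_PAGE_SIZE,
                       page_no * (off_t)PG_PAGE_SIZE);
    return n == PG_PAGE_SIZE ? 0 : -1;
}

/* Hypothetical demo: returns 0 if the file stays unchanged until the
 * explicit write-back and holds the new data afterwards. */
static int cow_demo(void)
{
    char path[] = "/tmp/cow_demo_XXXXXX";
    char buf[PG_PAGE_SIZE];
    char on_disk = 0;
    char *page;
    int ok = 0;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* file vanishes when fd is closed */
    memset(buf, 'A', sizeof buf);
    if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf) {
        close(fd);
        return -1;
    }
    page = map_page_cow(fd, 0);
    if (page == NULL) {
        close(fd);
        return -1;
    }
    page[0] = 'B';                      /* triggers the private copy */
    if (pread(fd, &on_disk, 1, 0) == 1 && on_disk == 'A' &&
        flush_page(fd, 0, page) == 0 &&
        pread(fd, &on_disk, 1, 0) == 1 && on_disk == 'B')
        ok = 1;
    munmap(page, PG_PAGE_SIZE);
    close(fd);
    return ok ? 0 : -1;
}
```

Whether this scales to a few million mappings is exactly the open question; the sketch only shows the semantics, not the cost of page faults or of that many mappings.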

> At the very least, it still needs postgres to know too much
> of the filesystem (block layout) to properly work. Ie: pg must be
> required to put entire filesystem-level blocks into the page cache,
> since that's how the page cache works.
I was thinking more of a simple write() interface with extra
flags/sysctls to tell the kernel that "we already have this on disk".
> At the very worst, it may
> introduce serious security and reliability implications, when
> applications can destroy the consistency of the page cache (even if
> full access rights are checked, there's still the possibility this
> inconsistency might be exploitable).
If you allow a write() which just writes clean pages, I cannot see
where the extra security concerns are beyond what a normal
write can do.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
