Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Trond Myklebust <trondmy(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bottomley James <James(dot)Bottomley(at)HansenPartnership(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Andres Freund <andres(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dave Chinner <david(at)fromorbit(dot)com>, Joshua Drake <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-14 15:42:05
Message-ID: A8B5E529-A3BA-4AEC-A58F-1AF16DA5AEF6@gmail.com
Lists: pgsql-hackers


On Jan 14, 2014, at 10:39, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> James Bottomley <James(dot)Bottomley(at)HansenPartnership(dot)com> writes:
>> The current mechanism for coherency between a userspace cache and the
>> in-kernel page cache is mmap ... that's the only way you get the same
>> page in both currently.
>
> Right.
>
>> glibc used to have an implementation of read/write in terms of mmap, so
>> it should be possible to insert it into your current implementation
>> without a major rewrite. The problem I think this brings you is
>> uncontrolled writeback: you don't want dirty pages to go to disk until
>> you issue a write().
>
> Exactly.
>
>> I think we could fix this with another madvise():
>> something like MADV_WILLUPDATE telling the page cache we expect to alter
>> the pages again, so don't be aggressive about cleaning them.
>
> "Don't be aggressive" isn't good enough. The prohibition on early write
> has to be absolute, because writing a dirty page before we've done
> whatever else we need to do results in a corrupt database. It has to
> be treated like a write barrier.

Then why are you dirtying the page at all? It makes no sense to tell the kernel “we’re changing this page in the page cache, but we don’t want you to change it on disk”: that’s not consistent with the function of a page cache.
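
To make the point concrete, here is a minimal sketch in C, assuming Linux and a shared file mapping. The helper name shared_mapping_is_coherent and the scratch path passed to it are hypothetical and not from this thread. It shows that a store through a MAP_SHARED mapping changes the page cache page directly: a pread() on the same file sees the new bytes with no write() in between, and from that moment the page is dirty and eligible for writeback. msync() can force the flush to happen now, but as far as I know nothing in this interface lets userspace postpone it.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not from the thread.
 * Returns 0 if a store made through a MAP_SHARED mapping is immediately
 * visible to pread() on the same file, -1 on any error or mismatch. */
int shared_mapping_is_coherent(const char *path)
{
    const size_t len = 4096;
    char buf[8] = {0};
    char *p;
    int rc = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Give the file one page of zeroes so the mapping is backed by data. */
    if (ftruncate(fd, (off_t)len) != 0)
        goto out_close;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out_close;

    /* This store dirties the page cache page itself. From here on the
     * kernel is free to write it back whenever it chooses. */
    memcpy(p, "dirty", 5);

    /* pread() is served from the same page cache page, so it sees the
     * store without any write() or msync() call. */
    if (pread(fd, buf, 5, 0) == 5 && memcmp(buf, "dirty", 5) == 0)
        rc = 0;

    /* msync(MS_SYNC) forces writeback now; it cannot be used to say
     * "not before now". */
    if (msync(p, len, MS_SYNC) != 0)
        rc = -1;

    munmap(p, len);
out_close:
    close(fd);
    unlink(path);
    return rc;
}
```

Calling shared_mapping_is_coherent() with any writable scratch path should return 0 on a system with a unified page cache. The ordering control asked for above would have to sit between the memcpy() and the kernel's own writeback, which is exactly the step this interface does not expose.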

>> The problem is we can't give you absolute control of when pages are
>> written back because that interface can be used to DoS the system: once
>> we get too many dirty uncleanable pages, we'll thrash looking for memory
>> and the system will livelock.
>
> Understood, but that makes this direction a dead end. We can't use
> it if the kernel might decide to write anyway.
>
> regards, tom lane
