Re: Linux kernel impact on PostgreSQL performance

From: Jim Nasby <jim(at)nasby(dot)net>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>
Subject: Re: Linux kernel impact on PostgreSQL performance
Date: 2014-01-15 04:07:31
Message-ID: 52D60983.9090701@nasby.net
Lists: pgsql-hackers

On 1/14/14, 6:36 PM, Claudio Freire wrote:
> On Tue, Jan 14, 2014 at 9:22 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> On 1/14/14, 11:30 AM, Jeff Janes wrote:
>>>
>>> I think the "reclaim this page if you need memory but leave it resident if
>>> there is no memory pressure" hint would be more useful for temporary working
>>> files than for what was being discussed above (shared buffers). When I do
>>> work that needs large temporary files, I often see physical write IO spike
>>> but physical read IO does not. I interpret that to mean that the temporary
>>> data is being written to disk to satisfy either dirty_expire_centisecs or
>>> dirty_*bytes, but the data remains in the FS cache and so disk reads are not
>>> needed to satisfy it. So a hint that says "this file will never be fsynced
>>> so please ignore dirty_*bytes and dirty_expire_centisecs. I will need it
>>> again relatively soon (but not after a reboot), but will do so mostly
>>> sequentially, so please don't evict this without need, but if you do need to
>>> then it is a good candidate" would be good.
>>
>>
>> I also frequently see this, and it has an even larger impact if pgsql_tmp is
>> on the same filesystem as WAL. Which *theoretically* shouldn't matter with a
>> BBU controller, except that when the kernel suddenly decides your
>> *temporary* data needs to hit the media you're screwed.
>>
>> Though, it also occurs to me... perhaps it would be better for us to simply
>> map temp objects to memory and let the kernel swap them out if needed...
>
>
> Oum... bad idea.
>
> Swap logic has very poor taste for I/O patterns.

Well, to be honest, so do we. Practically zero in fact...

If anything, the kernel might be in a better position than we are, since you can presumably count page faults much more cheaply than we can.

BTW, if you guys are looking at ARC, you should absolutely read the discussion of it in our archives (http://lnk.nu/postgresql.org/2zeu/ as a starting point). We put considerable effort into it, had it in two minor versions, and then switched to a clock-sweep algorithm that's similar to what FreeBSD used, at least in the 4.x days. I'm definitely not claiming what we've got is the best (in fact, I think we're hurt by not maintaining a real free list), but the ARC info there is probably valuable.
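
For anyone unfamiliar with clock-sweep, here is a rough sketch of the general idea. It is a toy model only, not the actual PostgreSQL buffer manager code: the class and method names are made up for illustration, the usage-count cap of 5 is an assumed parameter, and pinning, dirty pages and locking are all ignored.

```python
class ClockSweepCache:
    """Toy clock-sweep buffer cache (illustrative; all names are hypothetical)."""

    def __init__(self, nbuffers, max_usage=5):
        self.max_usage = max_usage          # assumed cap on the usage counter
        self.pages = [None] * nbuffers      # page id held in each slot
        self.usage = [0] * nbuffers         # per-slot usage counter
        self.slot_of = {}                   # page id -> slot index
        self.hand = 0                       # next slot the sweep will inspect

    def access(self, page):
        """Return True on a cache hit, False if the page had to be loaded."""
        slot = self.slot_of.get(page)
        if slot is not None:
            # Hit: bump the usage counter, saturating at the cap.
            self.usage[slot] = min(self.usage[slot] + 1, self.max_usage)
            return True
        # Miss: sweep for a victim slot and replace its contents.
        slot = self._find_victim()
        old = self.pages[slot]
        if old is not None:
            del self.slot_of[old]
        self.pages[slot] = page
        self.usage[slot] = 1
        self.slot_of[page] = slot
        return False

    def _find_victim(self):
        """Advance the hand, decrementing counters, until one is at zero."""
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.pages[slot] is None or self.usage[slot] == 0:
                return slot
            self.usage[slot] -= 1


cache = ClockSweepCache(3)
for p in "abc":
    cache.access(p)      # three misses fill the three slots
cache.access("a")        # hit: "a" now has a higher usage count than "b" or "c"
cache.access("d")        # miss: the sweep evicts "b", not the recently reused "a"
```

Note that there is no free list here either: every miss walks the hand until it finds a slot whose counter has decayed to zero, which is the cost I was alluding to above.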
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
