Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: James Bottomley <James(dot)Bottomley(at)HansenPartnership(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-13 22:22:05
Message-ID: 1389651725.12062.55.camel@dabdike.int.hansenpartnership.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2014-01-13 at 21:29 +0000, Greg Stark wrote:
> On Mon, Jan 13, 2014 at 9:12 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > For one, postgres doesn't use mmap for files (and can't without major
> > new interfaces). Frequently mmap()/madvise()/munmap()ing 8kb chunks has
> > horrible consequences for performance/scalability - very quickly you
> > contend on locks in the kernel.
>
>
> I may as well dump this in this thread. We've discussed this in person
> a few times, including at least once with Ted T'so when he visited
> Dublin last year.
>
> The fundamental conflict is that the kernel understands better the
> hardware and other software using the same resources, Postgres
> understands better its own access patterns. We need to either add
> interfaces so Postgres can teach the kernel what it needs about its
> access patterns or add interfaces so Postgres can find out what it
> needs to know about the hardware context.
>
> The more ambitious and interesting direction is to let Postgres tell
> the kernel what it needs to know to manage everything. To do that we
> would need the ability to control when pages are flushed out. This is
> absolutely necessary to maintain consistency. Postgres would need to
> be able to mark pages as unflushable until some point in time in the
> future when the journal is flushed. We discussed various ways that
> interface could work but it would be tricky to keep it low enough
> overhead to be workable.

So in this case, the question would be what additional information do we
need to exchange that's not covered by the existing interfaces. Between
madvise and splice, we seem to have most of what you want; what's
missing?

> The less exciting, more conservative option would be to add kernel
> interfaces to teach Postgres about things like raid geometries. Then
> Postgres could use directio and decide to do prefetching based on the
> raid geometry, how much available i/o bandwidth and iops is available,
> etc.
>
> Reimplementing i/o schedulers and all the rest of the work that the
> kernel provides inside Postgres just seems like something outside our
> competency and that none of us is really excited about doing.

This would also be a well trodden path ... I believe that some large
database company introduced Direct IO for roughly this purpose.

James

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mel Gorman 2014-01-13 22:26:45 Re: Linux kernel impact on PostgreSQL performance
Previous Message Merlin Moncure 2014-01-13 22:20:40 Re: Disallow arrays with non-standard lower bounds