Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Dave Chinner <david(at)fromorbit(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-16 00:19:10
Message-ID: 20140116001910.GO3431@dastard
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote:
> Dave Chinner <david(at)fromorbit(dot)com> writes:
> > On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote:
> >> What we'd really like for checkpointing is to hand the kernel a boatload
> >> (several GB) of dirty pages and say "how about you push all this to disk
> >> over the next few minutes, in whatever way seems optimal given the storage
> >> hardware and system situation. Let us know when you're done."
>
> > The issue there is that the kernel has other triggers for needing to
> > clean data. We have no infrastructure to handle variable writeback
> > deadlines at the moment, nor do we have any infrastructure to do
> > roughly metered writeback of such files to disk. I think we could
> > add it to the infrastructure without too much perturbation of the
> > code, but as you've pointed out that still leaves the fact there's
> > no obvious interface to configure such behaviour. Would it need to
> > be persistent?
>
> No, we'd be happy to re-request it during each checkpoint cycle, as
> long as that wasn't an unduly expensive call to make. I'm not quite
> sure where such requests ought to "live" though. One idea is to tie
> them to file descriptors; but the data to be written might be spread
> across more files than we really want to keep open at one time.

It would be a property of the inode, as that is how writeback is
tracked and timed. Set and queried through a file descriptor,
though - it's basically the same context that fadvise works
through.

> But the only other idea that comes to mind is some kind of global sysctl,
> which would probably have security and permissions issues. (One thing
> that hasn't been mentioned yet in this thread, but maybe is worth pointing
> out now, is that Postgres does not run as root, and definitely doesn't
> want to. So we don't want a knob that would require root permissions
> to twiddle.)

I have assumed all along that requiring root to do stuff would be a
bad thing. :)

> We could probably live with serially checkpointing data
> in sets of however-many-files-we-can-have-open, if file descriptors are
> the place to keep the requests.

Inodes live longer than file descriptors, but there's no guarantee
that they live from one fd context to another. Hence my question
about persistence ;)

Cheers,

Dave.
--
Dave Chinner
david(at)fromorbit(dot)com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-01-16 00:21:07 Re: CREATE FOREIGN TABLE ( ... LIKE ... )
Previous Message David Fetter 2014-01-16 00:17:54 Re: CREATE FOREIGN TABLE ( ... LIKE ... )