Re: fallocate / posix_fallocate for new WAL file creation (etc...)

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fallocate / posix_fallocate for new WAL file creation (etc...)
Date: 2013-07-01 21:07:36
Message-ID: 51D1EF98.5090603@2ndQuadrant.com
Lists: pgsql-hackers

On 7/1/13 3:44 PM, Robert Haas wrote:
> Yeah. If the patch isn't going to be a win on RHEL 5, I'd consider
> that a good reason to scrap it for now and revisit it in 3 years.
> There are still a LOT of people running RHEL 5, and the win isn't big
> enough to engineer a more complex solution.

I'm still testing, and expect to have this wrapped up on my side by
tomorrow. So much of the runtime here is the file setup/close that a
2:1 difference in the number of writes, which is what happens on the
old platforms, is hard to get excited about.

I don't think the complexity to lock out RHEL5 here is that bad even if
it turns out to be a good idea. Just add another configure check for
fallocate, and on Linux if it's not there don't use posix_fallocate
either. Maybe 5 lines of macro code? RHEL5 sure isn't going away
anytime soon, but at the same time there won't be that many 9.4
deployments on that version.
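
As a rough illustration of the gating described above, something like the
following sketch could work. HAVE_FALLOCATE and HAVE_POSIX_FALLOCATE are
assumed names for macros that configure checks would define;
USE_POSIX_FALLOCATE and wal_uses_posix_fallocate() are hypothetical names
made up for this example, not taken from the patch.

```c
#include <stdbool.h>

/*
 * Assumption: configure defines HAVE_POSIX_FALLOCATE and HAVE_FALLOCATE
 * when it finds the corresponding functions.  On Linux, a missing
 * fallocate() is treated as a reason to skip posix_fallocate() as well.
 */
#if defined(HAVE_POSIX_FALLOCATE)
#if !defined(__linux__) || defined(HAVE_FALLOCATE)
#define USE_POSIX_FALLOCATE 1
#endif
#endif

/* Hypothetical helper: reports which code path was compiled in. */
bool
wal_uses_posix_fallocate(void)
{
#ifdef USE_POSIX_FALLOCATE
	return true;
#else
	return false;
#endif
}
```

With neither macro defined, the build quietly keeps the existing
write-based file creation, so nothing needs to be configured by hand.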

I've been digging into the main situation where this feature helps, and
it won't be easy to duplicate in a benchmark situation. Using Linux's
fallocate works as a hint that the whole 16MB should be allocated at
once, and therefore together on disk if feasible. The resulting WAL
files should be less prone to fragmentation. That's actually the
biggest win of this approach, but I can't easily duplicate the sort of
real-world fragmentation I see on live servers here. Given that, I'm
leaning toward saying that unless there's a clear regression on older
platforms, above the noise floor, this is still the right thing to do.
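
To make the allocation step concrete, here is a minimal sketch of asking
for the whole segment in one call and falling back to plain zero-filling
when that isn't supported. It is not the patch's code: WAL_SEGMENT_SIZE,
zero_fill() and preallocate_wal_segment() are hypothetical names, and the
__linux__ guard is only an assumption used to keep the example portable.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define WAL_SEGMENT_SIZE (16 * 1024 * 1024)	/* 16MB, as in the text */

/* Fallback: write zeros over the first len bytes, one buffer at a time. */
static int
zero_fill(int fd, off_t len)
{
	char		buf[8192];
	off_t		done = 0;

	memset(buf, 0, sizeof(buf));
	while (done < len)
	{
		size_t		chunk = sizeof(buf);
		ssize_t		n;

		if ((off_t) chunk > len - done)
			chunk = (size_t) (len - done);
		n = pwrite(fd, buf, chunk, done);
		if (n < 0)
		{
			if (errno == EINTR)
				continue;
			return errno;
		}
		if (n == 0)
			return ENOSPC;	/* no progress; treat as out of space */
		done += n;
	}
	return 0;
}

/* Returns 0 on success, otherwise an errno-style error code. */
int
preallocate_wal_segment(int fd)
{
	assert(fd >= 0);
#ifdef __linux__
	{
		/* posix_fallocate() returns the error number, not -1/errno. */
		int			rc = posix_fallocate(fd, 0, WAL_SEGMENT_SIZE);

		if (rc == 0)
			return 0;
		if (rc != EINVAL && rc != EOPNOTSUPP)
			return rc;
		/* Filesystem can't do it; fall through to zero-filling. */
	}
#endif
	return zero_fill(fd, WAL_SEGMENT_SIZE);
}
```

Either path leaves a 16MB file behind; the difference is that the
single-call path gives the filesystem the full size up front, which is
where the fragmentation benefit would come from.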

I fully agree that this needs to be fully automatic--no GUC--before it's
worth committing. If we can't figure out the right thing to do now,
there's little hope anyone else will in a later tuning expedition.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
