From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Fabien COELHO <fabien(dot)coelho(at)mines-paristech(dot)fr> |
Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: checkpointer continuous flushing |
Date: | 2015-08-17 15:13:06 |
Message-ID: | 20150817151306.GB10786@awork2.anarazel.de |
Lists: | pgsql-hackers |
On 2015-08-17 15:21:22 +0200, Fabien COELHO wrote:
> My current thinking is "maybe yes, maybe no":-), as it may depend on the OS
> implementation of posix_fadvise, so it may differ between OS.
As long as fadvise has no 'undirty' option, I don't see how that
problem goes away. You're telling the OS to throw the buffer away, so
unless it ignores it that'll have consequences when you read the page
back in.
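To make the distinction concrete, here is a minimal sketch (not taken from the patch) of the two hinting strategies under discussion. The helper names `flush_hint` and `flush_hint_demo` are hypothetical. On Linux, `sync_file_range` with `SYNC_FILE_RANGE_WRITE` only asks the kernel to start writeback and leaves the pages cached. `POSIX_FADV_DONTNEED` additionally tells the kernel the cached pages are no longer needed, which is where the re-read cost comes from if the kernel honors it.

```c
#define _GNU_SOURCE             /* for sync_file_range() on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: hint the kernel to start writing out a file range.
 * Returns 0 on success, -1 on failure with errno set.
 */
int flush_hint(int fd, off_t offset, off_t nbytes)
{
    assert(offset >= 0 && nbytes >= 0);
#if defined(__linux__)
    /* Starts writeback of dirty pages; pages stay in the page cache. */
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#else
    /* Advice only: the OS may write the pages out and may also drop them. */
    int rc = posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);

    if (rc != 0)
    {
        errno = rc;             /* posix_fadvise returns the error number */
        return -1;
    }
    return 0;
#endif
}

/* Hypothetical demo: write one 8 kB page to a scratch file and hint a flush. */
int flush_hint_demo(const char *path)
{
    char page[8192] = {0};
    int  fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    int  rc;

    if (fd < 0)
        return -1;
    if (write(fd, page, sizeof page) != (ssize_t) sizeof page)
        rc = -1;
    else
        rc = flush_hint(fd, 0, (off_t) sizeof page);
    close(fd);
    unlink(path);
    return rc;
}
```

Neither call guarantees durability; an `fsync` is still required afterwards. The sketch only shows why the two hints differ in their effect on cached pages.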
> This is a reason why I think that flushing should be kept a guc, even if the
> sort guc is removed and always on. The sync_file_range implementation is
> clearly always very beneficial for Linux, and the posix_fadvise may or may
> not induce a good behavior depending on the underlying system.
That's certainly an argument.
> This is also a reason why the default value for the flush guc is currently
> set to false in the patch. The documentation should advise to turn it on for
> Linux and to test otherwise. Or if Linux is assumed to be often a host, then
> maybe to set the default to on and to suggest that on some systems it may be
> better to have it off.
I'd say it should then be an os-specific default. No point in making
people work for it needlessly on linux and/or elsewhere.
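As an illustration only, an OS-specific default could be chosen at compile time along these lines. The macro and function names are hypothetical and not taken from the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical compile-time default for a flush setting: on where
 * sync_file_range is available (Linux), off elsewhere until tested.
 */
#if defined(__linux__)
#define DEFAULT_CHECKPOINT_FLUSH true
#else
#define DEFAULT_CHECKPOINT_FLUSH false
#endif

/* Hypothetical accessor for the boot-time value of the setting. */
bool checkpoint_flush_boot_value(void)
{
    return DEFAULT_CHECKPOINT_FLUSH;
}
```

Users could still override the setting. Only the starting value would differ per platform.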
> (Another reason to keep it "off" is that I'm not sure about what
> happens with such HD flushing features on virtual servers).
I don't see how that matters? Either the host will entirely ignore
flushing, and thus the sync_file_range and the fsync won't cost much, or
fsync will be honored, in which case the pre-flushing is helpful.
> Overall, I'm not pessimistic, because I've seen I/O storms on a FreeBSD host
> and it was as bad as Linux (namely the database and even the box was offline
> for long minutes...), and if you can avoid that having to read back some
> data may be not that bad a down payment.
I don't see how that'd alleviate my fear. Sure, the latency for many
workloads will be better, but I don't see how that argument says anything
about the reads? And we'll not just use this in cases where it'd be
beneficial...
> The issue is largely mitigated if the data is not removed from
> shared_buffers, because the OS buffer is just a copy of already held data.
> What I would do on such systems is to increase shared_buffers and keep
> flushing on, that is to count less on the system cache and more on postgres
> own cache.
That doesn't work that well for a bunch of reasons. For one it's
completely non-adaptive. With the OS's page cache you can rely on free
memory being used for caching *and* on it being available should a query
or another program need lots of memory.
> Overall, I'm not convinced that the practice of relying on the OS cache is a
> good one, given what it does with it, at least on Linux.
The alternatives aren't super realistic near-term though. Using direct
IO efficiently on the set of operating systems we support is
*hard*. It's more or less trivial to hack pg up to use direct IO for
relations/shared_buffers, but it'll perform utterly horribly in many
many cases.
To pick one thing out: Without the OS buffering writes, any write will
have to wait for the disks, instead of being asynchronous. That'll make
writes performed by backends a massive bottleneck.
> Now, if someone could provide a dedicated box with posix_fadvise (say
> FreeBSD, maybe others...) for testing that would allow to provide data
> instead of speculating... and then maybe to decide to change its default
> value.
Testing, as an approximation, how it turns out to work on linux would be
a good step.
Greetings,
Andres Freund