Re: checkpointer continuous flushing

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Fabien COELHO <fabien(dot)coelho(at)mines-paristech(dot)fr>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpointer continuous flushing
Date:	2015-08-17 15:13:06
Message-ID:	20150817151306.GB10786@awork2.anarazel.de
Lists:	pgsql-hackers

On 2015-08-17 15:21:22 +0200, Fabien COELHO wrote:
> My current thinking is "maybe yes, maybe no":-), as it may depend on the OS
> implementation of posix_fadvise, so it may differ between OS.

As long as fadvise has no 'undirty' option, I don't see how that
problem goes away. You're telling the OS to throw the buffer away, so
unless the OS ignores the hint, that'll have consequences when you read
the page back in.
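
To make the concern concrete, here is a minimal Python sketch of the
write / fadvise / read-back sequence. It assumes the hint in question is
POSIX_FADV_DONTNEED; the function name and the file path are made up for
illustration and are not taken from the patch.

```python
import os


def write_then_drop_cache(path: str, payload: bytes) -> bytes:
    """Write payload, hint the kernel to drop the cached range, read it back."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        # Assumption: the hint under discussion is POSIX_FADV_DONTNEED.
        # It tells the OS the range is no longer needed; whether the pages
        # are really evicted depends on the OS implementation.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, len(payload), os.POSIX_FADV_DONTNEED)
        # The contents are unchanged, but if the pages were evicted this
        # read has to be served from storage instead of the page cache.
        return os.pread(fd, len(payload), 0)
    finally:
        os.close(fd)
```

The returned bytes always match what was written; the cost being
discussed is only in where the read is served from.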

> This is a reason why I think that flushing should be kept a guc, even if the
> sort guc is removed and always on. The sync_file_range implementation is
> clearly always very beneficial for Linux, and the posix_fadvise may or may
> not induce a good behavior depending on the underlying system.

That's certainly an argument.
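
For illustration only, a rough Python sketch of the two code paths being
compared: sync_file_range where the C library exports it (Linux), and
posix_fadvise otherwise. The helper name flush_hint is hypothetical, the
ctypes binding is an assumption about the local libc, and the
POSIX_FADV_DONTNEED fallback is a guess at what the non-Linux path does.

```python
import ctypes
import os
import sys

# Flag value from the Linux headers: start writeback of dirty pages in the
# range, without waiting for it to finish.
SYNC_FILE_RANGE_WRITE = 2


def flush_hint(fd: int, offset: int, nbytes: int) -> str:
    """Ask the OS to start writing a file range back; report the call used."""
    if sys.platform.startswith("linux"):
        libc = ctypes.CDLL(None, use_errno=True)
        fn = getattr(libc, "sync_file_range", None)
        if fn is not None:
            fn.argtypes = [ctypes.c_int, ctypes.c_int64,
                           ctypes.c_int64, ctypes.c_uint]
            fn.restype = ctypes.c_int
            if fn(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) == 0:
                return "sync_file_range"
    if hasattr(os, "posix_fadvise"):
        # Fallback: the cached pages may be discarded as a side effect.
        os.posix_fadvise(fd, offset, nbytes, os.POSIX_FADV_DONTNEED)
        return "posix_fadvise"
    return "none"
```

The first path only starts writeback; the second may also evict the
pages, which is where the difference in behavior between systems comes
from.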

> This is also a reason why the default value for the flush guc is currently
> set to false in the patch. The documentation should advise to turn it on for
> Linux and to test otherwise. Or if Linux is assumed to be often a host, then
> maybe to set the default to on and to suggest that on some systems it may be
> better to have it off.

I'd say it should then be an OS-specific default. No point in making
people work for it needlessly on Linux and/or elsewhere.
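
As a sketch of what an OS-specific default could look like, assuming the
rule is simply "on where sync_file_range is expected to exist, off
elsewhere" (the function name is hypothetical, and the real choice would
presumably be made at build time rather than at run time):

```python
import sys


def default_flush_enabled(platform: str = sys.platform) -> bool:
    """Hypothetical default for the flush setting, keyed on the platform."""
    # Assumption: Linux gets flushing on by default because it has
    # sync_file_range; every other platform stays off until tested.
    return platform.startswith("linux")
```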

> (Another reason to keep it "off" is that I'm not sure about what
> happens with such HD flushing features on virtual servers).

I don't see how that matters? Either the host will entirely ignore
flushing, and thus the sync_file_range and the fsync won't cost much, or
fsync will be honored, in which case the pre-flushing is helpful.

> Overall, I'm not pessimistic, because I've seen I/O storms on a FreeBSD host
> and it was as bad as Linux (namely the database and even the box were offline
> for long minutes...), and if you can avoid that, having to read back some
> data may not be that bad a down payment.

I don't see how that'd alleviate my fear. Sure, the latency for many
workloads will be better, but I don't see how that argument says anything
about the reads? And we'll not just use this in cases where it'd be
beneficial...

> The issue is largely mitigated if the data is not removed from
> shared_buffers, because the OS buffer is just a copy of already held data.
> What I would do on such systems is to increase shared_buffers and keep
> flushing on, that is to count less on the system cache and more on postgres'
> own cache.

That doesn't work that well for a bunch of reasons. For one, it's
completely non-adaptive. With the OS's page cache you can rely on free
memory being used for caching *and* on it being available should a query
or another program need lots of memory.

> Overall, I'm not convinced that the practice of relying on the OS cache is a
> good one, given what it does with it, at least on Linux.

The alternatives aren't super realistic near-term though. Using direct
IO efficiently on the set of operating systems we support is
*hard*. It's more or less trivial to hack pg up to use direct IO for
relations/shared_buffers, but it'll perform utterly horribly in many,
many cases.

To pick one thing out: Without the OS buffering writes, any write will
have to wait for the disks, instead of being asynchronous. That'll make
writes performed by backends a massive bottleneck.
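
A small Python sketch that makes the difference measurable. It uses
O_SYNC as a stand-in for "every write waits for the disk"; that is an
assumption for illustration and not the same thing as direct IO, which
also bypasses the page cache for reads. The function name, block size
and write count are arbitrary.

```python
import os
import time


def timed_writes(path: str, sync: bool, count: int = 50,
                 block: bytes = b"\0" * 8192) -> float:
    """Write `count` blocks to `path` and return the elapsed seconds."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        # With O_SYNC each write() returns only once the data has been
        # handed to stable storage, so the caller waits on the device.
        flags |= os.O_SYNC
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
        return time.perf_counter() - start
    finally:
        os.close(fd)
```

Comparing timed_writes(path, sync=False) with timed_writes(path,
sync=True) on the storage of interest gives a rough idea of what a
backend would be waiting on; the numbers depend entirely on the device
and filesystem.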

> Now, if someone could provide a dedicated box with posix_fadvise (say
> FreeBSD, maybe others...) for testing that would allow to provide data
> instead of speculating... and then maybe to decide to change its default
> value.

Testing, as an approximation, how it turns out to work on Linux would be
a good step.

Greetings,

Andres Freund

In response to

Re: checkpointer continuous flushing at 2015-08-17 11:41:38 from Andres Freund

Responses

Re: checkpointer continuous flushing at 2015-08-17 19:32:24 from Fabien COELHO
