Re: checkpointer continuous flushing

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-08-18 03:46:23
Message-ID: CAA4eK1K5yZJAQxyfz5BsUDDyTcic1UXdDegnCCLYFRLPGAsxQA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 18, 2015 at 1:02 AM, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:

>
> Hello Andres,
>
> [...] posix_fadvise().
>>>>
>>>
>>> My current thinking is "maybe yes, maybe no":-), as it may depend on the
>>> OS
>>> implementation of posix_fadvise, so it may differ between OS.
>>>
>>
>> As long as fadvise has no 'undirty' option, I don't see how that
>> problem goes away. You're telling the OS to throw the buffer away, so
>> unless it ignores it that'll have consequences when you read the page
>> back in.
>>
>
> Yep, probably.
>
> Note that we are talking about checkpoints, which "write" buffers out
> *but* keep them nevertheless. As the buffer is kept, the OS page is a
> duplicate, and freeing it should not harm, at least immediately.
>
>
This theory could make sense if we could predict in some way that
the data we are flushing out of the OS cache won't be needed soon.
After the flush, we can rely on the data being found in shared_buffers
only to the extent that its usage_count is high; otherwise it could be
replaced at any moment by a backend that needs a buffer when there is no
free buffer. One way to look at it is that if the usage_count is
low, then it is okay to assume the page won't be needed in the near
future anyway; however, I don't think relying only on usage_count for such
a thing is a good idea.
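
For reference, the write / sync / "drop from OS cache" sequence under
discussion can be sketched roughly as below. This is only a minimal
illustration under stated assumptions, not code from the patch: the helper
name flush_and_drop is hypothetical, and the data is synced first because
POSIX_FADV_DONTNEED is only a hint and, at least on Linux, does not discard
pages that are still dirty.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the patch: write len bytes to path,
 * force them to stable storage, then hint that the OS may drop its cached
 * copy of that range.  Returns 0 on success, -1 on any failure.
 */
int
flush_and_drop(const char *path, const void *buf, size_t len)
{
	int		fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	int		rc = 0;

	if (fd < 0)
		return -1;

	/* Write the data and make sure it has reached stable storage. */
	if (write(fd, buf, len) != (ssize_t) len || fdatasync(fd) != 0)
		rc = -1;

	/*
	 * Advise the OS that the range is not needed any more.  Note that
	 * posix_fadvise returns an error number directly instead of setting
	 * errno.
	 */
	if (rc == 0 &&
		posix_fadvise(fd, 0, (off_t) len, POSIX_FADV_DONTNEED) != 0)
		rc = -1;

	if (close(fd) != 0)
		rc = -1;

	return rc;
}
```

If the page is later evicted from shared_buffers and read again, the OS no
longer has a copy after such a call, so the read has to go to disk; that is
the cost being weighed above.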

> To sum up, I agree that it is indeed possible that flushing with
> posix_fadvise could reduce read OS-memory hits on some systems for some
> workloads, although not on Linux, see below.
>
> So the option is best kept as "off" for now, without further data, I'm
> fine with that.
>
>
One point to think about here is on what basis a user can decide to turn
this option on; is it predictable in any way?
I think one case could be when the data set fits in shared_buffers.

In general, providing an option is a good idea if the user can easily decide
when to use it, or if we can give a clear recommendation for it.
Otherwise one has to recommend "test your workload with this option, and
if it works then great, else don't use it", which might also be okay in
some cases, but it is better to be clear.

One minor point: while glancing through the patch, I noticed that a couple
of multiline comments are not written in the way that is usually used
in the code (keep the first line empty).

+/* Status of buffers to checkpoint for a particular tablespace,
+ * used internally in BufferSync.
+ * - space: oid of the tablespace
+ * - num_to_write: number of checkpoint pages counted for this tablespace
+ * - num_written: number of pages actually written out

+/* entry structure for table space to count hashtable,
+ * used internally in BufferSync.
+ */
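
To illustrate the convention, the first comment quoted above would look
roughly like this with the opening line kept empty. This is a sketch only:
the struct name TablespaceStatusSketch, the field types, and the Oid typedef
are assumptions added so the fragment compiles on its own, and only the
three fields named in the quoted lines are shown.

```c
/* Stand-in for PostgreSQL's Oid type so the sketch is self-contained. */
typedef unsigned int Oid;

/*
 * Status of buffers to checkpoint for a particular tablespace,
 * used internally in BufferSync.
 * - space: oid of the tablespace
 * - num_to_write: number of checkpoint pages counted for this tablespace
 * - num_written: number of pages actually written out
 */
typedef struct TablespaceStatusSketch	/* hypothetical name */
{
	Oid			space;
	int			num_to_write;
	int			num_written;
} TablespaceStatusSketch;
```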

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
