Re: checkpointer continuous flushing

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2016-01-13 03:17:26
Message-ID: CAA4eK1LS303x6Fq425Q9guAiyJgrD_r7PFMOb2LZK9+AT+Gg9A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 12, 2016 at 7:24 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-01-12 19:17:49 +0530, Amit Kapila wrote:
> > Why can't we do it at larger intervals (relative to total amount of
> > writes)?
> > To explain, what I have in mind, let us assume that checkpoint interval
> > is longer (10 mins) and in the mean time all the writes are being done
> > by bgwriter
>
> But that's not the scenario with the regression here, so I'm not sure
> why you're bringing it up?
>
> And if we're flushing a significant portion of the writes, how does that
> avoid the performance problem pointed out two messages upthread, where
> sorting leads to flushing highly contended buffers together, leading to
> excessive WAL flushing?
>

I think it will avoid that problem, because what I am suggesting is not to
sort the buffers before writing, but rather to sort the flush requests. If
I remember correctly, Fabien's initial patch didn't have sorting at the
buffer level, yet he was still able to see benefits in many cases.
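
To make the idea concrete, here is a rough sketch of what I mean by
sorting and merging the flush requests rather than the buffers themselves
(the names and structures here are purely hypothetical, not the actual
checkpointer code):

#define _GNU_SOURCE
#include <fcntl.h>              /* sync_file_range(), Linux-specific */
#include <stdlib.h>             /* qsort() */
#include <sys/types.h>          /* off_t */

/* Hypothetical flush request as it could be collected in shared memory */
typedef struct FlushRequest
{
    int     fd;                 /* file the dirty block was written to */
    off_t   offset;             /* byte offset of the written block */
    off_t   len;                /* length of the write, normally BLCKSZ */
} FlushRequest;

static int
flush_request_cmp(const void *a, const void *b)
{
    const FlushRequest *fa = (const FlushRequest *) a;
    const FlushRequest *fb = (const FlushRequest *) b;

    if (fa->fd != fb->fd)
        return (fa->fd < fb->fd) ? -1 : 1;
    if (fa->offset != fb->offset)
        return (fa->offset < fb->offset) ? -1 : 1;
    return 0;
}

/*
 * Sort the accumulated flush requests, merge neighbouring ranges on the
 * same file, and issue one flush hint per merged range.
 */
static void
issue_flush_hints(FlushRequest *reqs, int nreqs)
{
    int     i = 0;

    qsort(reqs, nreqs, sizeof(FlushRequest), flush_request_cmp);

    while (i < nreqs)
    {
        int     fd = reqs[i].fd;
        off_t   start = reqs[i].offset;
        off_t   end = start + reqs[i].len;

        /* extend the range while the next request is adjacent/overlapping */
        while (i + 1 < nreqs &&
               reqs[i + 1].fd == fd &&
               reqs[i + 1].offset <= end)
        {
            i++;
            if (reqs[i].offset + reqs[i].len > end)
                end = reqs[i].offset + reqs[i].len;
        }

        /* ask the kernel to start writeback for the merged range */
        (void) sync_file_range(fd, start, end - start,
                               SYNC_FILE_RANGE_WRITE);
        i++;
    }
}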

>
> But more importantly, unless you also want to delay the writes
> themselves, leaving that many dirty buffers in the kernel page cache
> will bring back exactly the type of stalls (where the kernel flushes all
> the pending dirty data in a short amount of time) we're trying to avoid
> with the forced flushing. So doing flushes in large batches is
> something we really fundamentally do *not* want!
>

Could it be because of random I/O?

> > which it registers in shared memory so that a later checkpoint
> > can perform the corresponding fsync's; now when the request queue
> > becomes threshold size (let us say 1/3rd full), then we can perform
> > sorting and merging and issue flush hints.
>
> Which means that a significant portion of the writes won't be able to be
> collapsed, since only a random 1/3 of the buffers is sorted together.
>
>
> > Basically, I think this can lead to less merging of neighbouring
> > writes, but might not hurt if the sync_file_range() API is cheap.
>
> The cost of writing out data does correspond heavily with the number of
> random writes - which is what you get if you reduce the number of
> neighbouring writes.
>

Yeah, that's right, but I am not sure how much difference it would
make if we sort everything in one shot versus doing it in batches.
In any case, I am just thinking out loud to see if we can find some
solution to the regression you have seen above without disabling
sorting altogether for certain cases.
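
As a rough illustration of the batching I have in mind (again with
purely hypothetical names, and reusing issue_flush_hints() from the
sketch above):

/*
 * Hypothetical: called whenever a backend/bgwriter registers a flush
 * request with the checkpointer.
 */
static void
register_flush_request(FlushRequest *queue, int *nqueued, int queue_size,
                       FlushRequest req)
{
    queue[(*nqueued)++] = req;

    /*
     * Once the queue is about 1/3rd full, sort and merge what we have so
     * far and issue the flush hints, instead of waiting until checkpoint
     * time to do everything in one shot.
     */
    if (*nqueued >= queue_size / 3)
    {
        issue_flush_hints(queue, *nqueued);
        *nqueued = 0;
    }
}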

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
