Re: checkpointer continuous flushing

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2016-01-12 13:54:55
Message-ID: 20160112135455.4tlctqzzi7g3ugub@alap3.anarazel.de
Lists: pgsql-hackers

On 2016-01-12 19:17:49 +0530, Amit Kapila wrote:
> Why can't we do it at larger intervals (relative to total amount of writes)?
> To explain, what I have in mind, let us assume that checkpoint interval
> is longer (10 mins) and in the mean time all the writes are being done
> by bgwriter

But that's not the scenario with the regression here, so I'm not sure
why you're bringing it up?

And if we're flushing a significant portion of the writes, how does that
avoid the performance problem pointed out two messages upthread, where
sorting leads to flushing highly contended buffers together, leading to
excessive WAL flushing?

But more importantly, unless you also want to delay the writes
themselves, leaving that many dirty buffers in the kernel page cache
will bring back exactly the type of stalls (where the kernel flushes all
the pending dirty data in a short amount of time) we're trying to avoid
with the forced flushing. So doing flushes in large batches is
something we really fundamentally do *not* want!
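
As a rough illustration of the batch-size point, here is a minimal sketch
(Python, not PostgreSQL code). It writes pages to a scratch file and forces
them out after every flush_every pages, tracking how many written but not yet
flushed pages pile up. The function name write_with_flushes and the page
counts are made up for the example, and os.fsync() is only a blocking
stand-in for a sync_file_range() flush hint.

```python
import os
import tempfile

PAGE = 8192  # PostgreSQL's default block size, used here only as a page size


def write_with_flushes(fd, n_pages, flush_every):
    """Write n_pages pages, forcing them to disk after every flush_every pages.

    Returns the peak number of pages that were written but not yet flushed.
    """
    pending = peak = 0
    buf = b"\0" * PAGE
    for i in range(n_pages):
        os.pwrite(fd, buf, i * PAGE)
        pending += 1
        peak = max(peak, pending)
        if pending >= flush_every:
            os.fsync(fd)  # assumption: stand-in for a sync_file_range() hint
            pending = 0
    if pending:
        os.fsync(fd)  # flush whatever is left at the end
    return peak


with tempfile.TemporaryFile() as f:
    small_batches = write_with_flushes(f.fileno(), 256, 32)
with tempfile.TemporaryFile() as f:
    one_big_batch = write_with_flushes(f.fileno(), 256, 256)
print(small_batches, one_big_batch)
```

The peak backlog is simply the batch size, which is the point: the larger the
batch, the more dirty data sits in the page cache waiting to be written out in
one burst.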

> which it registers in shared memory so that later checkpoint
> can perform corresponding fsync's, now when the request queue
> becomes threshold size (let us say 1/3rd) full, then we can perform
> sorting and merging and issue flush hints.

Which means that a significant portion of the writes won't be able to be
collapsed, since only a random 1/3 of the buffers is sorted together.
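
A small simulation can make the loss of merging visible. The sketch below
(Python, purely illustrative) assumes a fully contiguous range of dirty
blocks, splits it into random batches, sorts each batch on its own, and
counts each run of adjacent block numbers as one merged write. The helper
names count_runs and writes_when_batched, and the figure of 3000 blocks, are
assumptions for the example and not anything from the patch.

```python
import random


def count_runs(blocks):
    """Count maximal runs of consecutive block numbers in one sorted batch."""
    blocks = sorted(blocks)
    if not blocks:
        return 0
    runs = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:
            runs += 1  # a gap means a separate write request
    return runs


def writes_when_batched(dirty_blocks, n_batches, rng):
    """Split dirty blocks into n_batches random batches, sort each batch
    separately, and return the total number of merged writes."""
    shuffled = list(dirty_blocks)
    rng.shuffle(shuffled)
    size = -(-len(shuffled) // n_batches)  # ceiling division
    return sum(
        count_runs(shuffled[i:i + size])
        for i in range(0, len(shuffled), size)
    )


rng = random.Random(0)
dirty = range(3000)  # assumption: one fully contiguous dirty range
one_pass = writes_when_batched(dirty, 1, rng)  # everything sorted together
thirds = writes_when_batched(dirty, 3, rng)    # random thirds sorted separately
print(one_pass, thirds)
```

Sorting everything together collapses the whole range into a single run,
while sorting random thirds separately leaves many more, smaller runs.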

> Basically, I think this can lead to lesser merging of neighbouring
> writes, but might not hurt if sync_file_range() API is cheap.

The cost of writing out data does correspond heavily with the number of
random writes - which is what you get if you reduce the number of
neighbouring writes.

Greetings,

Andres Freund
