Re: checkpointer continuous flushing

From: Andres Freund <andres(at)anarazel(dot)de>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-06-21 21:32:33
Message-ID: 20150621213233.GC4797@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2015-06-20 08:57:57 +0200, Fabien COELHO wrote:
> Actually I did, because as explained in another mail the fsync time when the
> other options are activated as reported in the logs is essentially null, so
> it would not bring significant improvements on these runs,
> and also the patch changes enough things as it is.
>
> So this is an evidence-based decision.

Meh. You're testing on low concurrency.

> >> - as version 2: checkpoint buffer sorting based on a 2007 patch by
> >> Takahiro Itagaki but with a smaller and static buffer allocated once.
> >> Also, sorting is done by chunks of 131072 pages in the current version,
> >> with a guc to change this value.
> >
> >I think it's a really bad idea to do this in chunks.
>
> The small problem I see is that for a very large setting there could be
> several seconds or even minutes of sorting, which may or may not be
> desirable, so having some control on that seems a good idea.

If sorting the dirty blocks alone takes minutes, the checkpoint will never
finish writing that many buffers out. That's an utterly bogus argument.

> Another argument is that Tom said he wanted that:-)

I don't think he said that when we discussed this last.

> In practice the value can be set at a high value so that it is nearly always
> sorted in one go. Maybe value "0" could be made special and used to trigger
> this behavior systematically, and be the default.

You're just making things too complicated.

> >That'll mean we'll frequently uselessly cause repetitive random IO,
>
> This is not an issue if the chunks are large enough, and anyway the GUC
> allows the behavior to be changed as desired.

I don't think this is true. If two consecutive blocks are dirty, but you
sync them in two different chunks, you will *always* cause additional
random IO. Either the drive will have to skip the write for that block,
or the OS will prefetch the data. More importantly, with SSDs it voids
the wear leveling advantages.

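
To illustrate the point about consecutive dirty blocks, here is a minimal
Python sketch. It is not PostgreSQL code, and `write_order` and
`sequential_runs` are hypothetical names made up for this example. It
assumes the dirty blocks are discovered in buffer-pool order, which has no
relation to their on-disk order, and compares sorting everything at once
with sorting in fixed-size chunks.

```python
def write_order(dirty_blocks, chunk_size=None):
    """Return the order in which blocks would be written out.

    dirty_blocks is given in buffer-pool order (an assumption of this sketch).
    chunk_size=None sorts the whole list at once; otherwise each chunk of
    chunk_size entries is sorted and written before the next chunk is looked at.
    """
    if chunk_size is None:
        return sorted(dirty_blocks)
    order = []
    for start in range(0, len(dirty_blocks), chunk_size):
        order.extend(sorted(dirty_blocks[start:start + chunk_size]))
    return order


def sequential_runs(order):
    """Count maximal runs of consecutive block numbers in a write order.

    One run means one purely sequential write; every extra run is a jump.
    """
    if not order:
        return 0
    runs = 1
    for prev, cur in zip(order, order[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


# Four adjacent dirty blocks, found in an order unrelated to disk layout.
dirty = [10, 12, 11, 13]
print(sequential_runs(write_order(dirty)))                # sorted in one go
print(sequential_runs(write_order(dirty, chunk_size=2)))  # sorted per chunk
```

With these four adjacent blocks and a chunk size of two, the single sort
gives one sequential run, while the chunked sort gives four separate
writes, because blocks 11 and 13 land in a later chunk than their
neighbours 10 and 12.
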
> >often interleaved. That pattern is horrible for SSDs too. We should always
> >try to do this at once, and only fall back to using less memory if we
> >couldn't allocate everything.
>
> The memory is needed anyway in order to avoid a duplicated or significantly
> heavier implementation for the throttling loop. It is allocated once on the
> first checkpoint. The allocation could be moved to the checkpointer
> initialization if this is a concern. The memory needed is one int per
> buffer, which is smaller than the 2007 patch.

There's a reason the 2007 patch (and my revision of it last year) did
what it did. You can't just access buffer descriptors without
locking. Besides, causing additional cacheline bouncing during the
sorting process is a bad idea.
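
A rough Python sketch of the general idea follows. It is an illustration
under stated assumptions, not the code of the 2007 patch or of any later
revision: `BufferDesc`, `collect_sort_keys` and `checkpoint_write_order`
are hypothetical names, and a per-descriptor `threading.Lock` stands in
for whatever locking the real buffer headers need. Each descriptor is read
once under its lock, the sort key is copied into a private list, and the
sort itself then touches only that private copy.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class BufferDesc:
    """Hypothetical stand-in for a shared buffer descriptor."""
    tablespace: int
    relation: int
    block: int
    dirty: bool
    lock: threading.Lock = field(default_factory=threading.Lock)


def collect_sort_keys(descriptors):
    """Copy the sort key of every dirty buffer while holding its lock."""
    keys = []
    for buf_id, desc in enumerate(descriptors):
        with desc.lock:  # never read a shared descriptor unlocked
            if desc.dirty:
                keys.append((desc.tablespace, desc.relation, desc.block, buf_id))
    return keys


def checkpoint_write_order(descriptors):
    """Return buffer ids in on-disk order, sorting only the private copy."""
    keys = collect_sort_keys(descriptors)
    keys.sort()  # comparisons never touch the shared descriptors
    return [key[-1] for key in keys]


pool = [
    BufferDesc(1, 5, 9, True),
    BufferDesc(1, 5, 3, True),
    BufferDesc(1, 2, 7, False),  # clean buffer, skipped
    BufferDesc(1, 5, 4, True),
]
print(checkpoint_write_order(pool))
```

The private copy costs more memory per buffer than a bare buffer id, but
the comparison function never has to dereference shared state.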

Greetings,

Andres Freund
