Re: checkpointer continuous flushing

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc:	PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpointer continuous flushing
Date:	2015-10-18 22:36:52
Message-ID:	20151018223652.GC28038@awork2.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2015-09-10 17:15:26 +0200, Fabien COELHO wrote:
>
> >Thanks for the hints! Two-part v12 attached fixes these.
>
> Here is a v13, which is just a rebase after 1aba62ec.

I'm working on this patch, to get it into a state where I think it'd be
committable.

My performance testing showed that calling PerformFileFlush() only at
segment boundaries and in CheckpointWriteDelay() can lead to rather
spiky IO - not that surprisingly. The sync in CheckpointWriteDelay() is
problematic because it is only triggered while on schedule, and not when
behind. My testing seems to show that just adding a limit of 32 buffers to
FileAsynchronousFlush() leads to markedly better results.
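
A minimal sketch of the "limit of 32 buffers" idea, not the patch's
code: PendingFlush, pending_add(), issue_flush() and simulate_writes()
are hypothetical names, and the flush hint itself is replaced by a
counter so the behaviour can be checked in isolation. The point is only
that a flush is issued as soon as 32 written buffers have accumulated,
whether or not the checkpoint is on schedule.

```c
#include <assert.h>
#include <stddef.h>

#define FLUSH_AFTER 32          /* limit mentioned in the message */

/* Hypothetical accumulator for buffers written but not yet flushed. */
typedef struct PendingFlush
{
    int     nbuffers;           /* buffers written since the last flush hint */
    long    nflushes;           /* number of flush hints issued so far */
} PendingFlush;

/* Stand-in for the real flush hint; it only counts and resets. */
static void
issue_flush(PendingFlush *pf)
{
    pf->nflushes++;
    pf->nbuffers = 0;
}

/* Record one written buffer; flush as soon as the limit is reached. */
static void
pending_add(PendingFlush *pf)
{
    assert(pf != NULL);
    if (++pf->nbuffers >= FLUSH_AFTER)
        issue_flush(pf);
}

/* Write n buffers in a row and return how many flush hints were issued. */
static long
simulate_writes(int n)
{
    PendingFlush pf = {0, 0};

    for (int i = 0; i < n; i++)
        pending_add(&pf);
    return pf.nflushes;
}
```

With this scheme the amount of dirty data handed to the kernel at once
is bounded by the limit, instead of growing until the next segment
boundary.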

I wonder if mmap() && msync(MS_ASYNC) isn't a better replacement for
sync_file_range(SYNC_FILE_RANGE_WRITE) than posix_fadvise(DONTNEED). It
might even be possible to later approximate that on Windows using
FlushViewOfFile().
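
For illustration, the mmap() plus msync(MS_ASYNC) combination could look
roughly like the sketch below. flush_range_msync() and demo_flush() are
hypothetical helpers, not functions from the patch, and the temporary
file path is an assumption. One caveat worth checking before relying on
this: the Linux msync(2) man page describes MS_ASYNC as effectively a
no-op since 2.6.19, so whether the call actually starts writeback is
platform dependent.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for mkstemp() under strict modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map [offset, offset + len) of fd and ask for asynchronous writeback.
 * offset must be page-aligned, as mmap() requires.
 * Returns 0 on success, -1 on failure. Hypothetical helper.
 */
static int
flush_range_msync(int fd, off_t offset, size_t len)
{
    void   *p;
    int     rc;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED)
        return -1;
    rc = msync(p, len, MS_ASYNC);   /* request writeback without blocking */
    if (munmap(p, len) != 0)
        rc = -1;
    return rc;
}

/* Write one 8 kB block to a temporary file, then hint a flush for it. */
static int
demo_flush(void)
{
    char    path[] = "/tmp/msync_demo_XXXXXX";
    char    block[8192];
    int     fd = mkstemp(path);
    int     rc = -1;

    if (fd < 0)
        return -1;
    memset(block, 'x', sizeof(block));
    if (write(fd, block, sizeof(block)) == (ssize_t) sizeof(block))
        rc = flush_range_msync(fd, 0, sizeof(block));
    close(fd);
    unlink(path);
    return rc;
}
```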

As far as I can see the while (nb_spaces != 0)/NextBufferToWrite() logic
doesn't work correctly if tablespaces aren't actually sorted. I'm
actually inclined to fix this by simply removing the flag to
enable/disable sorting.

Having defined(HAVE_SYNC_FILE_RANGE) || defined(HAVE_POSIX_FADVISE) in
so many places looks ugly; I want to push that to the underlying
functions. If we add a different flushing approach we shouldn't have to
touch several places that don't really care.
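
A sketch of what pushing the platform checks down into one function
might look like. flush_hint() is a hypothetical name, not the function
used in the patch. The HAVE_* macros normally come from configure; when
this file is compiled on its own neither is defined, so the wrapper
degrades to a no-op.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical single entry point for "please start writing this range".
 * Callers never test HAVE_* themselves. Returns 0 on success, or when no
 * mechanism is available, and nonzero on error.
 */
static int
flush_hint(int fd, off_t offset, off_t nbytes)
{
    assert(fd >= 0);
#if defined(HAVE_SYNC_FILE_RANGE)
    /* Linux only; the declaration requires _GNU_SOURCE with glibc. */
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#elif defined(HAVE_POSIX_FADVISE)
    return posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);
#else
    /* No flush mechanism on this platform: silently do nothing. */
    (void) offset;
    (void) nbytes;
    return 0;
#endif
}
```

Adding another mechanism later would then mean adding one more branch
here, rather than editing every call site.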

I've replaced the NextBufferToWrite() logic with a binaryheap.h heap -
seems to work well, with a bit less code actually.
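
To illustrate how a heap can replace the "which tablespace writes next"
logic, here is a self-contained sketch. It uses a small hand-rolled
min-heap rather than PostgreSQL's binaryheap.h, and TsProgress, TsHeap,
write_next() and demo_order() are hypothetical names. The balancing rule
(each tablespace advances by total/own buffers per write, and the least
advanced one goes next) is an assumption about the intent, not a copy of
the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SPACES 8

typedef struct TsProgress
{
    int     id;         /* tablespace identifier */
    int     remaining;  /* buffers still to write */
    double  progress;   /* advances by 'slice' per written buffer */
    double  slice;      /* total buffers / buffers in this tablespace */
} TsProgress;

typedef struct TsHeap
{
    TsProgress *items[MAX_SPACES];
    int         size;
} TsHeap;

static void
heap_swap(TsHeap *h, int a, int b)
{
    TsProgress *tmp = h->items[a];

    h->items[a] = h->items[b];
    h->items[b] = tmp;
}

/* Restore the min-heap property downwards from index i. */
static void
sift_down(TsHeap *h, int i)
{
    for (;;)
    {
        int l = 2 * i + 1, r = l + 1, min = i;

        if (l < h->size && h->items[l]->progress < h->items[min]->progress)
            min = l;
        if (r < h->size && h->items[r]->progress < h->items[min]->progress)
            min = r;
        if (min == i)
            break;
        heap_swap(h, i, min);
        i = min;
    }
}

/* Append a tablespace and restore the heap property upwards. */
static void
heap_add(TsHeap *h, TsProgress *ts)
{
    int i = h->size;

    assert(h->size < MAX_SPACES);
    h->items[h->size++] = ts;
    while (i > 0 && h->items[i]->progress < h->items[(i - 1) / 2]->progress)
    {
        heap_swap(h, i, (i - 1) / 2);
        i = (i - 1) / 2;
    }
}

/* Account for one buffer of the least advanced tablespace; -1 when done. */
static int
write_next(TsHeap *h)
{
    TsProgress *ts;

    if (h->size == 0)
        return -1;
    ts = h->items[0];
    ts->remaining--;
    ts->progress += ts->slice;
    if (ts->remaining == 0)
        h->items[0] = h->items[--h->size];  /* finished: drop from heap */
    sift_down(h, 0);
    return ts->id;
}

/* Two tablespaces with 4 and 2 buffers; record the write order. */
static int
demo_order(int *out, int max)
{
    TsProgress a = {0, 4, 0.0, 6.0 / 4};
    TsProgress b = {1, 2, 0.0, 6.0 / 2};
    TsHeap  h = {{NULL}, 0};
    int     n = 0, id;

    heap_add(&h, &a);
    heap_add(&h, &b);
    while (n < max && (id = write_next(&h)) >= 0)
        out[n++] = id;
    return n;
}
```

In this example the writes interleave as 0, 1, 0, 0, 1, 0, so both
tablespaces finish at about the same time and no input ordering of the
tablespaces is required.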

I'll post this after some more cleanup & testing.

I've also noticed that the sleeping logic in CheckpointWriteDelay()
isn't particularly good. For one, in high throughput workloads the
100ms sleep is too long, leading to bursty IO behaviour. If 1k+ buffers
are written out a second, 100ms is a rather long sleep. For another,
the fact that we only sleep 100ms when the write rate is low makes the
checkpoint finish rather quickly - on a slow disk (say microsd) that can
cause unnecessary slowdowns for concurrent activity. ISTM we should
calculate the sleep time in a better way. The SIGHUP behaviour is also
weird. Anyway, this probably belongs on a new thread.
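
As a back-of-the-envelope check of the first point, the number of
buffers that pile up during one sleep is simply rate times sleep time.
buffers_per_sleep() is a hypothetical helper for that arithmetic only;
it does not propose a new sleep calculation.

```c
#include <assert.h>

/* Buffers that accumulate during one sleep at a given write rate. */
static long
buffers_per_sleep(long buffers_per_sec, long sleep_ms)
{
    assert(buffers_per_sec >= 0 && sleep_ms >= 0);
    return buffers_per_sec * sleep_ms / 1000;
}
```

At 1000 buffers per second and a 100ms sleep that is 100 buffers per
wakeup, which is why the writes arrive in bursts.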

Greetings,

Andres Freund

In response to

Re: checkpointer continuous flushing at 2015-09-10 15:15:26 from Fabien COELHO

Responses

Re: checkpointer continuous flushing at 2015-10-19 19:14:55 from Fabien COELHO
Re: checkpointer continuous flushing at 2015-10-20 07:35:42 from Amit Kapila
