Re: checkpointer continuous flushing

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-10-19 19:14:55
Message-ID: alpine.DEB.2.10.1510190834250.30391@sto
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Andres,

>> Here is a v13, which is just a rebase after 1aba62ec.
>
> I'm working on this patch, to get it into a state I think it'd be
> commitable.

I'll review it carefully. Also, if you can include some performance
feature it would help, even if I'll do some more runs.

> In my performance testing it showed that calling PerformFileFlush() only
> at segment boundaries and in CheckpointWriteDelay() can lead to rather
> spikey IO - not that surprisingly. The sync in CheckpointWriteDelay() is
> problematic because it only is triggered while on schedule, and not when
> behind.

When behind, the PerformFileFlush should be called on segment boundaries.
The idea was not to go to sleep without flushing, and to do it as little
as possible.

> My testing seems to show that just adding a limit of 32 buffers to
> FileAsynchronousFlush() leads to markedly better results.

Hmmm. 32 buffers means 256 KB, which is quite small. Not sure what a good
"limit" would be. It could depend whether pages are close or not.

> I wonder if mmap() && msync(MS_ASYNC) isn't a better replacement for
> sync_file_range(SYNC_FILE_RANGE_WRITE) than posix_fadvise(DONTNEED). It
> might even be possible to later approximate that on windows using
> FlushViewOfFile().

I'm not sure that mmap/msync can be used for this purpose, because there
is no real control it seems about where the file is mmapped.

> As far as I can see the while (nb_spaces != 0)/NextBufferToWrite() logic
> doesn't work correctly if tablespaces aren't actually sorted. I'm
> actually inclined to fix this by simply removing the flag to
> enable/disable sorting.

I do no think that there is a significant downside to having sort always
on, but showing it requires to be able to test, so to have a guc. The
point of the guc is to demonstrate that the feature is harmless:-)

> Having defined(HAVE_SYNC_FILE_RANGE) || defined(HAVE_POSIX_FADVISE) in
> so many places looks ugly, I want to push that to the underlying
> functions. If we add a different flushing approach we shouldn't have to
> touch several places that don't actually really care.

I agree that it is pretty ugly, but I do not think that you can remove
them all. You need at least one for checking the guc and one for enabling
the feature. Maybe their number could be reduced if the functions are
switched to do-nothing stubs which are called nevertheless, but I was not
keen on letting unused code when there is no sync_file_range nor
posix_fadvise.

> I've replaced the NextBufferToWrite() logic with a binaryheap.h heap -
> seems to work well, with a bit less code actually.

Hmmm. I'll check. I'm still unconvinced that using a tree for a 2-3
element set in most case is an improvement.

> I'll post this after some more cleanup & testing.

I'll have a look when it is ready.

> I've also noticed that sleeping logic in CheckpointWriteDelay() isn't
> particularly good. In high throughput workloads the 100ms sleep is too
> long, leading to bursty IO behaviour. If 1k+ buffers a written out a
> second 100ms is a rather long sleep. For another that we only sleep
> 100ms when the write rate is low makes the checkpoint finish rather
> quickly - on a slow disk (say microsd) that can cause unneccesary
> slowdowns for concurrent activity. ISTM we should calculate the sleep
> time in a better way.

I also noted this point, but I'm not sure how to have a better approach,
so I let it as it is. I tried 50 ms & 200 ms on some runs, without
significant effect on performance for the test I ran then. The point of
having not too small a value is that it provide some significant work to
the IO subsystem without overflowing it. On average it does not matter.
I'm unsure how it would interact with flushing. So I decided not to do
anything about it. Maybe it should be a guc, but I would not know how to
choose it.

> The SIGHUP behaviour is also weird. Anyway, this probably belongs on a
> new thread.

Probably. I did not try to look at that.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2015-10-19 19:27:20 Re: Checkpoint throttling issues
Previous Message Pavel Stehule 2015-10-19 18:14:34 Re: [PATCH] SQL function to report log message