Re: checkpointer continuous flushing

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-06-24 06:29:04
Message-ID: alpine.DEB.2.10.1506240628160.3535@sto
Lists: pgsql-hackers


>> Besides, causing additional cacheline bouncing during the
>> sorting process is a bad idea.
>
> Hmmm. The impact would be to multiply the memory required by 3 or 4 (buf_id,
> relation, forknum, offset), instead of just buf_id, and I understood that
> memory was a concern.
>
> Moreover, once the sort process get the lines which contain the sorting data
> from the buffer descriptor in its cache, I think that it should be pretty
> much okay. Incidentally, they would probably have been brought to cache by
> the scan to collect them. Also, I do not think that the sorting time for
> 128000 buffers, and possible cache misses, was a big issue, but I do not have
> a measure to defend that. I could try to collect some data about that.

I've collected some data by adding a "sort time" measure, with
checkpoint_sort_size=10000000 so that all buffers are sorted in one chunk,
and running some large checkpoints:

LOG: checkpoint complete: wrote 41091 buffers (6.3%);
0 transaction log file(s) added, 0 removed, 0 recycled;
sort=0.024 s, write=0.488 s, sync=8.790 s, total=9.837 s;
sync files=41, longest=8.717 s, average=0.214 s;
distance=404972 kB, estimate=404972 kB

LOG: checkpoint complete: wrote 212124 buffers (32.4%);
0 transaction log file(s) added, 0 removed, 0 recycled;
sort=0.078 s, write=128.885 s, sync=1.269 s, total=131.646 s;
sync files=43, longest=1.155 s, average=0.029 s;
distance=2102950 kB, estimate=2102950 kB

LOG: checkpoint complete: wrote 384427 buffers (36.7%);
0 transaction log file(s) added, 0 removed, 1 recycled;
sort=0.120 s, write=83.995 s, sync=13.944 s, total=98.035 s;
sync files=9, longest=13.724 s, average=1.549 s;
distance=3783305 kB, estimate=3783305 kB

LOG: checkpoint complete: wrote 809211 buffers (77.2%);
0 transaction log file(s) added, 0 removed, 1 recycled;
sort=0.358 s, write=138.146 s, sync=14.943 s, total=153.124 s;
sync files=13, longest=14.871 s, average=1.149 s;
distance=8075338 kB, estimate=8075338 kB

Summary of these checkpoints:

 #buffers   size     sort (s)
    41091   328 MB   0.024
   212124   1.7 GB   0.078
   384427   2.9 GB   0.120
   809211   6.2 GB   0.358

Sort times are pretty negligible compared to the whole checkpoint time,
and stay under 0.1 s per GB of buffers sorted.
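
As a rough cross-check of these two claims, the figures can be recomputed
from the logs above. This is only a sketch: it assumes the default 8 kB
block size (so sizes come out in GiB and differ slightly from the rounded
values in the table), and the names used are illustrative, not taken from
the patch.

```python
# Figures copied from the checkpoint logs above: (buffers, sort s, total s).
CHECKPOINTS = [
    (41091, 0.024, 9.837),
    (212124, 0.078, 131.646),
    (384427, 0.120, 98.035),
    (809211, 0.358, 153.124),
]

BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size, in bytes
GIB = 1024 ** 3


def sort_stats(buffers, sort_s, total_s):
    """Return (size in GiB, sort seconds per GiB, sort share of total time)."""
    size_gib = buffers * BLOCK_SIZE / GIB
    return size_gib, sort_s / size_gib, sort_s / total_s


STATS = [sort_stats(*row) for row in CHECKPOINTS]

for (buffers, _, _), (size, rate, share) in zip(CHECKPOINTS, STATS):
    print(f"{buffers:>7} buffers  {size:5.2f} GiB  "
          f"{rate:.3f} s/GiB  {100 * share:.2f}% of total")
```

Under that assumption, every checkpoint stays below 0.1 s/GiB and the sort
accounts for well under 1% of the total checkpoint time.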

On a 512 GB server with shared_buffers=128GB (25%), this suggests a
worst-case checkpoint sort of a few seconds, and then you have a hundred GB
to write anyway. Projecting to a 1 TB checkpoint in the next decade, the
sort would take under a minute... But then you have 1 TB of data to dump.
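
The projection can be sketched as a simple linear scaling of the largest
measured checkpoint. The 8 kB block size is an assumption, the function name
is made up for illustration, and the scaling ignores the log(n) factor of
the sort, so this gives a rough order of magnitude only.

```python
BLOCK_SIZE = 8192  # assumption: default 8 kB PostgreSQL block, in bytes
GIB = 1024 ** 3

# Largest measured checkpoint above: 809211 buffers sorted in 0.358 s.
MEASURED_BUFFERS = 809211
MEASURED_SORT_S = 0.358


def projected_sort_seconds(checkpoint_gib):
    """Linearly scale the measured sort time to a larger checkpoint.

    The log(n) factor of the sort is ignored, so real times would be
    somewhat higher than this estimate.
    """
    measured_gib = MEASURED_BUFFERS * BLOCK_SIZE / GIB
    return MEASURED_SORT_S * checkpoint_gib / measured_gib


for label, gib in (("128 GB shared_buffers", 128), ("1 TB checkpoint", 1024)):
    print(f"{label}: about {projected_sort_seconds(gib):.1f} s of sorting")
```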

As a comparison point, I've done the large checkpoint with the default
sort size of 131072:

LOG: checkpoint complete: wrote 809211 buffers (77.2%);
0 transaction log file(s) added, 0 removed, 1 recycled;
sort=0.251 s, write=152.377 s, sync=15.062 s, total=167.453 s;
sync files=13, longest=14.974 s, average=1.158 s;
distance=8075338 kB, estimate=8075338 kB

The 0.251 s sort time is to be compared to 0.358 s for a single chunk.
Well, n.log(n) is not too bad, as expected.
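
For what it is worth, an idealized n.log(n) comparison count gives the same
direction. This sketch assumes the chunked mode sorts independent chunks of
at most 131072 buffers; it ignores constant factors and cache effects, so
only the ratio is meaningful, and the helper names are illustrative.

```python
import math

TOTAL_BUFFERS = 809211  # buffers written in the large checkpoint above
CHUNK_SIZE = 131072     # default sort size quoted above


def nlogn(n):
    """Idealized comparison count for sorting n items."""
    return n * math.log2(n) if n > 1 else 0.0


def chunked_cost(total, chunk):
    """Cost of sorting `total` items in independent chunks of at most `chunk`."""
    full, rest = divmod(total, chunk)
    return full * nlogn(chunk) + nlogn(rest)


whole = nlogn(TOTAL_BUFFERS)
chunked = chunked_cost(TOTAL_BUFFERS, CHUNK_SIZE)

print(f"one chunk:        {whole:.3e} comparisons")
print(f"chunks of {CHUNK_SIZE}: {chunked:.3e} comparisons")
print(f"predicted ratio {chunked / whole:.2f}, measured {0.251 / 0.358:.2f}")
```

The measured gap is larger than this model predicts, which may come from
cache effects or measurement noise; I have no data to separate the two.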

These figures suggest that sorting time and the associated cache misses are
not a significant issue and thus are not worth bothering much about, and
also that a simple boolean option would probably be quite acceptable
instead of the chunk approach.

Attached is an updated version of the patch which turns the sort option
into a boolean, and also includes the sort time in the checkpoint log.

There is still an open question about whether the sorting buffer
allocation is lost on some signals and should be reallocated in such
an event.

--
Fabien.

Attachment Content-Type Size
checkpoint-continuous-flush-4.patch text/x-diff 42.5 KB
