Re: Redesigning checkpoint_segments

From: Venkata Balaji N <nag1010(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Redesigning checkpoint_segments
Date: 2015-01-30 03:48:39
Message-ID: CAEyp7J-BeBcn4dO9judZmxcJrp8Fj23nuF-nuitpd5MTqu1zfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I really like the idea of tuning checkpoint segments based on disk space
usage.

I performed series of tests for this patch and would like to share the
results. My comments are in-line.

To start with, I applied this patch to the master successfully -

> But ... do I understand things correctly that checkpoint wouldn't "kick
>> in" until you hit checkpoint_wal_size? If that's the case, isn't real
>> disk space usage around 2X checkpoint_wal_size if spread checkpoint is
>> set to 0.9? Or does checkpoint kick in sometime earlier?
>>
>
> It kicks in earlier, so that the checkpoint *completes* just when
> checkpoint_wal_size of WAL is used up. So the real disk usage is
> checkpoint_wal_size.
>
> There is still an internal variable called CheckPointSegments that
> triggers the checkpoint, but it is now derived from checkpoint_wal_size
> (see CalculateCheckpointSegments function):
>
> CheckPointSegments = (checkpoint_wal_size / 16 MB) / (2 +
> checkpoint_completion_target)
>

Yes, i see this happening.

This is the same formula we've always had in the manual for calculating the
> amount of WAL space used, but in reverse. I.e. we calculate
> CheckPointSegments based on the desired disk space usage, not the other way
> round.
>
> As a note, pgBench would be a terrible test for this patch; we really
>> need something which creates uneven traffic. I'll see if I can devise
>> something.
>>
>
> Attached is a rebased version of this patch. Everyone, please try this out
> on whatever workloads you have, and let me know:
>
> a) How does the auto-tuning of the number of recycled segments work? Does
> pg_xlog reach a steady-state size, or does it fluctuate a lot?
>

I performed the tests by executing heavy INSERT operations (INSERTS only)
using benchmarksql. I do see that pg_xlog size is increasing at times.

I have inserted about 6GB of data for testing.

Below are the test results.

*Test 1 :*

In this test, i see removed+recycled segments = 3 (except for the first 3
checkpoint cycles) and has been steady through out until the INSERT
operation completed.

Actual calculation of CheckPointSegments = 3.2 ( is getting rounded up to 3
)

pg_xlog size is 128M and has increased to 160M max during the INSERT
operation.

shared_buffers = 128M
checkpoint_wal_size = 128M
min_recycle_wal_size = 80M
checkpoint_timeout = 5min

TimeStamp=2015-01-27 09:39:14.325 GMT-10 DB=bsql SID=54c6cfd6.5e4
User=postgres LOG: statement: update order_line set ol_amount = 0.01;
TimeStamp=2015-01-27 09:39:15.407 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoint complete: wrote 5123 buffers (31.3%); 0 transaction log file(s)
added, 1 removed, 0 recycled; write=0.593 s, sync=2.492 s, total=3.273 s;
sync files=26, longest=0.399 s, average=0.095 s; distance=52653 KB,
estimate=52653 KB
TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoints are occurring too frequently (3 seconds apart)
TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= HINT:
Consider increasing the configuration parameter "checkpoint_wal_size".
TimeStamp=2015-01-27 09:39:18.680 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoint complete: wrote 8145 buffers (49.7%); 0 transaction log file(s)
added, 3 removed, 0 recycled; write=0.913 s, sync=1.476 s, total=2.530 s;
sync files=4, longest=0.534 s, average=0.369 s; distance=87446 KB,
estimate=87446 KB
TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoints are occurring too frequently (3 seconds apart)
TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= HINT:
Consider increasing the configuration parameter "checkpoint_wal_size".
TimeStamp=2015-01-27 09:39:21.211 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 09:39:23.169 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoint complete: wrote 4598 buffers (28.1%); 0 transaction log file(s)
added, 3 removed, 2 recycled; write=0.716 s, sync=1.083 s, total=1.957 s;
sync files=4, longest=0.486 s, average=0.270 s; distance=47964 KB,
estimate=83498 KB
TimeStamp=2015-01-27 09:39:23.235 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoints are occurring too frequently (2 seconds apart)
TimeStamp=2015-01-27 09:39:23.235 GMT-10 DB= SID=54bee4a1.3002 User= HINT:
Consider increasing the configuration parameter "checkpoint_wal_size".
TimeStamp=2015-01-27 09:39:23.235 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 09:39:24.968 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoint complete: wrote 3417 buffers (20.9%); 0 transaction log file(s)
added, 1 removed, 2 recycled; write=0.539 s, sync=1.059 s, total=1.732 s;
sync files=4, longest=0.535 s, average=0.264 s; distance=44814 KB,
estimate=79629 KB
TimeStamp=2015-01-27 09:39:25.118 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoints are occurring too frequently (2 seconds apart)
TimeStamp=2015-01-27 09:39:25.118 GMT-10 DB= SID=54bee4a1.3002 User= HINT:
Consider increasing the configuration parameter "checkpoint_wal_size".
TimeStamp=2015-01-27 09:39:25.118 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 09:39:26.879 GMT-10 DB= SID=54bee4a1.3002 User= LOG:
checkpoint complete: wrote 4721 buffers (28.8%); 0 transaction log file(s)
added, 1 removed, 2 recycled; write=0.474 s, sync=1.166 s, total=1.761 s;
sync files=4, longest=0.583 s, average=0.291 s; distance=49145 KB,
estimate=76581 KB

*Test 2 :*

removed+recycled segments remained 3 even after i increased the
checkpoint_wal_size = 144M. This is obviously due to the calculation in
CalculateCheckpointSegments() functions.

checkpoint_wal_size = 144M
min_recycle_wal_size = 104M
checkpoint_timeout = 5min
shared_buffers = 1 GB

Actual calculation of CheckPointSegments = 3.6

TimeStamp=2015-01-27 13:54:38.469 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 13:54:42.831 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint complete: wrote 5419 buffers (4.1%); 0 transaction log file(s)
added, 0 removed, 3 recycled; write=2.408 s, sync=1.820 s, total=4.361 s;
sync files=3, longest=1.432 s, average=0.606 s; distance=48175 KB,
estimate=49972 KB
TimeStamp=2015-01-27 13:54:44.824 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 13:54:49.008 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint complete: wrote 5570 buffers (4.2%); 0 transaction log file(s)
added, 0 removed, 3 recycled; write=2.769 s, sync=1.268 s, total=4.184 s;
sync files=3, longest=0.843 s, average=0.422 s; distance=51720 KB,
estimate=51720 KB
TimeStamp=2015-01-27 13:54:50.754 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 13:54:55.127 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint complete: wrote 5155 buffers (3.9%); 0 transaction log file(s)
added, 0 removed, 3 recycled; write=2.977 s, sync=1.273 s, total=4.372 s;
sync files=3, longest=0.848 s, average=0.424 s; distance=46133 KB,
estimate=51161 KB
TimeStamp=2015-01-27 13:54:57.164 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 13:55:01.622 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint complete: wrote 5345 buffers (4.1%); 0 transaction log file(s)
added, 0 removed, 3 recycled; write=2.598 s, sync=1.290 s, total=4.458 s;
sync files=3, longest=0.894 s, average=0.430 s; distance=49604 KB,
estimate=51006 KB
TimeStamp=2015-01-27 13:55:03.501 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 13:55:07.390 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint complete: wrote 5482 buffers (4.2%); 0 transaction log file(s)
added, 0 removed, 3 recycled; write=2.549 s, sync=1.193 s, total=3.889 s;
sync files=3, longest=0.837 s, average=0.397 s; distance=49963 KB,
estimate=50901 KB
TimeStamp=2015-01-27 13:55:09.381 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 13:55:13.626 GMT-10 DB= SID=54c70b57.21a0 User= LOG:
checkpoint complete: wrote 5481 buffers (4.2%); 0 transaction log file(s)
added, 0 removed, 3 recycled; write=2.778 s, sync=1.280 s, total=4.244 s;
sync

*Test 3 :*
checkpoint_wal_size = 244M
min_recycle_wal_size = 204M
checkpoint_timeout = 5min
shared_buffers = 1 GB

removed+recycled segments remained 6.

Actual calculation of checkpointsegments = 6.1

TimeStamp=2015-01-27 14:02:01.936 GMT-10 DB= SID=54c70d58.22f4 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 14:02:10.638 GMT-10 DB= SID=54c70d58.22f4 User= LOG:
checkpoint complete: wrote 14111 buffers (10.8%); 0 transaction log file(s)
added, 1 removed, 5 recycled; write=5.527 s, sync=2.719 s, total=8.701 s;
sync files=14, longest=1.789 s, average=0.194 s; distance=98617 KB,
estimate=99036 KB
TimeStamp=2015-01-27 14:02:14.243 GMT-10 DB= SID=54c70d58.22f4 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 14:02:22.783 GMT-10 DB= SID=54c70d58.22f4 User= LOG:
checkpoint complete: wrote 16524 buffers (12.6%); 0 transaction log file(s)
added, 1 removed, 5 recycled; write=7.013 s, sync=1.394 s, total=8.540 s;
sync files=3, longest=0.867 s, average=0.464 s; distance=98724 KB,
estimate=99005 KB
TimeStamp=2015-01-27 14:02:28.066 GMT-10 DB= SID=54c70d58.22f4 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 14:02:36.946 GMT-10 DB= SID=54c70d58.22f4 User= LOG:
checkpoint complete: wrote 16541 buffers (12.6%); 0 transaction log file(s)
added, 1 removed, 5 recycled; write=4.899 s, sync=3.801 s, total=8.879 s;
sync files=9, longest=2.800 s, average=0.422 s; distance=98719 KB,
estimate=98976 KB
TimeStamp=2015-01-27 14:02:40.611 GMT-10 DB= SID=54c70d58.22f4 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 14:02:48.066 GMT-10 DB= SID=54c70d58.22f4 User= LOG:
checkpoint complete: wrote 10998 buffers (8.4%); 0 transaction log file(s)
added, 1 removed, 5 recycled; write=4.874 s, sync=1.998 s, total=7.455 s;
sync files=3, longest=1.998 s, average=0.666 s; distance=98771 KB,
estimate=98956 KB
TimeStamp=2015-01-27 14:02:53.327 GMT-10 DB= SID=54c70d58.22f4 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-27 14:03:00.872 GMT-10 DB= SID=54c70d58.22f4 User= LOG:
checkpoint complete: wrote 10640 buffers (8.1%); 0 transaction log file(s)
added, 1 removed, 5 recycled; write=5.247 s, sync=2.097 s, total=7.544 s;
sync files=3, longest=1.640 s, average=0.699 s; distance=98624 KB,
estimate=98923 KB

*Test 4 :*

This time i tested with wal_keep_segments = 300 (4.8 G)

checkpoint_wal_size = 512MB
min_recycle_wal_size = 80M
wal_keep_segments = 300
checkpoint_timeout = 5min
shared_buffers = 1 GB

Actual calculation of checkpointsegments = 12.8

TimeStamp=2015-01-29 12:51:48.276 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-29 12:52:04.325 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint complete: wrote 20965 buffers (16.0%); 0 transaction log file(s)
added, 0 removed, 0 recycled; write=11.676 s, sync=3.830 s, total=16.049 s;
sync files=18, longest=2.991 s, average=0.212 s; distance=196705 KB,
estimate=196705 KB
TimeStamp=2015-01-29 12:52:16.068 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-29 12:52:33.529 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint complete: wrote 22009 buffers (16.8%); 1 transaction log file(s)
added, 0 removed, 0 recycled; write=12.705 s, sync=3.559 s, total=17.460 s;
sync files=3, longest=3.002 s, average=1.186 s; distance=200401 KB,
estimate=200401 KB
TimeStamp=2015-01-29 12:52:43.321 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint starting: xlog

Since the wal_keep_segments is 300, recycling or removing of the
transactions logs begins after the required number of wal_keep_segments are
retained. Which is 4.8G in this case.

removed+recycled has always been 12 except for the first 3 checkpoint
cycles after pg_xlog size reached 4.8G.

TimeStamp=2015-01-29 13:03:29.167 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-29 13:03:58.401 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint complete: wrote 20316 buffers (15.5%); 0 transaction log file(s)
added, 0 removed, 0 recycled; write=11.963 s, sync=16.840 s, total=29.233
s; sync files=16, longest=15.137 s, average=1.052 s; distance=197432 KB,
estimate=197432 KB
TimeStamp=2015-01-29 13:04:05.451 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-29 13:04:52.416 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint complete: wrote 20280 buffers (15.5%); 0 transaction log file(s)
added, 5 removed, 8 recycled; write=10.989 s, sync=35.791 s, total=46.965
s; sync files=10, longest=17.927 s, average=3.579 s; distance=196668 KB,
estimate=197356 KB
TimeStamp=2015-01-29 13:04:52.635 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-29 13:05:15.520 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint complete: wrote 31394 buffers (24.0%); 0 transaction log file(s)
added, 0 removed, 10 recycled; write=10.270 s, sync=12.404 s, total=22.884
s; sync files=17, longest=5.014 s, average=0.729 s; distance=197961 KB,
estimate=197961 KB
TimeStamp=2015-01-29 13:05:20.356 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-29 13:05:35.060 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint complete: wrote 32731 buffers (25.0%); 0 transaction log file(s)
added, 0 removed, 10 recycled; write=11.433 s, sync=3.055 s, total=14.703
s; sync files=13, longest=1.300 s, average=0.235 s; distance=196510 KB,
estimate=197816 KB
TimeStamp=2015-01-29 13:05:43.059 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-29 13:05:59.518 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint complete: wrote 30264 buffers (23.1%); 0 transaction log file(s)
added, 0 removed, 12 recycled; write=10.687 s, sync=5.624 s, total=16.459
s; sync files=12, longest=3.971 s, average=0.468 s; distance=193348 KB,
estimate=197369 KB
TimeStamp=2015-01-29 13:06:07.371 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint starting: xlog
TimeStamp=2015-01-29 13:06:23.870 GMT-10 DB= SID=54c99ff1.5bc9 User= LOG:
checkpoint complete: wrote 30723 buffers (23.4%); 0 transaction log file(s)
added, 0 removed, 12 recycled; write=10.132 s, sync=6.159 s, total=16.498
s; sync file

> b) Are the two GUCs, checkpoint_wal_size, and min_recycle_wal_size,
> intuitive to set?
>
>
During my tests, I did not observe the significance of min_recycle_wal_size
parameter yet. Ofcourse, i had sufficient disk space for pg_xlog.

I would like to understand more about "min_recycle_wal_size" parameter. In
theory, i only understand from the note in the patch that if the disk space
usage falls below certain threshold, min_recycle_wal_size number of WALs
will be removed to accommodate future pg_xlog segments. I will try to test
this out. Please let me know if there is any specific test to understand
min_recycle_wal_size behaviour.

I will try to perform some more stress testing with different set of high
workloads and will share the results.

I did not review the patch code completely. Will comment once done.

Please share your thoughts on this.

Regards,
Venkata B N

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-01-30 04:07:09 Re: Safe memory allocation functions
Previous Message Stephen Frost 2015-01-30 03:40:30 Re: Possible typo in create_policy.sgml