Quick Links

Re: Fix checkpoint skip logic on idle systems by tracking LSN progress

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	pgsql-hackers(at)postgresql(dot)org, andres(at)anarazel(dot)de
Subject:	Re: Fix checkpoint skip logic on idle systems by tracking LSN progress
Date:	2016-09-27 10:16:12
Message-ID:	20160927.191612.123888421.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

I apologize in advance that the comments in this message might
one of the ideas discarded in the past thread.. I might not grasp
the discussion completely X(

The attached patches are rebased to the master and additional one
mentioned below.

At Wed, 18 May 2016 17:57:49 -0400, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkYv_45dKwA(at)mail(dot)gmail(dot)com>

> A couple of months back is has been reported to pgsql-bugs that WAL
> segments were always switched with a low value of archive_timeout even
> if a system is completely idle:
> http://www.postgresql.org/message-id/20151016203031.3019.72930@wrigleys.postgresql.org
> In short, a closer look at the problem has showed up that the logic in
> charge of checking if a checkpoint should be skipped or not is
> currently broken, because it completely ignores standby snapshots in
> its calculation of the WAL activity. So a checkpoint always occurs
> after checkpoint_timeout on an idle system since hot_standby has been
> introduced as wal_level. This did not get better from 9.4, since
> standby snapshots are logged every 15s by the background writer
> process. In 9.6, since wal_level = 'archive' and 'hot_standby'
> actually has the same meaning, the skip logic that worked with
> wal_level = 'archive' does not do its job anymore.
>
> One solution that has been discussed is to track the progress of WAL
> activity when doing record insertion by being able to mark some
> records as not updating the progress of WAL. Standby snapshot records
> enter in this category, making the checkpoint skip logic more robust.
> Attached is a patch implementing a solution for it, by adding in

If I understand the old thread correctly, this still doesn't
solve the main issue, excessive checkpoint and segment
switching. The reason is the interaction between XLOG_SWITCH and
checkpoint as mentioned there. I think we may treat XLOG_SWITCH
as NO_PROGRESS, since we can recover to the lastest state without
it. If it is not wrong, the second patch attached (v12-2) inserts
XLOG_SWITCH as NO_PROGRESS and skips segment switching when no
progress took place for the round.

> WALInsertLock a new field that gets updated for each record to track
> the LSN progress. This allows to reliably skip the generation of
> standby snapshots in the bgwriter or checkpoints on an idle system.

WALInsertLock doesn't seem to me to be a good place for
progressAt even considering the discussion about adding few
instructions (containing a branch) in the
hot-section. BackgroundWriterMain blocks all running
XLogInsertRecord every 200 ms, not 15 or 30 seconds (only for
replica, though). If this is correct, the Amit's suggestion below
will have more significance, and I rather agree with it. XLogCtl
is more suitable place for progressAt for the case.

https://www.postgresql.org/message-id/CAA4eK1LB9HDq+F8Lw8bGRQx6Sw42XaikX1UQ2DZk+AuEGbfjWA@mail.gmail.com
Amit> Taking and releasing locks again and again (which is done in patch)
Amit> matters much more than adding few instructions under it and I think
Amit> if you would have written the code in-a-way as in patch in some of
Amit> the critical path, it would have been regressed badly, but because
Amit> checkpoint doesn't happen often, reproducing regression is difficult.

https://www.postgresql.org/message-id/CAB7nPqSO6HVJ0T6LUT84PCy+_ihitdt64Ze2D+SJrHZy72Y0wg@mail.gmail.com
> > Also, I think it is worth to once take the performance data for
> > write tests (using pgbench 30 minute run or some other way) with
> > minimum checkpoint_timeout (i.e 30s) to see if the additional locking
> > has any impact on performance. I think taking locks at intervals
> > of 15s or 30s should not matter much, but it is better to be safe.
>
> I don't think performance will be impacted, but there are no reasons
> to not do any measurements either. I'll try to get some graphs
> tomorrow with runs on my laptop, mainly looking for any effects of
> this patch on TPS when checkpoints show up.

I don't think the impact is measurable on a laptop, where 2 to 4
cores available at most.

> Per discussion with Andres at PGcon, we decided that this is an
> optimization, only for 9.7~ because this has been broken for a long
> time. I have also changed XLogIncludeOrigin() to use a more generic
> routine to set of status flags for a record being inserted:
> XLogSetFlags(). This routine can use two flags:
> - INCLUDE_ORIGIN to decide if the origin should be logged or not
> - NO_PROGRESS to decide at insertion if a record should update the LSN
> progress or not.
> Andres mentioned me that we'd want to have something similar to
> XLogIncludeOrigin, but while hacking I noticed that grouping both
> things under the same umbrella made more sense.

This looks reasonable.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-hs-checkpoints-v12-1.patch	text/x-patch	18.7 KB
0002-hs-checkpoints-v12-2.patch	text/x-patch	2.8 KB
0003-hs-checkpoints-v12-3.patch	text/x-patch	2.1 KB

In response to

Fix checkpoint skip logic on idle systems by tracking LSN progress at 2016-05-18 21:57:49 from Michael Paquier

Responses

Re: Fix checkpoint skip logic on idle systems by tracking LSN progress at 2016-09-27 21:12:38 from David Steele
Re: Fix checkpoint skip logic on idle systems by tracking LSN progress at 2016-09-28 07:32:40 from Michael Paquier

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Dilip Kumar	2016-09-27 10:39:56	Re: Speed up Clog Access by increasing CLOG buffers
Previous Message	Masahiko Sawada	2016-09-27 09:24:18	Re: Transactions involving multiple postgres foreign servers