Re: Fix checkpoint skip logic on idle systems by tracking LSN progress

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Steele <david(at)pgmasters(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix checkpoint skip logic on idle systems by tracking LSN progress
Date: 2016-11-14 06:09:09
Message-ID: CAB7nPqRhzS0fNHNAAtRCE+CqdOKKW+KyrAzy5O_R-7zqucGevA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 12, 2016 at 9:01 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-11-11 16:42:43 +0900, Michael Paquier wrote:
>
>> + * This takes also
>> + * advantage to avoid 8-byte torn reads on some platforms by using the
>> + * fact that each insert lock is located on the same cache line.
>
> Something residing on the same cache line doens't provide that guarantee
> on all platforms.

OK. Let's remove it then.

>> + * XXX: There is still room for more improvements here, particularly
>> + * WAL operations related to unlogged relations (INIT_FORKNUM) should not
>> + * update the progress LSN as those relations are reset during crash
>> + * recovery so enforcing buffers of such relations to be flushed for
>> + * example in the case of a load only on unlogged relations is a waste
>> + * of disk write.
>
> Commit records still have to be written, everything else doesn't write
> WAL. So I'm doubtful this matters much?

Hm, okay. In most cases this may not matter... Let's rip that off.

>> @@ -997,6 +1022,24 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
>> inserted = true;
>> }
>>
>> + /*
>> + * Update the LSN progress positions. At least one WAL insertion lock
>> + * is already taken appropriately before doing that, and it is simpler
>> + * to do that here when the WAL record data and type are at hand.
>
> But we don't use the "WAL record data and type"?

Yes, at some point this patch did so...

>> + * GetProgressRecPtr -- Returns the newest WAL activity position, or in
>> + * other words any activity not referring to standby logging or segment
>> + * switches. Finding the last activity position is done by scanning each
>> + * WAL insertion lock by taking directly the light-weight lock associated
>> + * to it.
>> + */
>
> I'd prefer not to list the specific records here - that's just
> guaranteed to get out of date. Why not say something "any activity not
> requiring a checkpoint to be triggered" or such?

OK. Makes sense to minimize maintenance.

>> + * If this isn't a shutdown or forced checkpoint, and if there has been no
>> + * WAL activity, skip the checkpoint. The idea here is to avoid inserting
>> + * duplicate checkpoints when the system is idle. That wastes log space,
>> + * and more importantly it exposes us to possible loss of both current and
>> + * previous checkpoint records if the machine crashes just as we're writing
>> + * the update.
>
> Shouldn't this mention archiving and also that we also ignore some forms
> of WAL activity?

I have reworded that as:
"If this isn't a shutdown or forced checkpoint, and if there has been
no WAL activity requiring a checkpoint, skip it."

>> -/* Should the in-progress insertion log the origin? */
>> -static bool include_origin = false;
>> +/* status flags of the in-progress insertion */
>> +static uint8 status_flags = 0;
>
> that seems a bit too generic a name. 'curinsert_flags'?

OK.

>> /*
>> - * only log if enough time has passed and some xlog record has
>> - * been inserted.
>> + * Only log if enough time has passed, that some WAL activity
>> + * has happened since last checkpoint, and that some new WAL
>> + * records have been inserted since the last time we came here.
>
> I think that sentence needs some polish.

Let's do this better:
/*
- * only log if enough time has passed and some xlog record has
- * been inserted.
+ * Only log if one of the following conditions is satisfied since
+ * the last time we came here::
+ * - timeout has been reached.
+ * - WAL activity has happened since last checkpoint.
+ * - New WAL records have been inserted.
*/

>> */
>> if (now >= timeout &&
>> - last_snapshot_lsn != GetXLogInsertRecPtr())
>> + GetLastCheckpointRecPtr() < current_progress_lsn &&
>> + last_progress_lsn < current_progress_lsn)
>> {
>
> Hm. I don't think it's correct to use GetLastCheckpointRecPtr() here?
> Don't we need to do the comparisons here (and when doing the checkpoint
> itself) with the REDO pointer of the last checkpoint?

Hm? The progress pointer is pointing to the lastly inserted LSN, which
is not the position of the REDO pointer, but the one of the checkpoint
record. Doing a comparison of the REDO pointer would be a moot
operation, because as the checkpoint completes, the progress LSN will
be updated as well. Or do you mean that the progress LSN should *not*
be updated for a checkpoint record? It seems to me that it should
but...

>> diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
>> index 397267c..7ecc00e 100644
>> --- a/src/backend/postmaster/checkpointer.c
>> +++ b/src/backend/postmaster/checkpointer.c
>> @@ -164,6 +164,7 @@ static double ckpt_cached_elapsed;
>>
>> static pg_time_t last_checkpoint_time;
>> static pg_time_t last_xlog_switch_time;
>> +static XLogRecPtr last_xlog_switch_lsn = InvalidXLogRecPtr;
>
> Hm. Is it a good idea to use a static for this? Did you consider
> checkpointer restarts?

Indeed, I forgot about that and the current approach is not solid. The
best way to do things then is to track the LSN position of the last
switched segment in XLogCtl..
--
Michael

Attachment Content-Type Size
hs-checkpoints-v17.patch text/x-diff 23.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2016-11-14 06:31:22 Re: Declarative partitioning - another take
Previous Message Tsunakawa, Takayuki 2016-11-14 06:01:23 Re: Patch: Implement failover on libpq connect level.