Re: Fix checkpoint skip logic on idle systems by tracking LSN progress

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix checkpoint skip logic on idle systems by tracking LSN progress
Date: 2016-11-12 12:01:20
Message-ID: 20161112120120.tp7dealboh3wgl4f@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2016-11-11 16:42:43 +0900, Michael Paquier wrote:

> + * This takes also
> + * advantage to avoid 8-byte torn reads on some platforms by using the
> + * fact that each insert lock is located on the same cache line.

Something residing on the same cache line doens't provide that guarantee
on all platforms.

> + * XXX: There is still room for more improvements here, particularly
> + * WAL operations related to unlogged relations (INIT_FORKNUM) should not
> + * update the progress LSN as those relations are reset during crash
> + * recovery so enforcing buffers of such relations to be flushed for
> + * example in the case of a load only on unlogged relations is a waste
> + * of disk write.

Commit records still have to be written, everything else doesn't write
WAL. So I'm doubtful this matters much?

> @@ -997,6 +1022,24 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
> inserted = true;
> }
>
> + /*
> + * Update the LSN progress positions. At least one WAL insertion lock
> + * is already taken appropriately before doing that, and it is simpler
> + * to do that here when the WAL record data and type are at hand.

But we don't use the "WAL record data and type"?

> + * GetProgressRecPtr -- Returns the newest WAL activity position, or in
> + * other words any activity not referring to standby logging or segment
> + * switches. Finding the last activity position is done by scanning each
> + * WAL insertion lock by taking directly the light-weight lock associated
> + * to it.
> + */

I'd prefer not to list the specific records here - that's just
guaranteed to get out of date. Why not say something "any activity not
requiring a checkpoint to be triggered" or such?

> + * If this isn't a shutdown or forced checkpoint, and if there has been no
> + * WAL activity, skip the checkpoint. The idea here is to avoid inserting
> + * duplicate checkpoints when the system is idle. That wastes log space,
> + * and more importantly it exposes us to possible loss of both current and
> + * previous checkpoint records if the machine crashes just as we're writing
> + * the update.

Shouldn't this mention archiving and also that we also ignore some forms
of WAL activity?

> -/* Should the in-progress insertion log the origin? */
> -static bool include_origin = false;
> +/* status flags of the in-progress insertion */
> +static uint8 status_flags = 0;

that seems a bit too generic a name. 'curinsert_flags'?

> /*

> @@ -317,19 +317,23 @@ BackgroundWriterMain(void)
> {
> TimestampTz timeout = 0;
> TimestampTz now = GetCurrentTimestamp();
> + XLogRecPtr current_progress_lsn = GetProgressRecPtr();
>
> timeout = TimestampTzPlusMilliseconds(last_snapshot_ts,
> LOG_SNAPSHOT_INTERVAL_MS);
>
> /*
> - * only log if enough time has passed and some xlog record has
> - * been inserted.
> + * Only log if enough time has passed, that some WAL activity
> + * has happened since last checkpoint, and that some new WAL
> + * records have been inserted since the last time we came here.

I think that sentence needs some polish.

> */
> if (now >= timeout &&
> - last_snapshot_lsn != GetXLogInsertRecPtr())
> + GetLastCheckpointRecPtr() < current_progress_lsn &&
> + last_progress_lsn < current_progress_lsn)
> {

Hm. I don't think it's correct to use GetLastCheckpointRecPtr() here?
Don't we need to do the comparisons here (and when doing the checkpoint
itself) with the REDO pointer of the last checkpoint?

> diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
> index 397267c..7ecc00e 100644
> --- a/src/backend/postmaster/checkpointer.c
> +++ b/src/backend/postmaster/checkpointer.c
> @@ -164,6 +164,7 @@ static double ckpt_cached_elapsed;
>
> static pg_time_t last_checkpoint_time;
> static pg_time_t last_xlog_switch_time;
> +static XLogRecPtr last_xlog_switch_lsn = InvalidXLogRecPtr;

Hm. Is it a good idea to use a static for this? Did you consider
checkpointer restarts?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-11-12 12:31:35 Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
Previous Message Michael Paquier 2016-11-12 11:43:35 Re: pg_dump, pg_dumpall and data durability