Re: Fix checkpoint skip logic on idle systems by tracking LSN progress

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(dot)paquier(at)gmail(dot)com
Cc: andres(at)anarazel(dot)de, david(at)pgmasters(dot)net, sfrost(at)snowman(dot)net, jd(at)commandprompt(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fix checkpoint skip logic on idle systems by tracking LSN progress
Date: 2016-11-14 09:10:00
Message-ID: 20161114.181000.169514477.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

It applies the master and compiled cleanly and no error by
regtest. (I didn't confirmed that the problem is still fixed but
seemingly no problem)

At Mon, 14 Nov 2016 15:09:09 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqRhzS0fNHNAAtRCE+CqdOKKW+KyrAzy5O_R-7zqucGevA(at)mail(dot)gmail(dot)com>
> On Sat, Nov 12, 2016 at 9:01 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2016-11-11 16:42:43 +0900, Michael Paquier wrote:
> >
> >> + * This takes also
> >> + * advantage to avoid 8-byte torn reads on some platforms by using the
> >> + * fact that each insert lock is located on the same cache line.
> >
> > Something residing on the same cache line doens't provide that guarantee
> > on all platforms.
>
> OK. Let's remove it then.
>
> >> + * XXX: There is still room for more improvements here, particularly
> >> + * WAL operations related to unlogged relations (INIT_FORKNUM) should not
> >> + * update the progress LSN as those relations are reset during crash
> >> + * recovery so enforcing buffers of such relations to be flushed for
> >> + * example in the case of a load only on unlogged relations is a waste
> >> + * of disk write.
> >
> > Commit records still have to be written, everything else doesn't write
> > WAL. So I'm doubtful this matters much?
>
> Hm, okay. In most cases this may not matter... Let's rip that off.
>
> >> @@ -997,6 +1022,24 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
> >> inserted = true;
> >> }
> >>
> >> + /*
> >> + * Update the LSN progress positions. At least one WAL insertion lock
> >> + * is already taken appropriately before doing that, and it is simpler
> >> + * to do that here when the WAL record data and type are at hand.
> >
> > But we don't use the "WAL record data and type"?
>
> Yes, at some point this patch did so...
>
> >> + * GetProgressRecPtr -- Returns the newest WAL activity position, or in
> >> + * other words any activity not referring to standby logging or segment
> >> + * switches. Finding the last activity position is done by scanning each
> >> + * WAL insertion lock by taking directly the light-weight lock associated
> >> + * to it.
> >> + */
> >
> > I'd prefer not to list the specific records here - that's just
> > guaranteed to get out of date. Why not say something "any activity not
> > requiring a checkpoint to be triggered" or such?
>
> OK. Makes sense to minimize maintenance.
>
> >> + * If this isn't a shutdown or forced checkpoint, and if there has been no
> >> + * WAL activity, skip the checkpoint. The idea here is to avoid inserting
> >> + * duplicate checkpoints when the system is idle. That wastes log space,
> >> + * and more importantly it exposes us to possible loss of both current and
> >> + * previous checkpoint records if the machine crashes just as we're writing
> >> + * the update.
> >
> > Shouldn't this mention archiving and also that we also ignore some forms
> > of WAL activity?
>
> I have reworded that as:
> "If this isn't a shutdown or forced checkpoint, and if there has been
> no WAL activity requiring a checkpoint, skip it."
>
> >> -/* Should the in-progress insertion log the origin? */
> >> -static bool include_origin = false;
> >> +/* status flags of the in-progress insertion */
> >> +static uint8 status_flags = 0;
> >
> > that seems a bit too generic a name. 'curinsert_flags'?
>
> OK.
>
> >> /*
> >> - * only log if enough time has passed and some xlog record has
> >> - * been inserted.
> >> + * Only log if enough time has passed, that some WAL activity
> >> + * has happened since last checkpoint, and that some new WAL
> >> + * records have been inserted since the last time we came here.
> >
> > I think that sentence needs some polish.
>
> Let's do this better:
> /*
> - * only log if enough time has passed and some xlog record has
> - * been inserted.
> + * Only log if one of the following conditions is satisfied since
> + * the last time we came here::
> + * - timeout has been reached.
> + * - WAL activity has happened since last checkpoint.
> + * - New WAL records have been inserted.
> */
>
> >> */
> >> if (now >= timeout &&
> >> - last_snapshot_lsn != GetXLogInsertRecPtr())
> >> + GetLastCheckpointRecPtr() < current_progress_lsn &&
> >> + last_progress_lsn < current_progress_lsn)
> >> {
> >
> > Hm. I don't think it's correct to use GetLastCheckpointRecPtr() here?
> > Don't we need to do the comparisons here (and when doing the checkpoint
> > itself) with the REDO pointer of the last checkpoint?
>
> Hm? The progress pointer is pointing to the lastly inserted LSN, which
> is not the position of the REDO pointer, but the one of the checkpoint
> record. Doing a comparison of the REDO pointer would be a moot
> operation, because as the checkpoint completes, the progress LSN will
> be updated as well. Or do you mean that the progress LSN should *not*
> be updated for a checkpoint record? It seems to me that it should
> but...
>
> >> diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
> >> index 397267c..7ecc00e 100644
> >> --- a/src/backend/postmaster/checkpointer.c
> >> +++ b/src/backend/postmaster/checkpointer.c
> >> @@ -164,6 +164,7 @@ static double ckpt_cached_elapsed;
> >>
> >> static pg_time_t last_checkpoint_time;
> >> static pg_time_t last_xlog_switch_time;
> >> +static XLogRecPtr last_xlog_switch_lsn = InvalidXLogRecPtr;
> >
> > Hm. Is it a good idea to use a static for this? Did you consider
> > checkpointer restarts?
>
> Indeed, I forgot about that and the current approach is not solid. The
> best way to do things then is to track the LSN position of the last
> switched segment in XLogCtl..

If I'm not missing something, at the worst we have a checkpoint
after a checkpointer restart that should have been supressed. Is
it worth picking it up for the complexity?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-11-14 09:27:31 Re: [PATCH] Allow TAP tests to be run individually
Previous Message Michael Paquier 2016-11-14 09:07:09 Re: pg_dump, pg_dumpall and data durability