From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | michael(dot)paquier(at)gmail(dot)com |
Cc: | andres(at)anarazel(dot)de, david(at)pgmasters(dot)net, sfrost(at)snowman(dot)net, jd(at)commandprompt(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Fix checkpoint skip logic on idle systems by tracking LSN progress |
Date: | 2016-11-14 09:10:00 |
Message-ID: | 20161114.181000.169514477.horiguchi.kyotaro@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
It applies the master and compiled cleanly and no error by
regtest. (I didn't confirmed that the problem is still fixed but
seemingly no problem)
At Mon, 14 Nov 2016 15:09:09 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqRhzS0fNHNAAtRCE+CqdOKKW+KyrAzy5O_R-7zqucGevA(at)mail(dot)gmail(dot)com>
> On Sat, Nov 12, 2016 at 9:01 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2016-11-11 16:42:43 +0900, Michael Paquier wrote:
> >
> >> + * This takes also
> >> + * advantage to avoid 8-byte torn reads on some platforms by using the
> >> + * fact that each insert lock is located on the same cache line.
> >
> > Something residing on the same cache line doens't provide that guarantee
> > on all platforms.
>
> OK. Let's remove it then.
>
> >> + * XXX: There is still room for more improvements here, particularly
> >> + * WAL operations related to unlogged relations (INIT_FORKNUM) should not
> >> + * update the progress LSN as those relations are reset during crash
> >> + * recovery so enforcing buffers of such relations to be flushed for
> >> + * example in the case of a load only on unlogged relations is a waste
> >> + * of disk write.
> >
> > Commit records still have to be written, everything else doesn't write
> > WAL. So I'm doubtful this matters much?
>
> Hm, okay. In most cases this may not matter... Let's rip that off.
>
> >> @@ -997,6 +1022,24 @@ XLogInsertRecord(XLogRecData *rdata, XLogRecPtr fpw_lsn)
> >> inserted = true;
> >> }
> >>
> >> + /*
> >> + * Update the LSN progress positions. At least one WAL insertion lock
> >> + * is already taken appropriately before doing that, and it is simpler
> >> + * to do that here when the WAL record data and type are at hand.
> >
> > But we don't use the "WAL record data and type"?
>
> Yes, at some point this patch did so...
>
> >> + * GetProgressRecPtr -- Returns the newest WAL activity position, or in
> >> + * other words any activity not referring to standby logging or segment
> >> + * switches. Finding the last activity position is done by scanning each
> >> + * WAL insertion lock by taking directly the light-weight lock associated
> >> + * to it.
> >> + */
> >
> > I'd prefer not to list the specific records here - that's just
> > guaranteed to get out of date. Why not say something "any activity not
> > requiring a checkpoint to be triggered" or such?
>
> OK. Makes sense to minimize maintenance.
>
> >> + * If this isn't a shutdown or forced checkpoint, and if there has been no
> >> + * WAL activity, skip the checkpoint. The idea here is to avoid inserting
> >> + * duplicate checkpoints when the system is idle. That wastes log space,
> >> + * and more importantly it exposes us to possible loss of both current and
> >> + * previous checkpoint records if the machine crashes just as we're writing
> >> + * the update.
> >
> > Shouldn't this mention archiving and also that we also ignore some forms
> > of WAL activity?
>
> I have reworded that as:
> "If this isn't a shutdown or forced checkpoint, and if there has been
> no WAL activity requiring a checkpoint, skip it."
>
> >> -/* Should the in-progress insertion log the origin? */
> >> -static bool include_origin = false;
> >> +/* status flags of the in-progress insertion */
> >> +static uint8 status_flags = 0;
> >
> > that seems a bit too generic a name. 'curinsert_flags'?
>
> OK.
>
> >> /*
> >> - * only log if enough time has passed and some xlog record has
> >> - * been inserted.
> >> + * Only log if enough time has passed, that some WAL activity
> >> + * has happened since last checkpoint, and that some new WAL
> >> + * records have been inserted since the last time we came here.
> >
> > I think that sentence needs some polish.
>
> Let's do this better:
> /*
> - * only log if enough time has passed and some xlog record has
> - * been inserted.
> + * Only log if one of the following conditions is satisfied since
> + * the last time we came here::
> + * - timeout has been reached.
> + * - WAL activity has happened since last checkpoint.
> + * - New WAL records have been inserted.
> */
>
> >> */
> >> if (now >= timeout &&
> >> - last_snapshot_lsn != GetXLogInsertRecPtr())
> >> + GetLastCheckpointRecPtr() < current_progress_lsn &&
> >> + last_progress_lsn < current_progress_lsn)
> >> {
> >
> > Hm. I don't think it's correct to use GetLastCheckpointRecPtr() here?
> > Don't we need to do the comparisons here (and when doing the checkpoint
> > itself) with the REDO pointer of the last checkpoint?
>
> Hm? The progress pointer is pointing to the lastly inserted LSN, which
> is not the position of the REDO pointer, but the one of the checkpoint
> record. Doing a comparison of the REDO pointer would be a moot
> operation, because as the checkpoint completes, the progress LSN will
> be updated as well. Or do you mean that the progress LSN should *not*
> be updated for a checkpoint record? It seems to me that it should
> but...
>
> >> diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
> >> index 397267c..7ecc00e 100644
> >> --- a/src/backend/postmaster/checkpointer.c
> >> +++ b/src/backend/postmaster/checkpointer.c
> >> @@ -164,6 +164,7 @@ static double ckpt_cached_elapsed;
> >>
> >> static pg_time_t last_checkpoint_time;
> >> static pg_time_t last_xlog_switch_time;
> >> +static XLogRecPtr last_xlog_switch_lsn = InvalidXLogRecPtr;
> >
> > Hm. Is it a good idea to use a static for this? Did you consider
> > checkpointer restarts?
>
> Indeed, I forgot about that and the current approach is not solid. The
> best way to do things then is to track the LSN position of the last
> switched segment in XLogCtl..
If I'm not missing something, at the worst we have a checkpoint
after a checkpointer restart that should have been supressed. Is
it worth picking it up for the complexity?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2016-11-14 09:27:31 | Re: [PATCH] Allow TAP tests to be run individually |
Previous Message | Michael Paquier | 2016-11-14 09:07:09 | Re: pg_dump, pg_dumpall and data durability |