Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
Date: 2016-02-10 07:11:03
Message-ID: CAA4eK1LB9HDq+F8Lw8bGRQx6Sw42XaikX1UQ2DZk+AuEGbfjWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Wed, Feb 10, 2016 at 12:16 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com
> wrote:

>
>
> On Wed, Feb 10, 2016 at 12:41 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > On Wed, Feb 10, 2016 at 7:17 AM, Michael Paquier <
> michael(dot)paquier(at)gmail(dot)com>
> > wrote:
> >>
> >
> > Consider below code:
> >
> > /*
> > + * Get progress before acquiring insert locks to shorten the locked
> > + * section waiting ahead.
> > + */
> > + progress_lsn = GetProgressRecPtr();
> > +
> > + /*
> > * We must block concurrent insertions while examining insert state to
> > * determine the checkpoint REDO pointer.
> > */
> > WALInsertLockAcquireExclusive();
> >
> > In GetProgressRecPtr(), first it take WALInsertLocks one-by-one to
> > retrieve the latest value of progressAt and then again it will take
> > all the WALInsertLocks in WALInsertLockAcquireExclusive() and then
> > do some work. Now I think it is okay to retrieve the latest of
> progressAt
> > after WALInsertLockAcquireExclusive() as you don't need to take the
> > locks again.
>
> Sure, that would be OK. Now I am not on board if this means to have the
> checkpoint take an exclusive locks for a bit longer duration if that's not
> actually necessary.
>
>
Taking and releasing locks again and again (which is done in patch)
matters much more than adding few instructions under it and I think
if you would have written the code in-a-way as in patch in some of
the critical path, it would have been regressed badly, but because
checkpoint doesn't happen often, reproducing regression is difficult.
Anyway, there is no-point in too much argument, I think you have
understood what I wanted to say and if you think the way code is
currently written is better, then lets leave as it is for somebody else
to give suggestion on it.

>
> >> > - last_snapshot_lsn != GetXLogInsertRecPtr())
> >> > +
> >> > GetLastCheckpointRecPtr() < GetProgressRecPtr())
> >> >
> >> > How the above check in patch suppose to work?
> >> > I think it could so happen once bgwriter logs snapshot, both
> checkpoint
> >> > and progressAt won't get updated any further and I think this check
> will
> >> > allow to log snapshots in such a case as well.
> >>
> >> The only purpose of this check is to do the following thing: if no WAL
> >> activity has happened since last checkpoint, there is no need to log a
> >> standby snapshot from the perspective of the bgwriter. In case the
> >> system is idle, we want to skip logging that and avoid unnecessary
> >> checkpoints because those records would have been generated. If the
> >> system is not idle, or to put it in other words there has been at
> >> least one record since the last checkpoint, we would log a standby
> >> snapshot, enforcing as well a checkpoint to happen the next time
> >> CreateCheckpoint() is gone through, and a standby snapshot is logged
> >> as well after the checkpoint contents are flushed. I am not sure I
> >> understand what you are getting at...
> >
> > Let me try to say again, suppose ControlFile->checkPoint is at
> > 100 and latest value of progressAt returned by GetProgressRecPtr
> > is 105, so the first time the above check happens, it will allow
> > to log standby snapshot which is okay, now assume there is no
> > activity, neither there is any checkpoint and again after
> > LOG_SNAPSHOT_INTERVAL_MS interval, when the above check
> > gets executed, it will pass and log the standby snapshot which is
> > *not* okay, because we don't want to log standby snapshot when
> > there is no activity. Am I missing something?
>
> Ah, right. Sorry for not getting your point. OK now I got it. So basically
> what the code does not is that if there is just one record after a
> checkpoint we would log every 15s a standby snapshot until the next
> checkpoint even if nothing happens, but we can save a bunch of standby
> records. So your point is that we need to save the result of
> GetProgressRecPtr() at each loop in the bgwriter when a standby snapshot is
> taken, say in a static XLogRecPtr called last_progress_lsn. And then at the
> next loop we compare it on the current result of GetProgressRecPtr(), so
> the condition to check if a standby snapshot should be generated or not in
> the bgwriter becomes that:
> (now >= timeout &&
> GetLastCheckpointRecPtr() < current_progress_ptr &&
> last_progress_ptr < current_progress_ptr)
>

Why do you think including checkpoint related check is
required, how about something like below:

(now >= timeout &&
last_progress_ptr < current_progress_ptr)

> That's your point?
>

Yes.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2016-02-10 07:31:34 Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
Previous Message Michael Paquier 2016-02-10 06:46:39 Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-02-10 07:31:34 Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
Previous Message Noah Misch 2016-02-10 06:58:44 Re: Tracing down buildfarm "postmaster does not shut down" failures