Re: Fix checkpoint skip logic on idle systems by tracking LSN progress

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: david(at)pgmasters(dot)net
Cc: michael(dot)paquier(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fix checkpoint skip logic on idle systems by tracking LSN progress
Date: 2016-09-30 05:00:15
Message-ID: 20160930.140015.150178454.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry, I might have torn off this thread somehow..

At Thu, 29 Sep 2016 11:26:29 -0400, David Steele <david(at)pgmasters(dot)net> wrote in <30095aea-3910-dbb7-1790-a579fb93fa5e(at)pgmasters(dot)net>
> On 9/28/16 10:32 PM, Michael Paquier wrote:
> > On Thu, Sep 29, 2016 at 7:45 AM, David Steele <david(at)pgmasters(dot)net>
> > wrote:
> >>
> >> In general I agree with the other comments that this could end up
> >> being
> >> a problem. On the other hand, since the additional locks are only
> >> taken
> >> at checkpoint or archive_timeout it may not be that big a deal.
> >
> > Yes, I did some tests on my laptop a couple of months back, that has 4
> > cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to increase
> > contention and performing a bunch of INSERT using 4 clients on 4
> > different relations I could not catch a difference.. Autovacuum was
> > disabled to eliminate any noise. I tried checkpoint_segments at 30s to
> > see its effects, as well as larger values to see the impact with the
> > standby snapshot taken by the bgwriter. Other thoughts are welcome.
>
> I don't have any better ideas than that.

I don't see no problem in setting progressAt in XLogInsertRecord.
But I doubt GetProgressRecPtr is harmful, especially when
NUM_XLOGINSERT_LOCKS is *large*. So reducing the number seems
rather alleviates the impact. But it actually doesn't seem so
harmful up to 8. (Even though I don't like the locking in
GetProgressRecPtr..)

Currently possiblly harmful calling of GetProgressRecPtr is that
in BackgroundWriterMain. It should be called with ther interval
BgWriterDelay, and anytime pgwriter recieved SIGUSR1. But I don't
see the issuer of SIGUSR1 of bgwriter..

> >> +++ b/src/backend/postmaster/checkpointer.c
> >> + /* OK, it's time to switch */
> >> + elog(LOG, "Request XLog Switch");
> >>
> >> LOG level seems a bit much here, perhaps DEBUG1?
> >
> > That's from Horiguchi-san's patch, and those would be definitely
> > better as DEBUG1 by looking at it. Now and in order to keep things
> > simple I think that we had better discard this patch for now. I was
> > planning to come back to this thing anyway once we are done with the
> > first problem.
>
> I still see this:
>
> +++ b/src/backend/postmaster/checkpointer.c
> /* OK, it's time to switch */
> + elog(LOG, "Request XLog Switch");
>
> > Well for now attached are two patches, that could just be squashed
> > into one.

Mmmm. Sorry, this was for my quite private instant debug, spilt
outside.. But I don't mind to leave it with DEBUG2 if it seems
useful.

> Yes, I think that makes sense.
>
> More importantly, there is a regression. With your new patch the
> xlogs are switching on archive_timeout again even with no changes.
> The v12 worked fine.

As Michael mentioned in this or another thread, it is another
issue that he wants to solve separately. I personally doubt that
this patch (v11 and v13) can be evaluated alone without it, but
we can review this with the excessive switching problem, perhaps?

> The differences are all in 0002-hs-checkpoints-v12-2.patch and as far
> as I can see the patch does not work correctly without these changes.
> Am I missing something?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2016-09-30 05:05:02 Re: Fix checkpoint skip logic on idle systems by tracking LSN progress
Previous Message Michael Paquier 2016-09-30 04:47:48 Re: Tracking wait event for latches