Re: A few new options for CHECKPOINT

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A few new options for CHECKPOINT
Date: 2020-11-28 17:49:46
Message-ID: 20201128174946.GR16415@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Bossart, Nathan (bossartn(at)amazon(dot)com) wrote:
> On 11/27/20, 10:58 AM, "Stephen Frost" <sfrost(at)snowman(dot)net> wrote:
> > If you'd like to show that I'm wrong, and it's entirely possible that I
> > am, then retry the above with actual load on the system, and also
> > actually look at how much outstanding WAL you end up with given the
> > different scenarios which has to be replayed during crash recovery.
>
> I did a little experiment to show the behavior I'm referring to. I

I'm rather confused why you didn't use your patch to show the actual
behavior that you'll get as a result of this change..? That seems like
what would be meaningful here. I appreciate that the analysis you did
might correlate but I don't really get why we'd try to use a proxy for
this.

> used these settings:
>
> checkpoint_completion_target = 0.9
> checkpoint_timeout = 30s
> max_wal_size = 20GB
> WAL segment size is 64MB

That's an exceedingly short and very uncommon checkpoint timeout in my
experience.. If anything, folks increase checkpoint timeout from the
default (in order to reduce WAL traffic and because they have a replica
they can flip to in the event of a crash, avoiding having to go through
WAL replay on recovery) and so I'm not really sure that it's a sensible
thing to look at? Even so though...

> Recovery took 14 seconds and replayed ~3.2 GB of WAL.

[ ... ]

> Recovery took 7 seconds and replayed ~1.5 GB of WAL.

This is showing more-or-less what I expected: there's still a large
amount of outstanding WAL, even if you use a very low and unusual
timeout and attempt to time it perfectly. A question that is still not
clear is what happens when you actually do an immediate checkpoint-
there would likely still be some outstanding WAL even in that case but
I'd expect it to be a whole lot less, which is why that comment exists
in the documentation in the first place.

> Granted, I used a rather aggressive checkpoint_timeout, but I think
> this demonstrates that waiting for a non-immediate checkpoint to
> complete can lower the amount of WAL needed for recovery, even though
> it might not lower it as much as waiting for an immediate checkpoint
> would.

The difference here feels like order of magnitudes to me, between an
immediate checkpoint and a non-immediate one, vs. a much smaller
difference as you've shown here (though, still, kill'ing the postmaster
isn't exactly the same as what your patch would be doing, so I don't
really like using this particular analysis to answer this question...).

If the use-case here was just that you wanted to add more options to the
CHECKPOINT command because we have them internally and maybe they'd be
useful to expose then these things probably wouldn't matter so much, but
to imply that this is really going to cut down on the amount of WAL
replay required a lot isn't really coming through even with these
results.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-11-28 19:43:03 Re: What to do about the broken btree_gist for inet/cidr?
Previous Message Alvaro Herrera 2020-11-28 17:36:26 Re: Add Information during standby recovery conflicts