Re: checkpoints after database start/immediate checkpoints

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpoints after database start/immediate checkpoints
Date: 2016-02-03 14:57:00
Message-ID: CA+TgmoYH=oFhnjgG7AtxHX3vJqpssJ=WP7848jv7C3xN6RjCbQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 1, 2016 at 7:43 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Right now it takes checkpoint_timeout till we start a checkpoint, and
> checkpoint_timeout + checkpoint_timeout * checkpoint_completion_target
> till we complete the first checkpoint after shutdown/forced checkpoints.
>
> That means a) that such checkpoint will often be bigger/more heavyweight
> than the following ones, not what you want after a restart/create
> database/basebackup... b) reliable benchmarks measuring steady state
> have to run for even longer.
>
> There's some logic to that behaviour though: With a target of 0 it'd be
> weird to start a checkpoint directly after startup, and even otherwise
> there'll be less to write at the beginning.
>
> As an example:
> 2016-02-02 01:32:58.053 CET 21361 LOG: checkpoint complete: wrote 186041 buffers (71.0%); 1 transaction log file(s) added, 0 removed, 0 recycled; sort=0.071 s, write=9.504 s, sync=0.000 s, total=9.532 s; sync files=0, longest=0.000 s, average=0.000 s; distance=1460260 kB, estimate=1460260 kB
> 2016-02-02 01:32:58.053 CET 21361 LOG: checkpoint starting: time
> 2016-02-02 01:33:08.061 CET 21361 LOG: checkpoint complete: wrote 127249 buffers (48.5%); 0 transaction log file(s) added, 0 removed, 89 recycled; sort=0.045 s, write=9.987 s, sync=0.000 s, total=10.007 s; sync files=0, longest=0.000 s, average=0.000 s; distance=1187558 kB, estimate=1432989 kB
> 2016-02-02 01:33:58.216 CET 21361 LOG: checkpoint complete: wrote 124649 buffers (47.5%); 0 transaction log file(s) added, 0 removed, 69 recycled; sort=0.048 s, write=10.160 s, sync=0.000 s, total=10.176 s; sync files=0, longest=0.000 s, average=0.000 s; distance=1135086 kB, estimate=1337776 kB
> 2016-02-02 01:33:58.216 CET 21361 LOG: checkpoint starting: time
> 2016-02-02 01:34:08.060 CET 21361 LOG: checkpoint complete: wrote 123298 buffers (47.0%); 0 transaction log file(s) added, 0 removed, 69 recycled; sort=0.049 s, write=9.838 s, sync=0.000 s, total=9.843 s; sync files=0, longest=0.000 s, average=0.000 s; distance=1184077 kB, estimate=1322406 kB
> 2016-02-02 01:34:08.060 CET 21361 LOG: checkpoint starting: time
> 2016-02-02 01:34:18.052 CET 21361 LOG: checkpoint complete: wrote 118165 buffers (45.1%); 0 transaction log file(s) added, 0 removed, 72 recycled; sort=0.046 s, write=9.987 s, sync=0.000 s, total=9.992 s; sync files=0, longest=0.000 s, average=0.000 s; distance=1198018 kB, estimate=1309967 kB
> 2016-02-02 01:34:18.053 CET 21361 LOG: checkpoint starting: time
> 2016-02-02 01:34:28.089 CET 21361 LOG: checkpoint complete: wrote 120814 buffers (46.1%); 0 transaction log file(s) added, 0 removed, 73 recycled; sort=0.051 s, write=10.020 s, sync=0.000 s, total=10.036 s; sync files=0, longest=0.000 s, average=0.000 s; distance=1203691 kB, estimate=1299339 kB
> 2016-02-02 01:34:28.090 CET 21361 LOG: checkpoint starting: time
> 2016-02-02 01:34:39.182 CET 21361 LOG: checkpoint complete: wrote 110411 buffers (42.1%); 0 transaction log file(s) added, 0 removed, 74 recycled; sort=0.047 s, write=9.908 s, sync=0.000 s, total=11.092 s; sync files=0, longest=0.000 s, average=0.000 s; distance=1073612 kB, estimate=1276767 kB
>
> We wrote roughly 1/3 more wal/buffers during the first checkpoint than
> following ones (And yes, the above is with an impossibly low timeout,
> but that doesn't change anything). You can make that much more extreme
> with larget shared buffers and larger timeouts.
>
>
> I wonder if this essentially point at checkpoint_timeout being wrongly
> defined: Currently it means we'll try to finish a checkpoint
> (1-checkpoint_completion_target) * timeout before the next one - but
> perhaps it should instead be that we start checkpoint_timeout * _target
> before the next timeout? Afaics that'd work more graceful in the face of
> restarts and forced checkpoints.

There's a certain appeal to that, but at the same time it seems pretty
wonky. Right now, you can say that a checkpoint is triggered when the
amount of WAL reaches X or the amount of time reaches Y, but with the
alternative definition it's a bit harder to explain what's going on
there. Maybe we should do it anyway, but...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-02-03 15:13:47 Re: Copy-pasto in the ExecForeignDelete documentation
Previous Message Michael Paquier 2016-02-03 14:42:22 Re: extend pgbench expressions with functions