Re: Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?
Date: 2015-12-23 14:38:08
Message-ID: CA+TgmobaOq5ERWNcNkN3Hf-Dwp+yZmz7m2Z7mXNbV0G0trOmWQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 23, 2015 at 9:22 AM, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>> Wait, what? On what workload does the FPW spike last only a few
>> seconds? [...]
>
> Ok. AFAICR, a relatively small part at the beginning of the checkpoint, but
> possibly more than a few seconds.

On a pgbench test, and probably many other workloads, the impact of
FPWs declines exponentially (or maybe geometrically, but I think
exponentially) as we get further into the checkpoint. The first write
is dead certain to need an FPW; after that, if access is more or less
random, the chance of needing an FPW for the next write drops as the
number of pages that have already had one grows. As the chances of
NOT needing an FPW grow higher, the tps rate starts to increase,
initially just a bit, but then faster and faster as the percentage of
the working set that has already had an FPW grows. If the working set
is large, we're still doing FPWs pretty frequently when the next
checkpoint hits; if it's small, the FPW load tails off sooner.
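
For intuition, a back-of-the-envelope model (my own, not something
measured): if the working set is N pages accessed uniformly at random
and n page writes have been WAL-logged since the checkpoint's redo
point, the chance that the next write still needs an FPW is roughly

    P(\text{next write needs an FPW}) \approx \left(1 - \tfrac{1}{N}\right)^{n} \approx e^{-n/N}

i.e. it decays exponentially in the number of writes, which matches
the shape described above. Real workloads are only approximately
uniform, of course.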

> My actual point is that it should be tested with different and especially
> smaller values, because 1.5 changes the overall load distribution *a lot*.
> For testing purposes I suggested that a GUC would help, but the patch author
> has never come back to the thread to discuss the arguments or provide
> another patch.

Well, somebody else should be able to hack a GUC into the patch.
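
For whoever does that, a rough sketch of the shape of the change, just
to show where the knob would live; the name checkpoint_progress_exponent,
its range, and the variable it feeds are all made up here, not taken
from the patch:

    /* guc.c, in ConfigureNamesReal[] -- hypothetical knob; the default 1.5
     * matches the constant hard-coded in the patch under discussion */
    {
        {"checkpoint_progress_exponent", PGC_SIGHUP, WAL_CHECKPOINTS,
            gettext_noop("Exponent used to skew checkpoint progress to "
                         "compensate for full-page writes early in the "
                         "checkpoint."),
            NULL
        },
        &CheckPointProgressExponent,
        1.5, 1.0, 10.0,
        NULL, NULL, NULL
    },

    /* checkpointer.c (plus an extern declaration in a header) */
    double CheckPointProgressExponent = 1.5;

    /* ... and wherever the patch currently hard-codes 1.5 (needs <math.h>): */
    progress = pow(progress, CheckPointProgressExponent);

With that in place it's easy to rerun the same benchmark with, say,
1.1, 1.5, and 2.0 and see how the load distribution shifts.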

I think one thing that this conversation exposes is that the size of
the working set matters a lot. For example, if the workload is
pgbench, you're going to see a relatively short FPW-related spike at
scale factor 100, but at scale factor 3000 it's going to be longer and
at some larger scale factor it will be longer still. Therefore you're
probably right that 1.5 is unlikely to be optimal for everyone.
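
Under the same toy model as before (again mine, not measured), the
FPW-heavy phase lasts until most of the working set has been written
once; covering, say, 90% of N pages takes about

    n \approx N \ln 10 \approx 2.3\,N

writes, so the length of the spike grows roughly linearly with the
working set, which lines up with the scale-factor behavior above.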

Another point (which Jan Wieck made me think of) is that the optimal
behavior here likely depends on whether xlog and data are on the same
disk controller. If they aren't, the FPW spike and background writes
may not interact as much.

>>> Another issue I raised is that the load change occurs both with xlog and
>>> time triggered checkpoints, and I'm sure it should be applied in both
>>> case.
>>
>> Is this sentence missing a "not"?
> Indeed. I think that it makes sense for xlog-triggered checkpoints, but less
> so with time-triggered checkpoints. I may be wrong, but I think that this
> deserves careful analysis.

Hmm, off-hand I don't see why that should make any difference. No
matter what triggers the checkpoint, there is going to be a spike of
FPI activity at the beginning.
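
To spell that out: as I remember it, the on-schedule test compares the
checkpoint's progress against both the WAL consumed and the time
elapsed, and being behind on either one makes the checkpointer hurry,
regardless of what triggered the checkpoint. Very roughly (a
simplified paraphrase, not the actual source; the fraction arguments
stand in for the calculations the real function does from
GetInsertRecPtr() and gettimeofday()):

    #include <stdbool.h>

    /*
     * Simplified paraphrase of the IsCheckpointOnSchedule() logic.
     * wal_fraction  = WAL written since checkpoint start / WAL budget
     * time_fraction = seconds elapsed / checkpoint_timeout
     */
    static bool
    checkpoint_on_schedule(double progress, double completion_target,
                           double wal_fraction, double time_fraction)
    {
        progress *= completion_target;  /* checkpoint_completion_target */

        if (progress < wal_fraction)    /* behind on WAL -> write faster */
            return false;
        if (progress < time_fraction)   /* behind on time -> write faster */
            return false;

        return true;                    /* on schedule -> can throttle */
    }

Since the WAL-based comparison is made even for a time-triggered
checkpoint, FPW inflation of wal_fraction pushes the checkpointer to
hurry at the start either way, so any compensation arguably belongs in
both cases too.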

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
