Re: should we enable log_checkpoints out of the box?

From: Jan Wieck <jan(at)wi3ck(dot)info>
To: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should we enable log_checkpoints out of the box?
Date: 2021-10-31 20:38:04
Message-ID: 4cdc6a8d-cc7f-a9b6-c5de-361be048ce72@wi3ck.info
Lists: pgsql-hackers

On 10/31/21 16:16, Andres Freund wrote:
> Hi,
>
> On 2021-10-31 15:43:57 -0400, Tom Lane wrote:
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>> > On 2021-10-31 10:59:19 -0400, Tom Lane wrote:
>> >> No DBA would be likely to consider it as anything but log spam.
>>
>> > I don't agree at all. No postgres instance should be run without
>> > log_checkpoints enabled. Performance is poor if checkpoints are
>> > triggered by anything but time, and that can only be diagnosed if
>> > log_checkpoints is on.
>>
>> This is complete nonsense.
>
> Shrug. It's based on many years of doing or being around people doing
> postgres support escalation shifts. And it's not like log_checkpoints
> incurs meaningful overhead or causes that much log volume.

I agree with Andres 100%. Whenever I am called in to diagnose any type of
problem, this is on the usual checklist, and very few customers have it
turned on. The usefulness of this information far outweighs the
tiny amount of extra log volume it creates.
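
As an illustration of the kind of check this enables, here is a minimal
sketch (not part of any PostgreSQL tool) that counts how many checkpoints
in a server log were started by the timer versus by WAL volume. It assumes
log_checkpoints is on and that the log contains the "checkpoint starting:"
messages with their trigger flags ("time", "wal", and so on); the function
name and the sample log prefix are made up for the example.

```python
import re
from collections import Counter

# Matches the flag list that follows "checkpoint starting:" in the server log.
# Assumption: flags appear as space-separated words, e.g. "time" or
# "immediate force wait".
CHECKPOINT_START = re.compile(r"checkpoint starting:(?P<flags>.*)$")


def count_checkpoint_triggers(lines):
    """Return a Counter with keys 'time', 'wal' and 'other'.

    'wal' means the checkpoint was requested because of WAL volume,
    'time' means checkpoint_timeout expired, and 'other' covers
    everything else (manual CHECKPOINT, shutdown, ...).
    """
    counts = Counter()
    for line in lines:
        match = CHECKPOINT_START.search(line)
        if match is None:
            continue  # not a checkpoint start message
        flags = match.group("flags").split()
        if "wal" in flags:
            counts["wal"] += 1
        elif "time" in flags:
            counts["time"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log excerpt; the prefix depends on log_line_prefix.
    sample = [
        "2021-10-31 20:00:00 UTC [123] LOG:  checkpoint starting: time",
        "2021-10-31 20:01:10 UTC [123] LOG:  checkpoint starting: wal",
        "2021-10-31 20:02:05 UTC [123] LOG:  checkpoint starting: wal",
    ]
    print(count_checkpoint_triggers(sample))
```

If the 'wal' count dominates, checkpoints are being forced by WAL volume
rather than by time, which is the situation Andres describes above.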

>
>
>> If we think that's a generic problem, we should be fixing the problem
>> (ie, making the checkpointer smarter);
>
> We've made it less bad (checkpoint_segments -> max_wal_size, sorting IO
> for checkpoints, forcing the OS to flush writes earlier). But it's still
> a significant issue. It's not that easy to make it better.

And we kept the default for max_wal_size at 1GB. While it is a "soft"
limit, it is the main reason why instances end up running full bore with
a huge percentage of full page writes: it is way too small for their
throughput, and nothing in the logs warns them about it. I can run
a certain TPC-C workload on an 8-core machine quite comfortably when
max_wal_size is configured at 100GB. The exact same TPC-C configuration
will send the machine into a downward spiral if left with the default
max_wal_size, and there is zero hint in the logs as to why.
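
A rough back-of-envelope sketch of why this happens: a page gets a full
page write the first time it is modified after a checkpoint, so the more
often checkpoints occur, the larger the share of full page writes. The code
below estimates how long a given WAL rate takes to fill max_wal_size and
compares that with checkpoint_timeout (default 5 minutes). The 20 MiB/s
rate is a made-up figure, not a measurement, and the estimate is
optimistic: the server requests a checkpoint some way before max_wal_size
is actually reached.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def seconds_to_fill(max_wal_size_bytes, wal_bytes_per_sec):
    """Upper bound on the seconds until WAL volume forces a checkpoint."""
    return max_wal_size_bytes / wal_bytes_per_sec


def is_wal_triggered(max_wal_size_bytes, wal_bytes_per_sec,
                     checkpoint_timeout_sec=300):
    """True if WAL volume would force a checkpoint before the timer does."""
    return seconds_to_fill(max_wal_size_bytes, wal_bytes_per_sec) < checkpoint_timeout_sec


if __name__ == "__main__":
    wal_rate = 20 * MIB  # hypothetical WAL generation rate, bytes per second
    for size_gib in (1, 100):
        secs = seconds_to_fill(size_gib * GIB, wal_rate)
        forced = is_wal_triggered(size_gib * GIB, wal_rate)
        print(f"max_wal_size={size_gib}GB: about {secs:.0f}s to fill, "
              f"WAL-triggered={forced}")
```

Under these assumed numbers the 1GB default fills in under a minute, so
every checkpoint is WAL-triggered, while 100GB lets the timer drive
checkpoints instead.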

--
Jan Wieck
