Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2022-02-10 07:52:28
Message-ID: 20220210075228.nvv6jswl2r5tvq6k@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 09, 2022 at 02:30:08PM -0500, Robert Haas wrote:
> On Wed, Feb 9, 2022 at 1:34 PM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Well, I think the worst case is that the checkpoint happens exactly
> > between two checkpoints, so you are checkpointing twice as often, but if
> > it happens just before or after a checkpoint, I assume the effect would
> > be minimal.
>
> I agree for the most part. I think that if checkpoints happen every 8
> minutes normally and the extra checkpoint happens 2 minutes after the
> previous checkpoint, the impact may be almost as bad as if it had
> happened right in the middle. If it happens 5 seconds after the
> previous checkpoint, it should be low impact.

But the extra checkpoints will be immediate, while on a properly configured
system it should be spread checkpoint. That will add some more overhead.

> > So, it seems we are weighing having a checkpoint happen in the middle of
> > a checkpoint interval vs writing more WAL. If the WAL traffic, without
> > CREATE DATABASE, is high, and the template database is small, writing
> > more WAL and skipping the checkpoint will be win, but if the WAL traffic
> > is small and the template database is big, the extra WAL will be a loss.
> > Is this accurate?
>
> I think that's basically correct. I would expect that the worry about
> big template database is mostly about template databases that are
> REALLY big. I think if your template database is 10GB you probably
> shouldn't be worried about this feature. 10GB of extra WAL isn't
> nothing, but if you've got reasonably capable hardware, it's not
> overloaded, and max_wal_size is big enough, it's probably not going to
> have a huge impact. Also, most of the impact will probably be on the
> CREATE DATABASE command itself, and other things running on the system
> at the same time will be impacted to a lesser degree. I think it's
> even possible that you will be happier with this feature than without,
> because you may like the idea that CREATE DATABASE itself is slow more
> than you like the idea of it making everything else on the system
> slow. On the other hand, if your template database is 1TB, the extra
> WAL is probably going to be a fairly big problem.
>
> Basically I think for most people this should be neutral or a win. For
> people with really large template databases, it's a loss. Hence the
> discussion about having a way for people who prefer the current
> behavior to keep it.

Those extra WALs will also impact backups and replication. You could have
fancy hardware, a read-mostly workload and the need to replicate over a slow
WAN, and in that case the 10GB could be much more problematic.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2022-02-10 08:09:48 Re: decoupling table and index vacuum
Previous Message Nitin Jadhav 2022-02-10 06:52:48 Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)