Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2022-02-09 18:34:21
Message-ID: YgQJLbhCeCpZeN4f@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 9, 2022 at 11:00:06AM -0500, Robert Haas wrote:
> Try running pgbench with the --progress option and enough concurrent
> jobs to keep a moderately large system busy and watching what happens
> to the tps each time a checkpoint occurs. It's extremely dramatic, or
> at least it was the last time I ran such tests. I think that
> performance will sometimes drop by a factor of five or more when the
> checkpoint hits, and take multiple minutes to recover.
>
> I think your statement that doing an extra checkpoint "just means the
> next checkpoint will do less work" is kind of misleading. That's
> certainly true in some situations. But when the same pages are being
> dirtied over and over again, an extra checkpoint often means that the
> system will do MUCH MORE work, because every checkpoint triggers a new
> set of full-page writes over the actively-updated portion of the
> database.
>
> I think that very few people run systems with heavy write workloads
> with checkpoint_timeout=5m, precisely because of this issue. Almost
> every system I see has had that raised to at least 10m and sometimes
> 30m or more. It can make a massive difference.

Well, I think the worst case is that the checkpoint happens exactly
between two checkpoints, so you are checkpointing twice as often, but if
it happens just before or after a checkpoint, I assume the effect would
be minimal.

So, it seems we are weighing having a checkpoint happen in the middle of
a checkpoint interval vs writing more WAL. If the WAL traffic, without
CREATE DATABASE, is high, and the template database is small, writing
more WAL and skipping the checkpoint will be win, but if the WAL traffic
is small and the template database is big, the extra WAL will be a loss.
Is this accurate?

> I can't predict whether PostgreSQL will get TDE in the future, and if
> it does, I can't predict what form it will take. Therefore any strong
> statement about whether this will benefit TDE or not seems to me to be
> pretty questionable - we don't know that it will be useful, and we

Agreed. We would want to have a different heap/index key on the standby
so we can rotate the heap/index key.

> don't know that it won't. But, like Dilip, I think the way we're
> WAL-logging CREATE DATABASE right now is a hack, and I *know* it can

Yes, it is a hack, but it seems to be a clever one that we might have
chosen if it had not been part of the original system.

> cause massive performance drops on busy systems.

See above.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2022-02-09 18:37:13 Re: How to get started with contribution
Previous Message Euler Taveira 2022-02-09 18:23:06 Re: Identify missing publications from publisher while create/alter subscription.