Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Andres Freund <andres(at)anarazel(dot)de>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2021-06-16 21:58:04
Message-ID: 20210616215804.nhc7bpkuoscfrv52@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-06-15 16:50:24 +0530, Dilip Kumar wrote:
> The patch modifies both CREATE DATABASE and ALTER DATABASE..SET
> TABLESPACE to be fully WAL-logged.

Generally quite a bit in favor of this - the current approach is very
heavyweight, slow and I think we have a few open corner bugs related to
it.

> Design Idea:
> -----------------
> First, create the target database directory along with the version
> file and WAL-log this operation.

What happens if you crash / promote at this point?

> Create the "relation map file" in the target database and copy the
> content from the source database. For this, we can use some modified
> versions of the write_relmap_file() and WAL-log the relmap create
> operation along with the file content. Now, read the relmap file to
> find the relfilenode for pg_class and then we read pg_class block by
> block and decode the tuples.

This doesn't seem like a great approach - you're not going to be able to
use much of the normal infrastructure around processing tuples. So it
seems like it'd end up with quite a bit of special case code that needs
to maintained in parallel.

> Now read the source relfilenode block by block using
> ReadBufferWithoutRelCache() and copy the block to the target
> relfilenode using smgrextend() and WAL-log them using log_newpage().
> For the source database, we can not directly use the smgrread(),
> because there could be some dirty buffers so we will have to read them
> through the buffer manager interface, otherwise, we will have to flush
> all the dirty buffers.

I think we might need a bit more batching for the WAL logging. There are
cases of template database considerably bigger than the default and the
overhead of logging each write separately seems likely to be noticable.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2021-06-16 22:02:32 Re: a path towards replacing GEQO with something better
Previous Message Tomas Vondra 2021-06-16 21:55:17 Re: postgres_fdw batching vs. (re)creating the tuple slots