Re: TODO item: WAL replay of CREATE TABLESPACE with differing directory structure

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Patrick Krecker <pkrecker(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: TODO item: WAL replay of CREATE TABLESPACE with differing directory structure
Date: 2018-02-14 04:24:43
Message-ID: 20180214042443.GB1993@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 13, 2018 at 01:44:34PM -0800, Patrick Krecker wrote:
> I am searching for a way to make a contribution to Postgres and I came
> across this TODO item (I realize there has been some controversy
> around the TODO list [1], and I hope that my use of it doesn't spark
> another discussion about removing it altogether):

Well, it will point out again that TODO items are hard, complicated and
mostly impossible projects.

> "Allow WAL replay of CREATE TABLESPACE to work when the directory
> structure on the recovery computer is different from the original"
>
> Currently it looks like tablespaces have to live inside the data
> directory on the replica, notwithstanding administrator intervention
> by manipulating the tablespace directory with symlinks after (or even
> before?) it has been created via replay.

Let's be clear here. There is no hard restriction with tablespace paths
within the data directory, though you should not do that, and you get a
nice warning when trying to do so with CREATE TABLESPACE (see 33cb8ff6).
This also causes pg_basebackup to fail. It is also bad design to create
tablespaces within the data directory as those are aimed at making hot
paths work on different partitions with different I/O properties.

> Is the idea behind this task to allow the master to instruct the
> replica where to put the tablespace on its filesystem, so as to allow
> it to live outside of the data directory without direct manipulation
> of the filesystem?

WAL records associated to CREATE TABLESPACE (xl_tblspc_create_rec)
register the location where a tablespace is located. The location of a
tablespace is not saved in the system catalogs, which offers flexibility
in the way the symlink from pg_tblspc can be handled. This is where the
tablespace path remapping of pg_basebackup becomes handy, because you
can repurpose paths easily when taking a base backup, but this forces
you to create tablespaces first, and then create standbys. We have also
a set of existing problems:
1) If a primary and its standby are on the same server and you issue a
CREATE TABLESPACE, then they would try to write to the same paths.
2) How do we design at DDL level a command which allows for specifying
different paths depending on the node where the recovery happens.

You would need in both cases a sort of ability to define a node name, so
as for 1) you append the node name to the path and both primary and
standby can use the same tablespace path, but with different sub-paths.
And for 2), you can enforce a patch name by defining as well a path
associated to a node name so as when xl_tblspc_create_rec records are
replayed at recovery, you know which path to create. Just designing
that the right way as its own set of complications.

> If this task is a worthwhile endeavor, I would be happy to take it on.
> If not, I am open to other ideas :)

This is part of the difficult, perhaps-not-worth doing impossible
problems. As a first contribution, you may want something easier.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2018-02-14 04:37:05 [bug fix] Cascaded standby cannot start after a clean shutdown
Previous Message Tom Lane 2018-02-14 04:10:27 Re: tapeblocks is uninitialized in logtape.c