speed up a logical replica setup

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: speed up a logical replica setup
Date: 2022-02-21 12:09:12
Message-ID: 5ac50071-f2ed-4ace-a8fd-b892cffd33eb@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Logical replication has been used to migration with minimal downtime. However,
if you are dealing with a big database, the amount of required resources (disk
-- due to WAL retention) increases as the backlog (WAL) increases. Unless you
have a generous amount of resources and can wait for long period of time until
the new replica catches up, creating a logical replica is impracticable on
large databases.

The general idea is to create and convert a physical replica or a base backup
(archived WAL files available) into a logical replica. The initial data copy
and catchup tends to be faster on a physical replica. This technique has been
successfully used in pglogical_create_subscriber [1].

A new tool called pg_subscriber does this conversion and is tightly integrated
with Postgres.

DESIGN

The conversion requires 8 steps.

1. Check if the target data directory has the same system identifier than the
source data directory.
2. Stop the target server if it is running as a standby server. (Modify
recovery parameters requires a restart.)
3. Create one replication slot per specified database on the source server. One
additional replication slot is created at the end to get the consistent LSN
(This consistent LSN will be used as (a) a stopping point for the recovery
process and (b) a starting point for the subscriptions).
4. Write recovery parameters into the target data directory and start the
target server (Wait until the target server is promoted).
5. Create one publication (FOR ALL TABLES) per specified database on the source
server.
6. Create one subscription per specified database on the target server (Use
replication slot and publication created in a previous step. Don't enable the
subscriptions yet).
7. Sets the replication progress to the consistent LSN that was got in a
previous step.
8. Enable the subscription for each specified database on the target server.

This tool does not take a base backup. It can certainly be included later.
There is already a tool do it: pg_basebackup.

There is a --subscriber-conninfo option to inform the subscriber connection
string, however, we could remove it since this tool runs on the subscriber and
we can build a connection string.

NAME

I'm not sure about the proposed name. I came up with this one because it is not
so long. The last added tools uses pg_ prefix, verb (action) and object.
pg_initsubscriber and pg_createsubscriber are names that I thought but I'm not
excited about it.

DOCUMENTATION

It is available and describes this tool.

TESTS

Basic tests are included. It requires some tests to exercise this tool.

Comments?

[1] https://github.com/2ndQuadrant/pglogical

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachment Content-Type Size
v1-0001-Move-readfile-and-free_readfile-to-file_utils.h.patch text/x-patch 10.9 KB
v1-0002-Create-a-new-logical-replica-from-a-base-backup-o.patch text/x-patch 55.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2022-02-21 12:44:04 RE: Optionally automatically disable logical replication subscriptions on error
Previous Message Andrew Dunstan 2022-02-21 12:00:54 Re: set TESTDIR from perl rather than Makefile