Re: speed up a logical replica setup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: speed up a logical replica setup
Date: 2022-02-24 07:05:17
Message-ID: CAA4eK1JpKMRGhju=spNmZM_yTNGM12EWqNjaHy6=2t6LaQs2hg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 21, 2022 at 5:41 PM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> Logical replication has been used for migrations with minimal downtime. However,
> if you are dealing with a big database, the amount of required resources (disk
> -- due to WAL retention) increases as the backlog (WAL) grows. Unless you
> have a generous amount of resources and can wait a long period of time until
> the new replica catches up, creating a logical replica is impracticable on
> large databases.
>
> The general idea is to create and convert a physical replica or a base backup
> (archived WAL files available) into a logical replica. The initial data copy
> and catch-up tend to be faster on a physical replica. This technique has been
> successfully used by pglogical_create_subscriber [1].
>

Sounds like a promising idea.

> A new tool called pg_subscriber does this conversion and is tightly integrated
> with Postgres.
>
> DESIGN
>
> The conversion requires 8 steps.
>
> 1. Check if the target data directory has the same system identifier as the
> source data directory.
> 2. Stop the target server if it is running as a standby server. (Modifying
> recovery parameters requires a restart.)
> 3. Create one replication slot per specified database on the source server. One
> additional replication slot is created at the end to get the consistent LSN
> (This consistent LSN will be used as (a) a stopping point for the recovery
> process and (b) a starting point for the subscriptions).
>

What is the need to create an extra slot other than the slot for each
database? Can't we use the largest LSN returned by the slots as the
recovery_target_lsn and the starting point for subscriptions?
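(As an aside on comparing slot LSNs: pg_lsn values are 'hi/lo' hex pairs, so picking the largest one is a simple numeric comparison. A minimal illustrative sketch in Python, with made-up values:)

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn text value ('XXXXXXXX/YYYYYYYY') to its 64-bit integer form."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def max_lsn(lsns):
    """Return the largest LSN among the values reported by the replication slots."""
    return max(lsns, key=lsn_to_int)

# Hypothetical confirmed LSNs from slots created for three databases.
slots = ["0/16B3748", "0/16B3A20", "0/16B2F10"]
print(max_lsn(slots))  # -> 0/16B3A20
```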

How will these additional slots get freed or reused when, say, the
server has crashed/stopped after creating the slots but before
creating the subscriptions? Users won't even know the names of such
slots as they are internally created.

> 4. Write recovery parameters into the target data directory and start the
> target server (Wait until the target server is promoted).
> 5. Create one publication (FOR ALL TABLES) per specified database on the source
> server.
> 6. Create one subscription per specified database on the target server (Use
> replication slot and publication created in a previous step. Don't enable the
> subscriptions yet).
> 7. Set the replication progress to the consistent LSN obtained in a
> previous step.
> 8. Enable the subscription for each specified database on the target server.
>
> This tool does not take a base backup. That can certainly be added later.
> There is already a tool that does it: pg_basebackup.
>
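For clarity, steps 5-8 quoted above roughly correspond, for a single database, to SQL along the following lines. This is only my sketch of the flow, with made-up slot/publication/subscription names and a simplified origin lookup; the tool itself builds these commands internally:

```python
def conversion_sql(dbname, slot, pub, sub, conninfo, consistent_lsn):
    """Sketch the statements steps 5-8 imply for one database.
    Names and quoting are simplified; not the tool's actual code."""
    return [
        # Step 5, on the source: publish every table in the database.
        f"CREATE PUBLICATION {pub} FOR ALL TABLES;",
        # Step 6, on the target: reuse the existing slot, skip the initial
        # copy, and keep the subscription disabled for now.
        f"CREATE SUBSCRIPTION {sub} CONNECTION '{conninfo}' "
        f"PUBLICATION {pub} WITH (create_slot = false, slot_name = '{slot}', "
        f"copy_data = false, enabled = false);",
        # Step 7, on the target: advance the subscription's replication
        # origin ('pg_' || subscription oid) to the consistent LSN.
        f"SELECT pg_replication_origin_advance('pg_' || oid, '{consistent_lsn}') "
        f"FROM pg_subscription WHERE subname = '{sub}';",
        # Step 8, on the target: start applying changes.
        f"ALTER SUBSCRIPTION {sub} ENABLE;",
    ]

for stmt in conversion_sql("mydb", "pg_subscriber_slot_1", "pg_subscriber_pub_1",
                           "pg_subscriber_sub_1", "host=src dbname=mydb",
                           "0/16B3A20"):
    print(stmt)
```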

The backup will include all the databases present on the source
server. Do we need to provide a way/recommendation to remove the
databases that are not required?

Can we see some numbers with various database (cluster) sizes to see
how this impacts the time for small- to large-sized databases as
compared to the traditional method? This might help in advising users
on when to use this tool.

--
With Regards,
Amit Kapila.
