RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Euler Taveira' <euler(at)eulerto(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, 'Shubham Khanna' <khannashubham1197(at)gmail(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-01-26 05:51:49
Message-ID: TY3PR01MB98897C85700C6DF942D2D0A3F5792@TY3PR01MB9889.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Euler,

Thanks for updating the patch! Before reading yours, I wanted to reply some of comments.

>
I'm still thinking about replacing --subscriber-conninfo with separate items
(username, port, password?, host = socket dir). Maybe it is an overengineering.
The user can always prepare the environment to avoid unwanted and/or external
connections.
>

For me, required amount of fixes are not so different from current one. How about
others?

>
It is extremely useful because (a) you have a physical replication setup and
don't know if it is prepared for logical replication, (b) check GUCs (is
max_wal_senders sufficient for this pg_subscriber command? Or is
max_replication_slots sufficient to setup the logical replication even though I
already have some used replication slots?), (c) connectivity and (d)
credentials.
>

Yeah, it is useful for verification purpose, so let's keep this option.
But I still think the naming should be "--check". Also, there are many
`if (!dry_run)` but most of them can be removed if the process exits earlier.
Thought?

>
> 05.
> I felt we should accept some settings from enviroment variables, like pg_upgrade.
> Currently, below items should be acceted.
>
> - data directory
> - username
> - port
> - timeout

Maybe PGDATA.
>

Sorry, I cannot follow this. Did you mean that the target data directory should
be able to be specified by PGDATA? OF so, +1.

>
> 06.
> ```
> pg_logging_set_level(PG_LOG_WARNING);
> ```
>
> If the default log level is warning, there are no ways to output debug logs.
> (-v option only raises one, so INFO would be output)
> I think it should be PG_LOG_INFO.

You need to specify multiple -v options.
>

Hmm. I felt the specification was bit strange...but at least it must be
described on the documentation. pg_dump.sgml has similar lines.

>
> 08.
> Not sure, but if we want to record outputs by pg_subscriber, the sub-directory
> should be created. The name should contain the timestamp.

The log file already contains the timestamp. Why?
>

This comment assumed outputs by pg_subscriber were also recorded to a file.
In this case and if the file also has the same timestamp, I think they can be
gathered in the same place. No need if outputs are not recorded.

>
> 09.
> Not sure, but should we check max_slot_wal_keep_size of primary server? It can
> avoid to fail starting of logical replicaiton.

A broken physical replication *before* running this tool is its responsibility?
Hmm. We might add another check that can be noticed during dry run mode.
>

I thought that we should not generate any broken objects, but indeed, not sure
it is our scope. How do other think?

>
> 11.
> ```
> /*
> * Stop the subscriber if it is a standby server. Before executing the
> * transformation steps, make sure the subscriber is not running because
> * one of the steps is to modify some recovery parameters that require a
> * restart.
> */
> if (stat(pidfile, &statbuf) == 0)
> ```
>
> I kept just in case, but I'm not sure it is still needed. How do you think?
> Removing it can reduce an inclusion of pidfile.h.

Are you suggesting another way to check if the standby is up and running?
>

Running `pg_ctl stop` itself can detect whether the process has been still alive.
It would exit with 1 when the process is not there.

>
> I didn't notice the comment, but still not sure the reason. Why we must reserve
> the slot until pg_subscriber finishes? IIUC, the slot would be never used, it
> is created only for getting a consistent_lsn. So we do not have to keep.
> Also, just before, logical replication slots for each databases are created, so
> WAL records are surely reserved.
>

I want to confirm the conclusion - will you remove the creation of a transient slot?

Also, not tested, I'm now considering that we can reuse the primary_conninfo value.
We are assuming that the target server is standby and the current upstream one will
convert to publisher. In this case, the connection string is already specified as
primary_conninfo so --publisher-conninfo may not be needed. The parameter does
not contain database name, so --databases is still needed. I imagine like:

1. Parse options
2. Turn on standby
3. Verify the standby
4. Turn off standby
5. Get primary_conninfo from standby
6. Connect to primary (concat of primary_conninfo and an option is used)
7. Verify the primary
...

How do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2024-01-26 06:00:00 Race condition in FetchTableStates() breaks synchronization of subscription tables
Previous Message Yugo NAGATA 2024-01-26 05:40:11 Re: Small fix on COPY ON_ERROR document