Quick Links

Re: speed up a logical replica setup

From:	"Euler Taveira" <euler(at)eulerto(dot)com>
To:	"kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc:	"pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "vignesh C" <vignesh21(at)gmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "Peter Eisentraut" <peter(at)eisentraut(dot)org>, "Andres Freund" <andres(at)anarazel(dot)de>, "Ashutosh Bapat" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, "Shlok Kyal" <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
Subject:	Re: speed up a logical replica setup
Date:	2024-02-16 03:14:38
Message-ID:	2231a04b-f2d4-4a4e-b5cd-56be8b002427@app.fastmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Feb 15, 2024, at 8:23 AM, Hayato Kuroda (Fujitsu) wrote:
> > Points raised by me [1] are not solved yet.
> >
> > * What if the target version is PG16-?

pg_ctl and pg_resetwal won't work.

$ pg_ctl start -D /tmp/blah
waiting for server to start....
2024-02-15 23:50:03.448 -03 [364610] FATAL: database files are incompatible with server
2024-02-15 23:50:03.448 -03 [364610] DETAIL: The data directory was initialized by PostgreSQL version 16, which is not compatible with this version 17devel.
stopped waiting
pg_ctl: could not start server
Examine the log output.

$ pg_resetwal -D /tmp/blah
pg_resetwal: error: data directory is of wrong version
pg_resetwal: detail: File "PG_VERSION" contains "16", which is not compatible with this program's version "17".

> > * What if the found executables have diffent version with pg_createsubscriber?

The new code take care of it.

> > * What if the target is sending WAL to another server?
> > I.e., there are clusters like `node1->node2-.node3`, and the target is node2.

The new code detects if the server is in recovery and aborts as you suggested.
A new option can be added to ignore the fact there are servers receiving WAL
from it.

> > * Can we really cleanup the standby in case of failure?
> > Shouldn't we suggest to remove the target once?

If it finishes the promotion, no. I adjusted the cleanup routine a bit to avoid
it. However, we should provide instructions to inform the user that it should
create a fresh standby and try again.

> > * Can we move outputs to stdout?

Are you suggesting to use another logging framework? It is not a good idea
because each client program is already using common/logging.c.

> 1.
> Cfbot got angry [1]. This is because WIFEXITED and others are defined in <sys/wait.h>,
> but the inclusion was removed per comment. Added the inclusion again.

Ok.

> 2.
> As Shubham pointed out [3], when we convert an intermediate node of cascading replication,
> the last node would stuck. This is because a walreciever process requires nodes have the same
> system identifier (in WalReceiverMain), but it would be changed by pg_createsubscriebr.

Hopefully it was fixed.

> 3.
> Moreover, when we convert a last node of cascade, it won't work well. Because we cannot create
> publications on the standby node.

Ditto.

> 4.
> If the standby server was initialized as PG16-, this command would fail.
> Because the API of pg_logical_create_replication_slot() were changed.

See comment above.

> 5.
> Also, used pg_ctl commands must have same versions with the instance.
> I think we should require all the executables and servers must be a same major version.

It is enforced by the new code. See find_other_exec() in get_exec_path().

> Based on them, below part describes attached ones:

Thanks for another review. I'm sharing a new patch to merge a bunch of
improvements and fixes. Comments are below.

v20-0002: I did some extensive documentation changes (including some of them
related to the changes in the new patch). I will defer its update to check
v20-0002. It will be included in the next one.

v20-0003: I included most of it. There are a few things that pgindent reverted
so I didn't apply. I also didn't like some SQL commands that were broken into
multiple lines with spaces at the beginning. It seems nice in the code but it
is not in the output.

v20-0004: Nice catch. Applied.

v20-0005: Applied.

v20-0006: I prefer to remove the comment.

v20-0007: I partially applied it. I only removed the states that were not used
and propose another dry run mode message. Maybe it is clear than it was.

v20-0008: I refactored the get_bin_directory code. Under reflection, I reverted
the unified binary directory that we agreed a few days ago. The main reason is
to provide a specific error message for each program it is using. The
get_exec_path will check if the program is available in the same directory as
pg_createsubscriber and if it has the same version. An absolute path is
returned and is used by some functions.

v20-0009: to be reviewed.

v20-0010: As I said above, this code was refactored so I didn't apply this one.

v20-0011: Do we really want to interrupt the recovery if the network was
momentarily interrupted or if the OS killed walsender? Recovery is critical for
the process. I think we should do our best to be resilient and deliver all
changes required by the new subscriber. The proposal is not correct because the
query return no tuples if it is disconnected so you cannot PQgetvalue(). The
retry interval (the time that ServerLoop() will create another walreceiver) is
determined by DetermineSleepTime() and it is a maximum of 5 seconds
(SIGKILL_CHILDREN_AFTER_SECS). One idea is to retry 2 or 3 times before give up
using the pg_stat_wal_receiver query. Do you have a better plan?

v20-0012: I applied a different patch to accomplish the same thing. I included
a refactor around pg_is_in_recovery() function to be used in other 2 points.

Besides that, I changed some SQL commands to avoid having superfluous
whitespace in it. I also added a test for cascaded replication scenario. And
clean up 041 test a bit.

I didn't provide an updated documentation because I want to check v20-0002. It
is on my list to check v20-0009.

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachment	Content-Type	Size
v21-0001-Creates-a-new-logical-replica-from-a-standby-ser.patch	text/x-patch	81.8 KB

In response to

RE: speed up a logical replica setup at 2024-02-15 11:23:16 from Hayato Kuroda (Fujitsu)

Responses

RE: speed up a logical replica setup at 2024-02-16 06:41:37 from Hayato Kuroda (Fujitsu)
RE: speed up a logical replica setup at 2024-02-16 12:53:43 from Hayato Kuroda (Fujitsu)
Re: speed up a logical replica setup at 2024-02-19 09:47:37 from Peter Eisentraut
Re: speed up a logical replica setup at 2024-02-19 10:22:34 from Shlok Kyal

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Hayato Kuroda (Fujitsu)	2024-02-16 03:26:06	RE: pg_upgrade and logical replication
Previous Message	Andrei Lepikhov	2024-02-16 03:12:47	Re: Memory consumed by paths during partitionwise join planning