RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Euler Taveira' <euler(at)eulerto(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-02-15 11:23:16
Message-ID: TYCPR01MB12077646F91423005D657FF20F54D2@TYCPR01MB12077.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Euler,

> Policy)
>
> Basically, we do not prohibit to connect to primary/standby.
> primary_slot_name may be changed during the conversion and
> tuples may be inserted on target just after the promotion, but it seems no issues.
>
> API)
>
> -D (data directory) and -d (databases) are definitively needed.
>
> Regarding the -P (a connection string for source), we can require it for now.
> But note that it may cause an inconsistency if the pointed not by -P is different
> from the node pointde by primary_conninfo.
>
> As for the connection string for the target server, we can choose two ways:
> a)
> accept native connection string as -S. This can reuse the same parsing
> mechanism as -P,
> but there is a room that non-local server is specified.
>
> b)
> accept username/port as -U/-p
> (Since the policy is like above, listen_addresses would not be overwritten. Also,
> the port just specify the listening port).
> This can avoid connecting to non-local, but more options may be needed.
> (E.g., Is socket directory needed? What about password?)
>
> Other discussing point, reported issue)
>
> Points raised by me [1] are not solved yet.
>
> * What if the target version is PG16-?
> * What if the found executables have diffent version with pg_createsubscriber?
> * What if the target is sending WAL to another server?
> I.e., there are clusters like `node1->node2-.node3`, and the target is node2.
> * Can we really cleanup the standby in case of failure?
> Shouldn't we suggest to remove the target once?
> * Can we move outputs to stdout?

Based on the discussion, I updated the patch set. Feel free to pick them and include.
Removing -P patch was removed, but removing -S still remained.

Also, while testing the patch set, I found some issues.

1.
Cfbot got angry [1]. This is because WIFEXITED and others are defined in <sys/wait.h>,
but the inclusion was removed per comment. Added the inclusion again.

2.
As Shubham pointed out [3], when we convert an intermediate node of cascading replication,
the last node would stuck. This is because a walreciever process requires nodes have the same
system identifier (in WalReceiverMain), but it would be changed by pg_createsubscriebr.

3.
Moreover, when we convert a last node of cascade, it won't work well. Because we cannot create
publications on the standby node.

4.
If the standby server was initialized as PG16-, this command would fail.
Because the API of pg_logical_create_replication_slot() were changed.

5.
Also, used pg_ctl commands must have same versions with the instance.
I think we should require all the executables and servers must be a same major version.

Based on them, below part describes attached ones:

V20-0001: same as Euler's patch, v17-0001.
V20-0002: Update docs per recent changes. Same as v19-0002
V20-0003: Modify the alignment of codes. Same as v19-0003
V20-0004: Change an argument of get_base_conninfo. Same as v19-0004
=== experimental patches ===
V20-0005: Add testcases. Same as v19-0004
V20-0006: Update a comment above global variables. Same as v19-0005
V20-0007: Address comments from Vignesh. Some parts you don't like
are reverted.
V20-0008: Fix error message in get_bin_directory(). Same as v19-0008
V20-0009: Remove -S option. Refactored from v16-0007
V20-0010: Add check versions of executables and the target, per above and [4]
V20-0011: Detect a disconnection while waiting the recovery, per [4]
V20-0012: Avoid running pg_createsubscriber for cascade physical replication, per above.

[1]: https://cirrus-ci.com/task/4619792833839104
[2]: https://www.postgresql.org/message-id/CALDaNm1r9ZOwZamYsh6MHzb%3D_XvhjC_5XnTAsVecANvU9FOz6w%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAHv8RjJcUY23ieJc5xqg6-QeGr1Ppp4Jwbu7Mq29eqCBTDWfUw%40mail.gmail.com
[4]: https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

Attachment Content-Type Size
v20-0001-Creates-a-new-logical-replica-from-a-standby-ser.patch application/octet-stream 78.0 KB
v20-0002-Update-documentation.patch application/octet-stream 12.2 KB
v20-0003-Follow-coding-conversions.patch application/octet-stream 42.2 KB
v20-0004-Fix-argument-for-get_base_conninfo.patch application/octet-stream 1.8 KB
v20-0005-Add-testcase.patch application/octet-stream 3.8 KB
v20-0006-Update-comments-atop-global-variables.patch application/octet-stream 853 bytes
v20-0007-Address-comments-from-Vignesh-round-two.patch application/octet-stream 3.7 KB
v20-0008-Fix-error-message-for-get_bin_directory.patch application/octet-stream 954 bytes
v20-0009-Remove-S-option-to-force-unix-domain-connection.patch application/octet-stream 11.5 KB
v20-0010-Add-version-check-for-executables-and-standby-se.patch application/octet-stream 4.4 KB
v20-0011-Detect-the-disconnection-from-the-primary-during.patch application/octet-stream 2.5 KB
v20-0012-Avoid-running-pg_createsubscriber-for-cascade-ph.patch application/octet-stream 3.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-02-15 11:24:59 Re: Fix race condition in InvalidatePossiblyObsoleteSlot()
Previous Message Tomas Vondra 2024-02-15 11:10:29 Re: planner chooses incremental but not the best one