Re: speed up a logical replica setup

From: Shubham Khanna <khannashubham1197(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-02-16 09:20:05
Message-ID: CAHv8Rj+5mzK9Jt+7ECogJzfm5czvDCCd5jO1_rCx0bTEYpBE5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 15, 2024 at 4:53 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Euler,
>
> > Policy)
> >
> > Basically, we do not prohibit to connect to primary/standby.
> > primary_slot_name may be changed during the conversion and
> > tuples may be inserted on target just after the promotion, but it seems no issues.
> >
> > API)
> >
> > -D (data directory) and -d (databases) are definitively needed.
> >
> > Regarding the -P (a connection string for source), we can require it for now.
> > But note that it may cause an inconsistency if the pointed not by -P is different
> > from the node pointde by primary_conninfo.
> >
> > As for the connection string for the target server, we can choose two ways:
> > a)
> > accept native connection string as -S. This can reuse the same parsing
> > mechanism as -P,
> > but there is a room that non-local server is specified.
> >
> > b)
> > accept username/port as -U/-p
> > (Since the policy is like above, listen_addresses would not be overwritten. Also,
> > the port just specify the listening port).
> > This can avoid connecting to non-local, but more options may be needed.
> > (E.g., Is socket directory needed? What about password?)
> >
> > Other discussing point, reported issue)
> >
> > Points raised by me [1] are not solved yet.
> >
> > * What if the target version is PG16-?
> > * What if the found executables have diffent version with pg_createsubscriber?
> > * What if the target is sending WAL to another server?
> > I.e., there are clusters like `node1->node2-.node3`, and the target is node2.
> > * Can we really cleanup the standby in case of failure?
> > Shouldn't we suggest to remove the target once?
> > * Can we move outputs to stdout?
>
> Based on the discussion, I updated the patch set. Feel free to pick them and include.
> Removing -P patch was removed, but removing -S still remained.
>
> Also, while testing the patch set, I found some issues.
>
> 1.
> Cfbot got angry [1]. This is because WIFEXITED and others are defined in <sys/wait.h>,
> but the inclusion was removed per comment. Added the inclusion again.
>
> 2.
> As Shubham pointed out [3], when we convert an intermediate node of cascading replication,
> the last node would stuck. This is because a walreciever process requires nodes have the same
> system identifier (in WalReceiverMain), but it would be changed by pg_createsubscriebr.
>
> 3.
> Moreover, when we convert a last node of cascade, it won't work well. Because we cannot create
> publications on the standby node.
>
> 4.
> If the standby server was initialized as PG16-, this command would fail.
> Because the API of pg_logical_create_replication_slot() were changed.
>
> 5.
> Also, used pg_ctl commands must have same versions with the instance.
> I think we should require all the executables and servers must be a same major version.
>
> Based on them, below part describes attached ones:
>
> V20-0001: same as Euler's patch, v17-0001.
> V20-0002: Update docs per recent changes. Same as v19-0002
> V20-0003: Modify the alignment of codes. Same as v19-0003
> V20-0004: Change an argument of get_base_conninfo. Same as v19-0004
> === experimental patches ===
> V20-0005: Add testcases. Same as v19-0004
> V20-0006: Update a comment above global variables. Same as v19-0005
> V20-0007: Address comments from Vignesh. Some parts you don't like
> are reverted.
> V20-0008: Fix error message in get_bin_directory(). Same as v19-0008
> V20-0009: Remove -S option. Refactored from v16-0007
> V20-0010: Add check versions of executables and the target, per above and [4]
> V20-0011: Detect a disconnection while waiting the recovery, per [4]
> V20-0012: Avoid running pg_createsubscriber for cascade physical replication, per above.
>
> [1]: https://cirrus-ci.com/task/4619792833839104
> [2]: https://www.postgresql.org/message-id/CALDaNm1r9ZOwZamYsh6MHzb%3D_XvhjC_5XnTAsVecANvU9FOz6w%40mail.gmail.com
> [3]: https://www.postgresql.org/message-id/CAHv8RjJcUY23ieJc5xqg6-QeGr1Ppp4Jwbu7Mq29eqCBTDWfUw%40mail.gmail.com
> [4]: https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

I found a couple of issues, while verifying the cascaded replication
with the following scenarios:
Scenario 1) Create cascade replication like node1->node2->node3
without using replication slots (attached
cascade_3node_setup_without_slots has the script for this):
Then I ran pg_createsubscriber by specifying primary as node1 and
standby as node3, this scenario runs successfully. I was not sure if
this should be supported or not?
Scenario 2) Create cascade replication like node1->node2->node3 using
replication slots (attached cascade_3node_setup_with_slots has the
script for this):
Here, slot name was used as slot1 for node1 to node2 and slot2 for
node2 to node3. Then I ran pg_createsubscriber by specifying primary
as node1 and standby as node3. In this case pg_createsubscriber fails
with the following error:
pg_createsubscriber: error: could not obtain replication slot
information: got 0 rows, expected 1 row
[Inferior 1 (process 2623483) exited with code 01]

This is failing because slot name slot2 is used between node2->node3
but pg_createsubscriber is checked for slot1, the slot which is used
for replication between node1->node2.
Thoughts?

Thanks and Regards,
Shubham Khanna.

Attachment Content-Type Size
cascade_3node_setup_without_slots.sh text/x-sh 1.4 KB
cascade_3node_setup_with_slots.sh text/x-sh 1.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-02-16 09:25:25 Re: Allow passing extra options to initdb for tests
Previous Message Daniel Gustafsson 2024-02-16 09:16:37 Re: Replace current implementations in crypt() and gen_salt() to OpenSSL