Re: speed up a logical replica setup

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Shlok Kyal" <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "vignesh C" <vignesh21(at)gmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "Peter Eisentraut" <peter(at)eisentraut(dot)org>, "Andres Freund" <andres(at)anarazel(dot)de>, "Ashutosh Bapat" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-01-25 23:34:46
Message-ID: b315c7da-7ab1-4014-a2a9-8ab6ae26017c@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 23, 2024, at 10:29 PM, Euler Taveira wrote:
> I'll post a new one soon.

I'm attaching another patch that fixes some of the issues pointed out by
Hayato, Shlok, and Junwang.

* publication doesn't exist. The analysis [1] was done by Hayato but I didn't
use the proposed patch. Instead I refactored the code a bit [2] and call it
from a new function (setup_publisher) that is called before the promotion.
* fix wrong path name in the initial comment [3]
* change terminology: logical replica -> physical replica [3]
* primary / standby is ready for logical replication? setup_publisher() and
setup_subscriber() check if required GUCs are set accordingly. For primary,
it checks wal_level = logical, max_replication_slots has remain replication
slots for the proposed setup and also max_wal_senders available. For standby,
it checks max_replication_slots for replication origin and also remain number
of background workers to start the subscriber.
* retain option: I extracted this one from Hayato's patch [4]
* target server must be a standby. It seems we agree that this restriction
simplifies the code a bit but can be relaxed in the future (if/when base
backup support is added.)
* recovery timeout option: I decided to include it but I think the use case is
too narrow. It helps in broken setups. However, it can be an issue in some
scenarios like time-delayed replica, large replication lag, slow hardware,
slow network. I didn't use the proposed patch [5]. Instead, I came up with a
simple one that defaults to forever. The proposed one defaults to 60 seconds
but I'm afraid that due to one of the scenarios I said in a previous
sentence, we cancel a legitimate case. Maybe we should add a message during
dry run saying that due to a replication lag, it will take longer to run.
* refactor primary_slot_name code. With the new setup_publisher and
setup_subscriber functions, I splitted the function that detects the
primary_slot_name use into 2 pieces just to avoid extra connections to have
the job done.
* remove fallback_application_name as suggested by Hayato [5] because logical
replication already includes one.

I'm still thinking about replacing --subscriber-conninfo with separate items
(username, port, password?, host = socket dir). Maybe it is an overengineering.
The user can always prepare the environment to avoid unwanted and/or external
connections.

I didn't change the name from pg_subscriber to pg_createsubscriber yet but if I
didn't hear objections about it, I'll do it in the next patch.

[1] https://www.postgresql.org/message-id/TY3PR01MB9889C5D55206DDD978627D07F5752%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[2] https://www.postgresql.org/message-id/73ab86ca-3fd5-49b3-9c80-73d1525202f1%40app.fastmail.com
[3] https://www.postgresql.org/message-id/TY3PR01MB9889678E47B918F4D83A6FD8F57B2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[4] https://www.postgresql.org/message-id/TY3PR01MB9889678E47B918F4D83A6FD8F57B2%40TY3PR01MB9889.jpnprd01.prod.outlook.com
[5] https://www.postgresql.org/message-id/TY3PR01MB9889593399165B9A04106741F5662%40TY3PR01MB9889.jpnprd01.prod.outlook.com

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachment Content-Type Size
v9-0001-Creates-a-new-logical-replica-from-a-standby-serv.patch text/x-patch 73.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-01-25 23:35:19 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Mark Wong 2024-01-25 22:39:01 Re: Guiding principle for dropping LLVM versions?