Re: Synchronizing slots from primary to standby

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-09-11 04:19:00
Message-ID: CAJpy0uAAysf9XnqQs0x-cf8aKaMjzXg32aVd=AZguYWhHnc=ig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 8, 2023 at 4:40 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Shveta,
>
> I resumed to check the thread. Here are my high-level comments.
> Sorry if you have been already discussed.

Thanks Kuroda-san for the feedback.
>
> 01. General
>
> I think the documentation can be added, not only GUCs. How about adding examples
> for combinations of physical and logical replications? You can say that both of
> physical primary can be publisher and slots on primary/standby are synchronized.
>

I did not fully understand this. Can you please state a clear example.
We are only synchronizing logical replication slots in this draft and
that too on physical standby from primary. So the last statement is
not completely true.

> 02. General
>
> standby_slot_names ensures that physical standby is always ahead subscriber, but I
> think it may be not sufficient. There is a possibility that primary server does
> not have any physical slots.So it expects a slot to be present.
> In this case the physical standby may be behind the
> subscriber and the system may be confused when the failover is occured.

Currently there is a check in slot-sync worker which mandates that
there is a physical slot present between primary and standby for this
feature to proceed.So that confusion state will not arise.
+ /* WalRcvData is not set or primary_slot_name is not set yet */
+ if (!WalRcv || WalRcv->slotname[0] == '\0')
+ return naptime;

>Can't we specify the name of standby via application_name or something?

So do you mean that in absence of a physical slot (if we plan to
support that), we let primary know about standby(slots-synchronization
client) through application_name? I am not sure about this. Will think
more on this. I would like to know others' opinion on this as well.

>
> 03. General
>
> In this architecture, the syncslot worker is launched per db and they
> independently connects to primary, right?

Not completely true. Each slotsync worker is responsible for managing
N dbs. Here 'N' = 'Number of distinct dbs for slots in
synchronize_slot_names'/ 'number of max_slotsync_workers configured'
for cases where dbcount exceeds workers configured.
And if dbcount < max_slotsync_workers, then we launch only that many
workers equal to dbcount and each worker manages a single db. Each
worker independently connects to primary. Currently it makes a
connection multiple times, I am optimizing it to make connection only
once and then after each SIGHUP assuming 'primary_conninfo' may
change. This change will be in the next version.

>I'm not sure it is efficient, but I
> come up with another architecture - only a worker (syncslot receiver)connects
> to the primary and other workers (syncslot worker) receives infos from it and
> updates. This can reduce the number of connections so that it may slightly
> improve the latency of network. How do you think?
>

I feel it may help in reducing network latency, but not sure if it
could be more efficient in keeping the lsns in sync. I feel it may
introduce lag due to the fact that only one worker is getting all the
info from primary and the actual synchronizing workers are waiting on
that worker. This lag may be more when the number of slots are huge.
We have run some performance tests on the design implemented
currently, please have a look at emails around [1] and [2].

> 04. General
>
> test file recovery/t/051_slot_sync.pl is missing.
>

yes, it was removed. Please see point3 at [3]

> 04. ReplSlotSyncMain
>
> Does the worker have to connect to the specific database?
>
>
> ```
> /* Connect to our database. */
> BackgroundWorkerInitializeConnectionByOid(MySlotSyncWorker->dbid,
> MySlotSyncWorker->userid,
> 0);
> ```

Since we are using libpq public interface 'walrcv_exec=libpqrcv_exec'
to connect to primary, this needs database connection. It errors out
in the absence of 'MyDatabaseId'. Do you think db-connection can have
some downsides?

>
> 05. SlotSyncInitSlotNamesLst()
>
> "Lst" should be "List".
>

Okay, I will change this in the next version.

==========

[1]: https://www.postgresql.org/message-id/CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6%3Dg%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAJpy0uD%3DDevMxTwFVsk_%3DxHqYNH8heptwgW6AimQ9fbRmx4ioQ%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com

thanks
Shveta

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2023-09-11 04:34:56 Re: proposal: possibility to read dumped table's name from file
Previous Message Fujii Masao 2023-09-11 03:54:11 Re: pg_logical_emit_message() misses a XLogFlush()