Re: Synchronizing slots from primary to standby

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2024-01-19 11:53:53
Message-ID: CAA4eK1JZDtRFPJaeaLvj74760pcvaEAVxNPN0+oW5A4jL1WCBg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
>

I had some off-list discussions with Sawada-San, Hou-San, and Shveta
on the topic of extending replication commands instead of using the
current model where we fetch the required slot information via SQL
using a database connection. I would like to summarize the discussion
and would like to know the thoughts of others on this topic.

In the current patch, we launch the slotsync worker on the physical
standby, which connects to the specified database (currently we let
users specify the required dbname in primary_conninfo) on the primary.
It then fetches the required information for failover marked slots
from the primary and also does some primitive checks on the upstream
node via SQL (the additional checks are like whether the upstream node
has a specified physical slot or whether the upstream node is a
primary node or a standby node). To fetch the required information it
uses a libpqwalreceiver API, which is mostly apt for this purpose as it
supports SQL execution. For this patch, however, we don't need a
replication connection, so we extend the libpqwalreceiver connect API.
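
As a rough illustration of the SQL-based fetch described above (a
sketch, not code from the patch), the queries might look like the
following. It assumes pg_replication_slots exposes a boolean "failover"
column as this patch series proposes; run_query is a hypothetical
stand-in for whatever SQL-execution API the worker uses.

```python
# Illustrative only: queries a slot-sync worker might run over a normal
# (non-replication) database connection to the primary.
# Assumption: pg_replication_slots has a boolean "failover" column.

FAILOVER_SLOTS_SQL = (
    "SELECT slot_name, plugin, database, restart_lsn,"
    " confirmed_flush_lsn, catalog_xmin"
    " FROM pg_catalog.pg_replication_slots"
    " WHERE failover AND NOT temporary"
)

# One round trip for both primitive checks on the upstream node.
UPSTREAM_CHECK_SQL = (
    "SELECT pg_catalog.pg_is_in_recovery(),"
    " EXISTS (SELECT 1 FROM pg_catalog.pg_replication_slots"
    " WHERE slot_type = 'physical' AND slot_name = %s)"
)


def validate_upstream(run_query, physical_slot_name):
    """Return a list of problems found on the upstream node (empty if none).

    run_query is a hypothetical callable (sql, params) -> list of row tuples.
    """
    rows = run_query(UPSTREAM_CHECK_SQL, (physical_slot_name,))
    in_recovery, slot_exists = rows[0]
    problems = []
    if in_recovery:
        problems.append("upstream node is a standby, not a primary")
    if not slot_exists:
        problems.append(
            "physical slot %r does not exist upstream" % physical_slot_name
        )
    return problems
```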

Now, the concerns related to this could be that users would probably
need to change existing mechanisms/tools to update primary_conninfo,
and one of the alternatives proposed is to have an additional GUC like
slot_sync_dbname. Users won't be able to drop the database this worker
is connected to, i.e. whatever is specified in slot_sync_dbname, but as
the user herself sets up the configuration it shouldn't be a big deal.
Then we also discussed whether extending libpqwalreceiver's connect
API is a good idea and whether we need to further extend it in the
future. As far as I can see, slotsync worker's primary requirement is
to execute SQL queries, for which the current API is sufficient, and I
don't see anything that needs a drastic change in this API. Note that
the tablesync worker, which executes SQL, also uses these APIs, so we
may need something in the future for either of those. Then finally we
need a slotsync worker to also connect to a database to use SQL and
fetch results.

Now, let us consider the alternative: if we extend replication
commands like READ_REPLICATION_SLOT and/or introduce a new set of
replication commands to fetch the required information, then we don't
need a DB connection with the primary or a connection in the slotsync
worker. As per my current understanding, it is quite doable, but I
think we will slowly go in the direction of making replication
commands something like SQL, because today we need to extend them to
fetch info on all slots that have failover marked as true, the
existence of a particular replication slot, etc. Then tomorrow, if we
want to extend this work to have multiple slotsync workers, say one
worker per db, then we have to extend the replication command to fetch
per-database failover marked slots. To me, it sounds more like we are
slowly adding SQL-like features to replication commands.
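
To make the extensibility point concrete, here is a hedged sketch of
how the SQL route could absorb a per-database worker: the change is one
extra predicate, with no new command grammar. The function name and
column list are illustrative, and the "failover" column is again
assumed from the patch series.

```python
def failover_slots_query(dbname=None):
    """Build the slot-fetch query, optionally restricted to one database.

    Hypothetical helper: returns (sql, params) suitable for a
    parameterized execute call.
    """
    sql = (
        "SELECT slot_name, database, restart_lsn, confirmed_flush_lsn"
        " FROM pg_catalog.pg_replication_slots"
        " WHERE failover AND NOT temporary"
    )
    params = ()
    if dbname is not None:
        # Per-db worker: narrowing the result set is just another filter.
        sql += " AND database = %s"
        params = (dbname,)
    return sql, params
```

With replication commands, the equivalent change would mean adding a
new option or command for each such filter.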

Apart from this, when we are reading per-db replication slots without
connecting to a database, we probably need some additional protection
mechanism so that the database won't get dropped.

Considering all this, it seems that for now extending replication
commands can simplify a few things, as mentioned above, but using SQL
with a database connection is more extensible.

Thoughts?

--
With Regards,
Amit Kapila.
