RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: RE: speed up a logical replica setup
Date: 2024-01-09 07:01:41
Message-ID: TY3PR01MB988978C7362A101927070D29F56A2@TY3PR01MB9889.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Amit,

> > > I don't see any harm in users giving those information but we should
> > > have some checks to ensure that the server is in standby mode and is
> > > running locally. The other related point is do we need to take input
> > > for the target cluster directory from the user? Can't we fetch that
> > > information once we are connected to standby?
> >
> > I think that functions like inet_client_addr() may be able to use, but it returns
> > NULL only when the connection is via a Unix-domain socket. Can we restrict
> > pg_subscriber to use such a socket?
> >
>
> Good question. So, IIUC, this tool has a requirement to run locally
> where standby is present because we want to write reconvery.conf file.
> I am not sure if it is a good idea to have a restriction to use only
> the unix domain socket as users need to set up the standby for that by
> configuring unix_socket_directories. It is fine if we can't ensure
> that it is running locally but we should at least ensure that the
> server is a physical standby node to avoid the problems as Shlok has
> reported.

While thinking more about this, I realized that we have not defined a policy on
whether users may connect to the target server while pg_subscriber is running.
What should it be? If such connections should be prevented, parameters like
listen_addresses and unix_socket_permissions should be restricted, as
start_postmaster() in pg_upgrade/server.c does. The port number should also be
changed to a different value.
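
To make the idea concrete, here is a minimal sketch of how the start command
could be built with restricted settings. It is only an illustration, not code
from the patch: the helper name build_restricted_start_cmd() and its arguments
are hypothetical, and only the pg_ctl switches and the GUC names
(listen_addresses, unix_socket_permissions, unix_socket_directories) are
existing ones.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not part of the
 * patch): build a pg_ctl command line that starts the target server so
 * that only the invoking OS user can connect.
 *
 *   listen_addresses=''          -> no TCP/IP connections are accepted
 *   unix_socket_permissions=0700 -> only the socket owner can connect
 *   unix_socket_directories      -> private directory for the socket
 *   -p port                      -> non-default port chosen by the caller
 *
 * Returns 0 on success, -1 if the command does not fit into buf.
 */
static int
build_restricted_start_cmd(char *buf, size_t buflen,
                           const char *pg_ctl_path,
                           const char *datadir,
                           const char *sockdir,
                           int port)
{
    int n;

    assert(buf != NULL && buflen > 0);

    n = snprintf(buf, buflen,
                 "\"%s\" -w -D \"%s\" -o \"-p %d"
                 " -c listen_addresses=''"
                 " -c unix_socket_permissions=0700"
                 " -c unix_socket_directories='%s'\" start",
                 pg_ctl_path, datadir, port, sockdir);

    /* snprintf reports truncation by returning the length it needed */
    return (n >= 0 && (size_t) n < buflen) ? 0 : -1;
}
```

The resulting string could be passed to system() or a similar call. The port
and socket directory are left to the caller because the thread has not decided
how they should be chosen.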

Personally, I vote to reject connections while pg_subscriber is running.

> On a related point, I see that the patch stops the standby server (if
> it is running) before starting with subscriber-side steps. I was
> wondering if users can object to it that there was some important data
> replication in progress which this tool has stopped. Now, OTOH,
> anyway, once the user uses pg_subscriber, the standby server will be
> converted to a subscriber, so it may not be useful as a physical
> replica. Do you or others have any thoughts on this matter?

I assumed that connections should be closed before running pg_subscriber. If so,
it may be better to simply fail the command when the physical standby is already
running. The tool cannot decide on its own whether data replication and user
queries should be stopped. Users should stop the server themselves, choosing the
shutdown option that fits their policy, and then pg_subscriber can start the
postmaster. pg_upgrade does the same thing in setup().
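
A simplified sketch of such a "fail if already running" check follows. It only
looks for postmaster.pid in the data directory; the function name
postmaster_pid_file_exists() is hypothetical, and a real check would also have
to cope with a stale pid file left behind by a crash.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (an assumption for illustration): report whether
 * <datadir>/postmaster.pid exists.
 *
 * Returns 1 if the file exists, 0 if it does not, and -1 if the path
 * does not fit into the local buffer.  An existing file only suggests
 * that a postmaster is running; it may be stale.
 */
static int
postmaster_pid_file_exists(const char *datadir)
{
    char        path[1024];
    struct stat st;
    int         n;

    n = snprintf(path, sizeof path, "%s/postmaster.pid", datadir);
    if (n < 0 || (size_t) n >= sizeof path)
        return -1;

    return (stat(path, &st) == 0) ? 1 : 0;
}
```

With something like this, pg_subscriber could report an error and exit when the
function returns 1, leaving the shutdown of the standby to the user.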

====

Further comment:
According to the documentation, pg_subscriber is currently listed under the
client applications. But based on the definition, I feel it should be on the
"PostgreSQL Server Applications" page. What do you think? The definition is:

>
This part contains reference information for PostgreSQL server applications and
support utilities. These commands can only be run usefully on the host where the
database server resides. Other utility programs are listed in PostgreSQL Client
Applications.
>

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
