Re: Synchronizing slots from primary to standby

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Synchronizing slots from primary to standby
Date: 2023-12-29 01:29:21
Message-ID: CAD21AoBBvZQhFFNWV5eB8WZxgsbnDEinrGPGsHTHD-m0Ec4-CQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 27, 2023 at 7:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Dec 27, 2023 at 11:36 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Dec 26, 2023 at 9:27 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > >
> > > I would like to revisit the current dependency of slotsync worker on
> > > dbname used in 002 patch. Currently we accept dbname in
> > > primary_conninfo and thus the user has to make sure to provide one (by
> > > manually altering it) even in case of a conf file auto-generated by
> > > "pg_basebackup -R".
> > > Thus I would like to discuss if there are better ways to do it.
> > > Complete background is as follow:
> > >
> > > We need dbname for 2 purposes:
> > >
> > > 1) to connect to remote db in order to run SELECT queries to fetch the
> > > info needed by slotsync worker.
> > > 2) to make connection in slot-sync worker itself in order to be able
> > > to use libpq APIs for 1)
> > >
> > > We run 3 kind of select queries in slot-sync worker currently:
> > >
> > > a) To fetch all failover slots (logical slots) info at once in
> > > synchronize_slots().
> > > b) To fetch a particular slot info during
> > > wait_for_primary_slot_catchup() logic (logical slot).
> > > c) To validate primary slot (physical one) and also to distinguish
> > > between standby and cascading standby by running pg_is_in_recovery().
> > >
> > > 1) One approach to avoid dependency on dbname is using commands
> > > instead of SELECT. This will need implementing LIST_SLOTS command for
> > > a), and for b) we can use LIST_SLOTS and fetch everything (even though
> > > it is not needed) or have LIST_SLOTS with a filter on slot-name or
> > > extend READ_REPLICATION_SLOT, and for c) we can have some other
> > > command to get pg_is_in_recovery() info. But, I feel by relying on
> > > commands we will be making the extension of the slot-sync feature
> > > difficult. In future, if there is some more requirement to fetch any
> > > other info,
> > > then there too we have to implement a command. I am not sure if it is
> > > good and extensible approach.
> > >
> > > 2) Another way to avoid asking for a dbname in primary_conninfo is to
> > > use the default dbname internally. This brings us to two questions:
> > > 'How' and 'Which default db'?
> > >
> > > 2.1) To answer 'How':
> > > Using default dbname is simpler for the purpose of slot-sync worker
> > > having its own db-connection, but is a little tricky for the purpose
> > > of connection to remote_db. This is because we have to inject this
> > > dbname internally in our connection-info.
> > >
> > > 2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname),
> > > then currently it could have 2 formats:
> > >
> > > a) The simple "=" format for key-value pairs, example:
> > > 'user=replication host=127.0.0.1 port=5433 dbname=postgres'.
> > > b) URI format, example:
> > > postgresql://other(at)localhost/otherdb?connect_timeout=10&application_name=myapp
> > >
> > > We can distinguish between the 2 formats using 'uri_prefix_length' but
> > > injecting the dbname part will be messy specially for URI format. If
> > > we want to do it w/o injecting and only by changing libpq interfaces
> > > to accept dbname separately apart from conninfo, then there is no
> > > current simpler way available. It will need a good amount of changes
> > > in libpq.
> > >
> > > 2.1.2) Another way is to not rely on primary_conninfo directly but
> > > rely on 'WalRcv->conninfo' in order to connect to remote_db. This is
> > > because the latter is never URI format, it is some parsed format and
> > > appending may work. As an example, primary_conninfo =
> > > 'postgresql://replication(at)localhost:5433', WalRcv->conninfo loaded
> > > internally is:
> > > "user=replication passfile=/home/shveta/.pgpass channel_binding=prefer
> > > dbname=replication host=localhost port=5433
> > > fallback_application_name=walreceiver sslmode=prefer sslcompression=0
> > > sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2
> > > gssencmode=disable krbsrvname=postgres gssdelegation=0
> > > target_session_attrs=any load_balance_hosts=disable", '\000'
> > >
> > > So we can try appending our default dbname to this. But all the
> > > defaults loaded in WalRcv->conninfo need some careful analysis to
> > > figure out if they work for slot-sync worker case.
> > >
> > > 2.2) Now coming to 'Which default db':
> > >
> > > 2.2.1) If we use 'template1' as default db, it may block 'create db'
> > > operations on primary for the time when the slot-sync worker is
> > > connected to remote using this dbname. Example:
> > >
> > > postgres=# create database newdb1;
> > > ERROR: source database "template1" is being accessed by other users
> > > DETAIL: There is 1 other session using the database.
> > >
> > > 2.2.2) If we use 'postgres' as default db, there are chances that it
> > > can be dropped as unlike 'template1', it is allowed to be dropped by
> > > user, and if slotsync worker is connected to it, user may see:
> > > newdb1=# drop database postgres;
> > > ERROR: database "postgres" is being accessed by other users
> > > DETAIL: There is 1 other session using the database.
> > >
> > > But once the slot-sync worker or standby goes down, user can always
> > > drop this and next time slot-sync worker may not be able to come up.
> > >
> >
> > Other random ideas for discussion are:
> >
> > 3) The slotsync worker uses primary_conninfo but also uses a new GUC
> > parameter, say slot_sync_dbname, to specify the database to connect.
> > The slot_sync_dbname overwrites the dbname if primary_conninfo also
> > specifies it. If both don't have a dbname, raise an error.
> >
>
> Would the users prefer to provide a value for a separate GUC instead
> of changing primary_conninfo? It is possible that we can have some
> users prefer to use one GUC and others prefer a separate GUC but we
> should add a new GUC if we are sure that is what users would prefer.
> Also, even if have to consider this option, I think we can easily
> later add a new GUC to provide a dbname in addition to having the
> provision of giving it in primary_conninfo.

I think having two separate GUCs is more flexible for example when
users want to change the dbname to connect. It makes sense that the
slotsync worker wants to use the same connection string as the
walreceiver uses. But I guess today most primary_conninfo settings
that are set manually or are generated by tools such as pg_basebackup
don't have dbname. If we require a dbname in primary_conninfo, many
tools will need to be changed. Once the connection string is
generated, it would be tricky to change the dbname in it, as Shveta
mentioned. The users will have to carefully select the database to
connect when taking a base backup.

>
> Also, I think having a separate GUC for dbanme has some complexity in
> terms of appending the dbname to primary_conninfo as pointed out by
> Shveta.

I think we don't necessarily need to append the dbname to the
connection string in order to specify/change the database to connect.
PQconnectdbParams() overrides the database name to connect if the
dbname parameter appears twice in the connection keyword. The
documentation[1] says:

When expand_dbname is non-zero, the value for the first dbname key
word is checked to see if it is a connection string. If so, it is
“expanded” into the individual connection parameters extracted from
the string. The value is considered to be a connection string, rather
than just a database name, if it contains an equal sign (=) or it
begins with a URI scheme designator. (More details on connection
string formats appear in Section 33.1.1.) Only the first occurrence of
dbname is treated in this way; any subsequent dbname parameter is
processed as a plain database name.

In general the parameter arrays are processed from start to end. If
any key word is repeated, the last value (that is not NULL or empty)
is used. This rule applies in particular when a key word found in a
connection string conflicts with one appearing in the keywords array.
Thus, the programmer may determine whether array entries can override
or be overridden by values taken from a connection string. Array
entries appearing before an expanded dbname entry can be overridden by
fields of the connection string, and in turn those fields are
overridden by array entries appearing after dbname (but, again, only
if those entries supply non-empty values).

If the slotsync worker needs to use libpqwalreceiver to connect the
primary, we will need to change libpqrcv_connect(). But we have the
infrastructure to change the database name to connect without changing
the connection string, at least.

>
> > 4) The slotsync worker uses a new GUC parameter, say
> > slot_sync_conninfo, to specify the connection string to the primary
> > aside from primary_conninfo. And pg_basebackup -R generates
> > slot_sync_conninfo as well if required (new option required).
> >
>
> Yeah, this is worth considering but won't slot_sync_conninfo be mostly
> a duplicate of primary_conninfo apart from dbname? I am not sure if
> the benefit outweighs the disadvantage of having mostly similar
> information in two GUCs.

Agreed.

Regards,

[1] https://www.postgresql.org/docs/devel/libpq-connect.html#LIBPQ-PQCONNECTDBPARAMS

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2023-12-29 01:47:39 Re: Synchronizing slots from primary to standby
Previous Message Andy Fan 2023-12-29 01:25:09 Re: Removing const-false IS NULL quals and redundant IS NOT NULL quals