RE: Libpq support to connect to standby server as priority

From: "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dave Cramer <pg(at)fastcrypt(dot)com>, Jing Wang <jingwangian(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Elvis Pranskevichus <elprans(at)gmail(dot)com>
Subject: RE: Libpq support to connect to standby server as priority
Date: 2020-08-11 01:38:08
Message-ID: f6daf26354054924998670cf03770352@G06AUEXCASMHB04.g06.fujitsu.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Greg,

I have spent some time reading this discussion thread, and doing a code review of the latest (v17-0001) patch.

Below are my review comments; some are trivial, others not so much.

====

GENERAL COMMENT 1 ("any")

"any" should be included as valid option for target_server_type.

IIUC target_server_type was added mostly to have better alignment with JDBC options.
Both Vladimir [1] and Dave [2] already said that JDBC does have an "any" option.
[1] - https://www.postgresql.org/message-id/CAB%3DJe-FwOVE%3D8gR1UDDZRnWZR65fRG40e8zW_U_6mnUqbce68g%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CADK3HHJ9316ji7L-97cJBY%3Dwp4E3ddPMn8XdkNz6j8d9u0OhmQ%40mail.gmail.com

Furthermore, the fe-connect.c function makeEmptyPGConn sets default:
+ conn->requested_server_type = SERVER_TYPE_ANY;
This means the default type of target_server_type is "any".
Since this is default, it should also be possible to assign the same value to explicitly.

(Parts of the v17 patch affected by this are itemised below)

====

GENERAL COMMENT 2 (Removal of pg_is_in_recovery)

Around 22/3/2019 Hari added a lot of pg_is_in_recovery code in his patch 0006 [1]
[1] - https://www.postgresql.org/message-id/CAJrrPGd4YeA%2BN%3DxC%2B1XPVoGzMCATJZY4irVQEJ6i0aPqorUi7g%40mail.gmail.com

Much later IIUC the latest v17 patch has taken onboard the recommendation from Alvaro and removed all that code:
"I would discard the whole thing about checking "SELECT pg_is_in_recovery()"" [2]
[2] - https://www.postgresql.org/message-id/20191227130828.GA21647%40alvherre.pgsql

However, it seems that not ALL parts of the original code got cleanly removed in v17.
There are a number of references to CONNECTION_CHECK_RECOVERY and pg_is_in_recovery still lurking.

(Parts of the v17 patch affected by this are itemised below)

====

COMMENT libpq.sgml (para blocks)

+ <para>

The v17 patch for target_session_attrs and target_server_type help is currently using <para> blocks for each of the possible option values.
This format is inconsistent document style with other variables in this SGML.
Other places are using sub-lists for option values. e.g. look at "six modes" of sslmode.

====

COMMENT libpq.sgml (cut/paste parameter description)

I don't think that target_server_type help should be just a cut/paste duplicate of target_session_attrs. It is confusing because it leaves the reader doubting the purpose of having such a duplication.

Suggest to simplify the target_server_type help like as follows:
--
target_server_type
The purpose of this parameter is to reflect the similar PGJDBC targetServerType.
The supported options are same as target_session_attrs.
This parameter overrides any connection type specified by target_session_attrs.
--

====

COMMENT libpq.sgml (pg_is_in_recovery)

(As noted in GENERAL COMMENT 2 there are still residual references to pg_is_in_recovery)

+ <para>
+ If this parameter is set to <literal>standby</literal>, only a connection in which
+ the server is in recovery mode is considered acceptable. If the server is prior to version 14,
+ the query <literal>SELECT pg_is_in_recovery()</literal> will be sent upon any successful
+ connection; if it returns <literal>t</literal>, it means the server is in recovery mode.
+ </para>

Suggest change to:
--
If this parameter is set to <literal>standby</literal>, only a connection in which the server is in recovery mode is considered acceptable. The recovery mode state is determined by the value of the in_recovery configuration parameter that is reported by the server upon successful connection. Otherwise, if the server is prior to version 14, only a connection in which read-write transactions are not accepted by default is considered acceptable. To determine whether the server supports read-write transactions, the query SHOW transaction_read_only will be sent upon any successful connection; if it returns on, it means the server doesn't support read-write transactions.
--

====

COMMENT libpq.sgml (Oxford comma)

+ <varname>integer_datetimes</varname>,
+ <varname>standard_conforming_strings</varname> and
+ <varname>in_recovery</varname>.

Previously there was an Oxford comma (e.g. before the "and"). Now there isn't.
The v17 patch should not alter the previous listing style.

====

COMMENT protocol.sgml (Oxford comma)

+ <varname>integer_datetimes</varname>,
+ <varname>standard_conforming_strings</varname> and
+ <varname>in_recovery</varname>.

Previously there was an Oxford comma (e.g. before the "and"). Now there isn't.
The v17 patch should not alter the previous listing style.

====

QUESTION standby.c - SendRecoveryExitSignal

+/*
+ * SendRecoveryExitSignal
+ * Signal backends that the server has exited recovery mode.
+ */
+void
+SendRecoveryExitSignal(void)
+{
+ SendSignalToAllBackends(PROCSIG_RECOVERY_EXIT);
+}

I wonder if this function is really necessary?
IIUC the SendRecoveryExitSignal is only called from one place (xlog.c).
Why not just call SendSignalToAllBackends directly from there and remove this extra layer?

====

COMMENT postgres.c (signal comment)

+ /* signal that work needs to be done */
+ recoveryExitInterruptPending = true;

Suggest change comment to say:
/* flag that work needs to be done */

====

COMMENT fe-connect.c (sizeof)

- "Target-Session-Attrs", "", 11, /* sizeof("read-write") = 11 */
+ "Target-Session-Attrs", "", 15, /* sizeof("prefer-standby") = 15 */

According to the SGML "prefer-secondary" is also an acceptable value for target_session_attrs, so the display field width should be 17 /* sizeof("prefer-secondary") */ not 15.

====

COMMENT fe-connect.c (CONNECTION_CHECK_RECOVERY)

@@ -2310,6 +2461,7 @@ PQconnectPoll(PGconn *conn)
case CONNECTION_CHECK_WRITABLE:
case CONNECTION_CONSUME:
case CONNECTION_GSS_STARTUP:
+ case CONNECTION_CHECK_RECOVERY:
break;

(As noted in GENERAL COMMENT 2 there are still residual references to pg_is_in_recovery)

Probably this CONNECTION_CHECK_RECOVERY case should be removed.

====

COMMENT fe-connect.c - function validateAndRecordTargetServerType

As noted in GENERAL COMMENT 1, I suggest "any" needs to be included in this function as a valid option.

====

COMMENT fe-connect.c (target_session_attrs validation)

@@ -1396,8 +1425,9 @@ connectOptions2(PGconn *conn)
*/
if (conn->target_session_attrs)
{
- if (strcmp(conn->target_session_attrs, "any") != 0
- && strcmp(conn->target_session_attrs, "read-write") != 0)
+ if (strcmp(conn->target_session_attrs, "any") == 0)
+ conn->requested_server_type = SERVER_TYPE_ANY;
+ else if (!validateAndRecordTargetServerType(conn->target_session_attrs, &conn->requested_server_type))

I suggest introducing a 2nd function for target_session_attrs (e.g. validateAndRecordTargetSessionAttrs).
Even though these parameters are functionally the same today, in future they may not be [1].
[1] - https://www.postgresql.org/message-id/20200109152539.GA29017%40alvherre.pgsql

Regardless, the special "any" handling can be removed from here because (from GENERAL COMMENT 1) the validateAndRecordTargetServerType should now accept "any".

====

COMMENT fe-connect.c (message typo)

Found an existing typo, unrelated to the v17 patch.

"target_settion_attrs", --> "target_session_attrs",

====

COMMENT fe-connect.c (libpq_gettext)

+ printfPQExpBuffer(&conn->errorMessage,
+ libpq_gettext("invalid target_server_type value: \"%s\"\n"),
+ conn->target_server_type);

The parameter name "target_server_type" should be separated from the format string as "%s", the same as is done by the libpq_gettext of the preceding code.

====

COMMENT fe-connect.c (indentation)

+ goto error_return;
+ }
}
+ else
conn->whichhost++;

Bad indentation of the else's statement.

====

COMMENT fe-connect.c (if/else complexity)

+ else if ((conn->in_recovery &&
+ conn->requested_server_type == SERVER_TYPE_PRIMARY) ||
+ (!conn->in_recovery &&
+ (conn->requested_server_type == SERVER_TYPE_PREFER_STANDBY ||
+ conn->requested_server_type == SERVER_TYPE_STANDBY)))
+ {

TBH I was unable to read this code without first drawing up a matrix of combinations to deduce what was going on.
It should not be so inscrutable.

Suggestion1:
Consider putting a large comment at the top of this CONNECTION_CHECK_TARGET to give the overview what this code is trying to acheive.

e.g. something like this:
---
Mode |in_recovery |version < 7.4 |version < 14 |version >= 14
---------------+------------+-------------------+---------------------------+-------------
ANY |NA |OK |OK |OK
PRIMARY |true |OK |SHOW transaction_read_only |keep_going
PRIMARY |false |OK |SHOW transaction_read_only |OK
PREFER_STANDBY |true |keep_going (or -2) |SHOW transaction_read_only |OK
PREFER_STANDBY |false |keep_going (or -2) |SHOW transaction_read_only |keep_going (or -2)
STANDBY |true |keep_going |SHOW transaction_read_only |OK
STANDBY |false |keep_going |SHOW transaction_read_only |keep_going
---

Suggestion2:
Consider to separate out the requested_server_type cases instead of trying to hand everything in the same else block. The code may be a bit longer, but by aligning it more closely with the SGML documentation it can be made easier to understand.

e.g. something like this:
---
if (conn->requested_server_type == SERVER_TYPE_PRIMARY) {
/* If not-in-recovery, reject, else OK. */
if (conn->in_recovery) {
rejectCheckedRecoveryConnection(conn);
goto keep_going;
}
goto consume_checked_target_connection;
}

if (conn->requested_server_type == SERVER_TYPE_STANDBY) {
/* Only a connection in recovery mode is acceptable. */
if (!conn->in_recovery) {
rejectCheckedRecoveryConnection(conn);
goto keep_going;
}
goto consume_checked_target_connection;
}

if (conn->requested_server_type == SERVER_TYPE_PREFER_STANDBY) {
/* A connection in recovery mode is preferred. */
if (conn->in_recovery)
goto consume_checked_target_connection;

/*
* The following scenario is possible only for the
* prefer-standby type for the next pass of the list
* of connections, as it couldn't find any servers that
* are in recovery.
*/
if (conn->which_rw_host == -2)
goto consume_checked_target_connection;

/* reject function below remembers this r/w host index in case it is needed later */
rejectCheckedRecoveryConnection(conn);
goto keep_going;
}
---

====

COMMENT fe-connect.c (case CONNECTION_CHECK_RECOVERY)

(As noted in GENERAL COMMENT 2 there are still residual references to pg_is_in_recovery)

v17 patch has removed the previous call to pg_is_in_recovery.
IIUC this means that there is currently no way for the remaining CONNECTION_CHECK_RECOVERY case to even be executed.

If I am correct, then a significant slab of code (~100 lines) can be deleted.
See case CONNECTION_CHECK_RECOVERY (lines ~ 4007 thru 4110)

====

COMMENT fe-connect.c - function freePGConn (missing free?)

There is code to free(conn->target_session_attrs), but there is no code to free target_server_type.
Appears to be accidental omission.

====

COMMENT fe-exec.c (altered comment)

- * Special hacks: remember client_encoding and
+ * Special hacks: remember client_encoding, and

A comma was added.
Suggest avoid altering comments not directly related to the v17 patch logic.

====

COMMENT libpq-fe.h (CONNECTION_CHECK_RECOVERY)

+ CONNECTION_CHECK_TARGET, /* Check if we have a proper target connection */
+ CONNECTION_CHECK_RECOVERY /* Check whether server is in recovery */

(As noted in GENERAL COMMENT 2 there are still residual references to pg_is_in_recovery)

Probably this CONNECTION_CHECK_RECOVERY case should be removed.

====

Kind Regards,
Peter Smith
---
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-08-11 02:10:08 Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks
Previous Message James Coleman 2020-08-11 01:26:26 Re: remove spurious CREATE INDEX CONCURRENTLY wait