| From: | "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com> | 
|---|---|
| To: | Greg Nancarrow <gregn4422(at)gmail(dot)com> | 
| Cc: | David Steele <david(at)pgmasters(dot)net>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dave Cramer <pg(at)fastcrypt(dot)com>, Jing Wang <jingwangian(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Elvis Pranskevichus <elprans(at)gmail(dot)com> | 
| Subject: | RE: Libpq support to connect to standby server as priority | 
| Date: | 2020-08-11 01:38:08 | 
| Message-ID: | f6daf26354054924998670cf03770352@G06AUEXCASMHB04.g06.fujitsu.local | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi Greg,
I have spent some time reading this discussion thread, and doing a code review of the latest (v17-0001) patch.
Below are my review comments; some are trivial, others not so much.
====
GENERAL COMMENT 1 ("any")
"any" should be included as valid option for target_server_type.
IIUC target_server_type was added mostly to have better alignment with JDBC options.
Both Vladimir [1] and Dave [2] already said that JDBC does have an "any" option.
[1] - https://www.postgresql.org/message-id/CAB%3DJe-FwOVE%3D8gR1UDDZRnWZR65fRG40e8zW_U_6mnUqbce68g%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CADK3HHJ9316ji7L-97cJBY%3Dwp4E3ddPMn8XdkNz6j8d9u0OhmQ%40mail.gmail.com
Furthermore, the fe-connect.c function makeEmptyPGConn sets default: 
+	conn->requested_server_type = SERVER_TYPE_ANY;
This means the default type of target_server_type is "any".
Since this is default, it should also be possible to assign the same value to explicitly.
(Parts of the v17 patch affected by this are itemised below)
====
GENERAL COMMENT 2 (Removal of pg_is_in_recovery)
Around 22/3/2019 Hari added a lot of pg_is_in_recovery code in his patch 0006 [1]
[1] - https://www.postgresql.org/message-id/CAJrrPGd4YeA%2BN%3DxC%2B1XPVoGzMCATJZY4irVQEJ6i0aPqorUi7g%40mail.gmail.com
Much later IIUC the latest v17 patch has taken onboard the recommendation from Alvaro and removed all that code:
"I would discard the whole thing about checking "SELECT pg_is_in_recovery()"" [2]
[2] - https://www.postgresql.org/message-id/20191227130828.GA21647%40alvherre.pgsql
However, it seems that not ALL parts of the original code got cleanly removed in v17.
There are a number of references to CONNECTION_CHECK_RECOVERY and pg_is_in_recovery still lurking.
(Parts of the v17 patch affected by this are itemised below)
====
COMMENT libpq.sgml (para blocks)
+ <para>
The v17 patch for target_session_attrs and target_server_type help is currently using <para> blocks for each of the possible option values. 
This format is inconsistent document style with other variables in this SGML. 
Other places are using sub-lists for option values. e.g. look at "six modes" of sslmode.
====
COMMENT libpq.sgml (cut/paste parameter description)
I don't think that target_server_type help should be just a cut/paste duplicate of target_session_attrs. It is confusing because it leaves the reader doubting the purpose of having such a duplication.
Suggest to simplify the target_server_type help like as follows:
--
target_server_type
The purpose of this parameter is to reflect the similar PGJDBC targetServerType.
The supported options are same as target_session_attrs.
This parameter overrides any connection type specified by target_session_attrs.
--
====
COMMENT libpq.sgml (pg_is_in_recovery)
(As noted in GENERAL COMMENT 2 there are still residual references to pg_is_in_recovery)
+       <para>
+        If this parameter is set to <literal>standby</literal>, only a connection in which
+        the server is in recovery mode is considered acceptable. If the server is prior to version 14,
+        the query <literal>SELECT pg_is_in_recovery()</literal> will be sent upon any successful
+        connection; if it returns <literal>t</literal>, it means the server is in recovery mode.
+       </para>
Suggest change to:
--
If this parameter is set to <literal>standby</literal>, only a connection in which the server is in recovery mode is considered acceptable. The recovery mode state is determined by the value of the in_recovery configuration parameter that is reported by the server upon successful connection. Otherwise, if the server is prior to version 14, only a connection in which read-write transactions are not accepted by default is considered acceptable. To determine whether the server supports read-write transactions, the query SHOW transaction_read_only will be sent upon any successful connection; if it returns on, it means the server doesn't support read-write transactions.
--
====
COMMENT libpq.sgml (Oxford comma)
+       <varname>integer_datetimes</varname>,
+       <varname>standard_conforming_strings</varname> and
+       <varname>in_recovery</varname>.
Previously there was an Oxford comma (e.g. before the "and"). Now there isn't.
The v17 patch should not alter the previous listing style.
====
COMMENT protocol.sgml (Oxford comma)
+    <varname>integer_datetimes</varname>,
+    <varname>standard_conforming_strings</varname> and
+    <varname>in_recovery</varname>.
Previously there was an Oxford comma (e.g. before the "and"). Now there isn't.
The v17 patch should not alter the previous listing style.
====
QUESTION standby.c - SendRecoveryExitSignal
+/*
+ * SendRecoveryExitSignal
+ *		Signal backends that the server has exited recovery mode.
+ */
+void
+SendRecoveryExitSignal(void)
+{
+	SendSignalToAllBackends(PROCSIG_RECOVERY_EXIT);
+}
I wonder if this function is really necessary?
IIUC the SendRecoveryExitSignal is only called from one place (xlog.c).
Why not just call SendSignalToAllBackends directly from there and remove this extra layer?
====
COMMENT postgres.c (signal comment)
+	/* signal that work needs to be done */
+	recoveryExitInterruptPending = true;
Suggest change comment to say: 
/* flag that work needs to be done */
====
COMMENT fe-connect.c (sizeof)
-		"Target-Session-Attrs", "", 11, /* sizeof("read-write") = 11 */
+		"Target-Session-Attrs", "", 15, /* sizeof("prefer-standby") = 15 */
According to the SGML "prefer-secondary" is also an acceptable value for target_session_attrs, so the display field width should be 17 /* sizeof("prefer-secondary") */ not 15.
====
COMMENT fe-connect.c (CONNECTION_CHECK_RECOVERY)
@@ -2310,6 +2461,7 @@ PQconnectPoll(PGconn *conn)
 		case CONNECTION_CHECK_WRITABLE:
 		case CONNECTION_CONSUME:
 		case CONNECTION_GSS_STARTUP:
+		case CONNECTION_CHECK_RECOVERY:
 			break;
(As noted in GENERAL COMMENT 2 there are still residual references to pg_is_in_recovery)
Probably this CONNECTION_CHECK_RECOVERY case should be removed.
====
COMMENT fe-connect.c - function validateAndRecordTargetServerType
As noted in GENERAL COMMENT 1, I suggest "any" needs to be included in this function as a valid option.
====
COMMENT fe-connect.c (target_session_attrs validation)
@@ -1396,8 +1425,9 @@ connectOptions2(PGconn *conn)
 	 */
 	if (conn->target_session_attrs)
 	{
-		if (strcmp(conn->target_session_attrs, "any") != 0
-			&& strcmp(conn->target_session_attrs, "read-write") != 0)
+		if (strcmp(conn->target_session_attrs, "any") == 0)
+			conn->requested_server_type = SERVER_TYPE_ANY;
+		else if (!validateAndRecordTargetServerType(conn->target_session_attrs, &conn->requested_server_type))
I suggest introducing a 2nd function for target_session_attrs (e.g. validateAndRecordTargetSessionAttrs). 
Even though these parameters are functionally the same today, in future they may not be [1].
[1] - https://www.postgresql.org/message-id/20200109152539.GA29017%40alvherre.pgsql
Regardless, the special "any" handling can be removed from here because (from GENERAL COMMENT 1) the validateAndRecordTargetServerType should now accept "any".
====
COMMENT fe-connect.c (message typo)
Found an existing typo, unrelated to the v17 patch.
"target_settion_attrs", --> "target_session_attrs",
====
COMMENT fe-connect.c (libpq_gettext)
+			printfPQExpBuffer(&conn->errorMessage,
+							  libpq_gettext("invalid target_server_type value: \"%s\"\n"),
+							  conn->target_server_type);
The parameter name "target_server_type" should be separated from the format string as "%s", the same as is done by the libpq_gettext of the preceding code.
====
COMMENT fe-connect.c (indentation)
+				goto error_return;
+			}
 		}
+		else
 		conn->whichhost++;
Bad indentation of the else's statement.
====
COMMENT fe-connect.c (if/else complexity)
+					else if ((conn->in_recovery &&
+						  conn->requested_server_type == SERVER_TYPE_PRIMARY) ||
+						 (!conn->in_recovery &&
+						  (conn->requested_server_type == SERVER_TYPE_PREFER_STANDBY ||
+						   conn->requested_server_type == SERVER_TYPE_STANDBY)))
+					{
TBH I was unable to read this code without first drawing up a matrix of combinations to deduce what was going on. 
It should not be so inscrutable.
Suggestion1:
Consider putting a large comment at the top of this CONNECTION_CHECK_TARGET to give the overview what this code is trying to acheive. 
e.g. something like this:
---
Mode           |in_recovery |version < 7.4      |version < 14               |version >= 14
---------------+------------+-------------------+---------------------------+-------------
ANY            |NA          |OK                 |OK                         |OK
PRIMARY        |true        |OK                 |SHOW transaction_read_only |keep_going
PRIMARY        |false       |OK                 |SHOW transaction_read_only |OK
PREFER_STANDBY |true        |keep_going (or -2)	|SHOW transaction_read_only	|OK
PREFER_STANDBY |false       |keep_going (or -2) |SHOW transaction_read_only	|keep_going (or -2)
STANDBY        |true        |keep_going         |SHOW transaction_read_only	|OK
STANDBY        |false       |keep_going         |SHOW transaction_read_only	|keep_going
---
Suggestion2:
Consider to separate out the requested_server_type cases instead of trying to hand everything in the same else block. The code may be a bit longer, but by aligning it more closely with the SGML documentation it can be made easier to understand.
e.g. something like this:
---
if (conn->requested_server_type == SERVER_TYPE_PRIMARY) {
	/* If not-in-recovery, reject, else OK. */
	if (conn->in_recovery) {
		rejectCheckedRecoveryConnection(conn);
		goto keep_going;
	}
	goto consume_checked_target_connection;
}
 
if (conn->requested_server_type == SERVER_TYPE_STANDBY) {
	/* Only a connection in recovery mode is acceptable. */
	if (!conn->in_recovery) {
		rejectCheckedRecoveryConnection(conn);
		goto keep_going;
	}
	goto consume_checked_target_connection;
}
 
if (conn->requested_server_type == SERVER_TYPE_PREFER_STANDBY) {
	/* A connection in recovery mode is preferred. */
	if (conn->in_recovery)
		goto consume_checked_target_connection;
 
	/*
	 * The following scenario is possible only for the
	 * prefer-standby type for the next pass of the list
	 * of connections, as it couldn't find any servers that
	 * are in recovery.
	 */
	if (conn->which_rw_host == -2)
		goto consume_checked_target_connection;
 
	/* reject function below remembers this r/w host index in case it is needed later */
	rejectCheckedRecoveryConnection(conn);
	goto keep_going;
}
---
====
COMMENT fe-connect.c (case CONNECTION_CHECK_RECOVERY)
(As noted in GENERAL COMMENT 2 there are still residual references to pg_is_in_recovery)
v17 patch has removed the previous call to pg_is_in_recovery.
IIUC this means that there is currently no way for the remaining CONNECTION_CHECK_RECOVERY case to even be executed.
If I am correct, then a significant slab of code (~100 lines) can be deleted.
See case CONNECTION_CHECK_RECOVERY (lines ~ 4007 thru 4110)
====
COMMENT fe-connect.c - function freePGConn (missing free?)
There is code to free(conn->target_session_attrs), but there is no code to free target_server_type. 
Appears to be accidental omission.
====
COMMENT fe-exec.c (altered comment)
-	 * Special hacks: remember client_encoding and
+	 * Special hacks: remember client_encoding, and
A comma was added. 
Suggest avoid altering comments not directly related to the v17 patch logic.
====
COMMENT libpq-fe.h (CONNECTION_CHECK_RECOVERY)
+	CONNECTION_CHECK_TARGET,	/* Check if we have a proper target connection */
+	CONNECTION_CHECK_RECOVERY	/* Check whether server is in recovery */
(As noted in GENERAL COMMENT 2 there are still residual references to pg_is_in_recovery)
Probably this CONNECTION_CHECK_RECOVERY case should be removed.
====
Kind Regards,
Peter Smith
---
Fujitsu Australia
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Robert Haas | 2020-08-11 02:10:08 | Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks | 
| Previous Message | James Coleman | 2020-08-11 01:26:26 | Re: remove spurious CREATE INDEX CONCURRENTLY wait |