RE: libpq debug log

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: "iwata(dot)aya(at)fujitsu(dot)com" <iwata(dot)aya(at)fujitsu(dot)com>
Cc: "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: libpq debug log
Date: 2021-01-17 05:23:36
Message-ID: TYAPR01MB2990379993AFD6DE211E63B7FEA50@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: iwata(dot)aya(at)fujitsu(dot)com <iwata(dot)aya(at)fujitsu(dot)com>
> This patch includes these items:

Is there anything else in this revision that is not handled?

Below are my comments.

Also, why don't you try running the regression tests with a temporary modification to PQtrace() to output the trace to a file? The sole purpose is to confirm that this patch makes the test crash (core dump).

(1)
- conn->Pfdebug = debug_port;
+ if (pqTraceInit(conn))
+ {
+ conn->Pfdebug = debug_port;
+ if (conn->Pfdebug != NULL)
+ setlinebuf(conn->Pfdebug);
+ }
+ else
+ {
+ /* XXX report ENOMEM? */
+ fprintf(conn->Pfdebug, "Failed to initialize trace support\n");
+ fflush(conn->Pfdebug);
+ }
}

* When the passed debug_port is NULL, the function should return after calling PQuntrace().

* Is setlinebuf() available on Windows? Maybe you should use setvbuf() instead.

* I don't see the need for separate pqTraceInit() function, because it is only called here.

(2)
+bool
+pqTraceInit(PGconn *conn)
+{
+ /* already done? */
+ if (conn->be_msg != NULL)
+ {

What's this code for? I think it's sufficient that PQuntrace() frees b_msg and PQtrace() always allocates b_msg.

(3)
+ conn->fe_msg = malloc(sizeof(pqFrontendMessage) +
+ MAX_FRONTEND_MSGS * sizeof(pqFrontendMessageField));
+ conn->fe_msg->field_num = 0;
+ if (conn->fe_msg == NULL)
+ {
+ free(conn->be_msg);
+ /* NULL out for the case that fe_msg malloc fails */
+ conn->be_msg = NULL;
+ return false;
+ }
+ conn->fe_msg->field_num = 0;

The memory for the fields array is one entry larger than necessary. The allocation code would be:

+ conn->fe_msg = malloc(offsetof(pqFrontendMessage, fields) +
+ MAX_FRONTEND_MSGS * sizeof(pqFrontendMessageField));

Also, the line with "field_num = 0" appears twice. The first one should be removed.

(4)
+static void
+pqLogMessageByte1(PGconn *conn, char v, PGCommSource commsource)
+{
...
+ if (PG_PROTOCOL_MAJOR(conn->pversion) >= 3)
+ {

I think you can remove the protocol version check in various logging functions, because PQtrace() disables logging when the protocol version is 2.

(5)
@@ -966,10 +966,6 @@ pqSaveParameterStatus(PGconn *conn, const char *name, const char *value)
pgParameterStatus *pstatus;
pgParameterStatus *prev;

- if (conn->Pfdebug)
- fprintf(conn->Pfdebug, "pqSaveParameterStatus: '%s' = '%s'\n",
- name, value);
-

Where is this information output instead?

(6)
+ * Set up state so that we can trace. NB -- this might be called mutiple

mutiple -> multiple

(7)
+ LOG_FIRST_BYTE, /* logging the first byte identifing the
+ protocol message type */

identifing -> identifying

(8)
+#define pqLogMsgByte1(CONN, CH, SOURCE) \
+((CONN)->Pfdebug ? pqLogMessageByte1(CONN, CH, SOURCE) : 0)

For functions that return void, follow the style of CHECK_FOR_INTERRUPTS() defined in miscadmin.h.

(9)
+ currtime = time(NULL);
+ tmp = localtime(&currtime);
+ strftime(timestr, sizeof(timestr), "%Y-%m-%d %H:%M:%S %Z", tmp);
+
+ fprintf(conn->Pfdebug, "%s %s ", timestr, message_direction);

It's better to have microsecond accuracy, because knowing the accumulation of such fine-grained timings may help to troubleshoot heavily-loaded client cases. You can mimic setup_formatted_log_time() in src/backend/utils/error/elog.c. This is used for the %m marker in log_line_prefix.

(10)
I think it's better to use tabs (or any other character that is less likely to appear in the log field) as a field separator, because it makes it easier to process the log with a spreadsheet or some other tools.

(11)
+ /*
+ * True type of message tagged '\0' is known when next 4 bytes is
+ * checked. So we delay printing message type to pqLogMsgInt()
+ */
+ if (v != '\0')
+ fprintf(conn->Pfdebug, "%s ",
+ pqGetProtocolMsgType((unsigned char) v, commsource));

In what cases does '\0' appear as a message type?

(12)
+static const char *
+pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
+{
+ if (commsource == FROM_BACKEND && c < lengthof(protocol_message_type_b))
+ return protocol_message_type_b[c];
+ else if (commsource == FROM_FRONTEND && c < lengthof(protocol_message_type_f))
+ return protocol_message_type_f[c];
+ else
+ return "UnknownMessage";
+}

This function returns NULL (=0) when protocol_message_type_b/f[c] is 0. That crashes the caller:

+ fprintf(conn->Pfdebug, "%s ",
+ pqGetProtocolMsgType((unsigned char) v, commsource));

Plus, you may as well print the invalid message type. Why don't you do something like this:

+static const char *
+pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
+{
+ static char unknown_message[64];
+ char *msg = NULL;

+ if (commsource == FROM_BACKEND && c < lengthof(protocol_message_type_b))
+ msg = protocol_message_type_b[c];
+ else if (commsource == FROM_FRONTEND && c < lengthof(protocol_message_type_f))
+ msg = protocol_message_type_f[c];
+
+ if (msg == NULL)
+ {
+ sprintf(unknown_message, "Unknown message %02x", c);
+ msg = unknown_message;
+ }
+
+ return msg;
+}

(13)
@@ -156,7 +156,12 @@ pqParseInput3(PGconn *conn)
{
/* If not IDLE state, just wait ... */
if (conn->asyncStatus != PGASYNC_IDLE)
+ {
+ /* Terminate a half-finished logging message */
+ if (conn->Pfdebug)
+ pqTraceForcelyBreakLine(msgLength, conn);
return;
+ }

/*
* Unexpected message in IDLE state; need to recover somehow.

What's this situation like? Why output a new line and reset the trace status?

(14)
+/* pqLogInvalidProtocol: Print that the protocol message is invalid */
+static void
+pqLogInvalidProtocol(PGconn *conn)
+{
+ fprintf(conn->Pfdebug, ":::Invalid Protocol\n");
+ conn->be_msg->state = LOG_FIRST_BYTE;
+}

Is it sufficient to just reset the state field? Isn't it necessary to call pqTraceResetBeMsg() instead?

(15)
@@ -212,15 +368,8 @@ pqSkipnchar(size_t len, PGconn *conn)
...
conn->inCursor += len;
-
return 0;
}

This is an unnecessary removal of an empty line.

(16)
+static void
+pqLogMessageByte1(PGconn *conn, char v, PGCommSource commsource)
+{
+ char *message_direction = commsource == FROM_BACKEND ? "<" : ">";
...
+ switch (conn->be_msg->state)

This function handles messages in both directions. But the switch condition is based on the backend message state (b_msg). How is this correct?

(17)
What's the difference between pqLogMsgString() and pqLogMsgnchar()? They both take a length argument. Do they have to be separate functions?

(18)
if (conn->Pfdebug)
- fprintf(conn->Pfdebug, "To backend> %c\n", c);
+ pqStoreFrontendMsg(conn, LOG_BYTE1, 1);

To match the style for backend messages with pqLogMsgByte1 etc., why don't you wrap the conn->Pfdebug check in the macro?

(19)
@@ -520,8 +667,6 @@ pqPutMsgStart(char msg_type, bool force_len, PGconn *conn)
/* allow room for message length */
endPos += 4;
}
- else
- lenPos = -1;

Why is this change necessary?

(20)
+static void
+pqLogFrontendMsg(PGconn *conn)
...
+ for (i = 0; i < conn->fe_msg->field_num; i++)
+ {
+ message_addr = conn->fe_msg->fields[i].offset_in_buffer;
+ length = conn->fe_msg->fields[i].length;
+
+ switch (conn->fe_msg[i].type)
+ {
+ case LOG_BYTE1:

When I saw this switch condition, the confusion exploded. Shouldn't the type should be the attribute of each field as follows?

+ switch (conn->fe_msg->fields[i].type)

I also found the function names for frontend -> backend messages hard to grasp. Could you show the flow of function calls when sending messages to the backend?

(21)
+ uint16 result16;
+ memcpy(&result16, conn->outBuffer + message_addr, length);

You should have an empty line between the declaration part and execution part.

(22)
I couldn't understand pqLogBinaryMsg() at all. Could you explain what it does? Particularly, if all bytes in the string is printable, the function seems to do nothing:

+ if (isprint(v[i]))
+ continue;

Should the following part be placed outside the for loop? What does 'pin' mean? (I want the variable name to be more intuitive.)

+ if (pin < length)
+ fwrite(v + pin, 1, length - pin, conn->Pfdebug);

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-01-17 05:29:54 Re: PoC/WIP: Extended statistics on expressions
Previous Message tsunakawa.takay@fujitsu.com 2021-01-17 05:16:09 RE: Disable WAL logging to speed up data loading