RE: libpq debug log

From: "iwata(dot)aya(at)fujitsu(dot)com" <iwata(dot)aya(at)fujitsu(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: libpq debug log
Date: 2021-03-17 15:06:14
Message-ID: TY2PR01MB1963128A22DF1E0D81A89F93EA6A9@TY2PR01MB1963.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tsunakawa san, Alvaro san,

Thank you very much for your review. It helped me a lot to make the patch better.
I update patch to v26.
This patch has been updated according to Tsunakawa san and Alvaro san review comments.

The output is following;
```
2021-03-17 14:43:16.411238 > Terminate 4
2021-03-17 14:43:16.441775 > Query 155 "CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (some_nonexistent_parameter = true);"
2021-03-17 14:43:16.442935 < ErrorResponse 124 S "ERROR" V "ERROR" C "22023" M "unrecognized parameter "some_nonexistent_parameter"" F "reloptions.c" L "1447" R "parseRelOptionsInternal" \x00 "Z"
2021-03-17 14:43:16.442977 < ReadyForQuery 5 I
2021-03-17 14:43:16.443042 > Query 144 "CREATE TABLESPACE regress_tblspacewith LOCATION '/home/iwata/pgsql/postgres/src/test/regress/testtablespace' WITH (random_page_cost = 3.0);"
2021-03-17 14:43:16.444774 < CommandComplete 22 "CREATE TABLESPACE"
2021-03-17 14:43:16.444807 < ReadyForQuery 5 I
2021-03-17 14:43:16.444878 > Query 81 "SELECT spcoptions FROM pg_tablespace WHERE spcname = 'regress_tblspacewith';"
2021-03-17 14:43:16.448117 < RowDescription 35 1 "spcoptions" 1213 5 1009 65535 -1 0
2021-03-17 14:43:16.448157 < DataRow 32 1 22 '{random_page_cost=3.0}'
2021-03-17 14:43:16.448171 < CommandComplete 13 "SELECT 1"
```

> -----Original Message-----
> From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa(dot)takay(at)fujitsu(dot)com>
> Sent: Tuesday, March 16, 2021 12:03 PM


> (1)
> - Enables tracing of the client/server communication to a debugging
> file stream.
> + Enables tracing of the client/server communication to a debugging file
> + stream.
> + Only available when protocol version 3 or higher is used.
>
> The last line is unnecessary now that v2 is not supported.

I removed last comment from documentation file.

> (2)
> @@ -113,6 +114,7 @@ install: all installdirs install-lib
> $(INSTALL_DATA) $(srcdir)/libpq-fe.h '$(DESTDIR)$(includedir)'
> $(INSTALL_DATA) $(srcdir)/libpq-events.h '$(DESTDIR)$(includedir)'
> $(INSTALL_DATA) $(srcdir)/libpq-int.h
> '$(DESTDIR)$(includedir_internal)'
> + $(INSTALL_DATA) $(srcdir)/libpq-trace.h
> '$(DESTDIR)$(includedir_internal)'
> $(INSTALL_DATA) $(srcdir)/pqexpbuffer.h
> '$(DESTDIR)$(includedir_internal)'
> $(INSTALL_DATA) $(srcdir)/pg_service.conf.sample
> '$(DESTDIR)$(datadir)/pg_service.conf.sample'
>
> Why is this necessary?

I think it is not necessary. I removed it.

> (3) libpq-trace.h
> +#ifdef __cplusplus
> +extern "C"
> +{
>
> This is unnecessary because pqTraceOutputMessage() is for libpq's internal
> use. This extern is for the user's C++ application to call public libpq
> functions.
>
> +#include "libpq-fe.h"
> +#include "libpq-int.h"
>
> I think these can be removed by declaring the struct and function as follows:
>
> struct pg_conn;
> extern void pqTraceOutputMessage(struct pg_conn *conn, const char
> *message, bool toServer);
>
> But... after doing the above, only this declaration is left in libpq-trace.h. Why
> don't you put your original declaration using PGconn in libpq-int.h and remove
> libpq-trace.h?

I remove this file.
I was wondering whether to delete this file during the development of v25 patch. I leave it because it keep scalability.
If tracing process become update and it have a lot of extern function, leave this header file is meaningful.
However I think it does not happen. So I remove it.

> (4)
> + /* Trace message only when there is first 1 byte */
> + if (conn->Pfdebug && (conn->outCount < conn->outMsgStart))
> + pqTraceOutputMessage(conn, conn->outBuffer +
> conn->outCount, true);
>
> () surrounding the condition after && can be removed.

I removed this (). And This if () statement has been modified according to the review comment (14).

> (5)
> +static const char *pqGetProtocolMsgType(unsigned char c,
> +
> bool toServer);
>
> This is unnecessary because the function definition precedes its only one call
> site.

Yes, I removed this definition.

> (6)
> + * We accumulate frontend message pieces in an array as the libpq code
> + writes
> + * them, and log the complete message when pqTraceOutputFeMsg is called.
> + * For backend, we print the pieces as soon as we receive them from the
> server.
>
> The first paragraph is no longer true. I think this entire comment is
> unnecessary.

I removed this explanation.

> (7)
> +static const char *
> +pqGetProtocolMsgType(unsigned char c, bool toServer) {
> + if (toServer == true &&
> + c < lengthof(protocol_message_type_f) &&
> + protocol_message_type_f[c] != NULL)
> + return protocol_message_type_f[c];
> +
> + if (toServer == false &&
> + c < lengthof(protocol_message_type_b) &&
> + protocol_message_type_b[c] != NULL)
> + return protocol_message_type_b[c];
> +
> + return "UnknownMessage:";
> +}
>
> "== true/false" can be removed. libpq doesn't appear to use such style.

Yes, I removed it.

>
> Why don't we embed this processing in pqTraceOutputMessage() because this
> function is short and called only once? The added benefit would be that you
> can print an invalid message type byte like this:
>
> fprintf(..., "Unknown message: %02x\n", ...);

Sure. I fixed this message output.
And I removed switch-case default fprintf which output `Invalidate Protocol`.
Because Unknown...message covers this message meaning.

...
> (8)
> + char id = '\0';
>
> This initialization is not required because id will always be assigned a value
> shortly.

I see, I removed this initialization.

> (9)
> +static int
> +pqTraceOutputInt32(const char *data, FILE *pfdebug) {
> + int result;
> +
> + memcpy(&result, data, 4);
> + result = (int) pg_ntoh32(result);
> + fputc(' ', pfdebug);
> + fprintf(pfdebug, "%d", result);
> +
> + return result;
> +}
>
> fputc() and fprintf() can be merged into one fprintf() call for efficiency.

The reason I separated this is because I thought it would be easy to handle any changes
that I wanted to make, such as changing spaces to tabs.
However this appear only 5 times. So I merged fprintf and fputc.

> (10)
> +/* BackendKeyData */
> +static void
> +pqTraceOutputK(const char *message, FILE *f) {
> + int cursor = 0;
> +
> + pqTraceOutputInt32(message + cursor, f);
> + cursor += 4;
> + pqTraceOutputInt32(message + cursor, f); }
>
> I don't think you need to always use a cursor variable in order to align with
> more complex messages. That is, you can just write:
>
> + pqTraceOutputInt32(message, f);
> + pqTraceOutputInt32(message + 4, f);

To follow Alvaro san's suggestion which is * number 4,
I left this cursor.

> (11)
> + default:
> + fprintf(conn->Pfdebug, "Invalid Protocol:%c\n", id);
> + break;
> +
>
> (This is related to (7))
> You can remove this default label if you exit the function before the switch
> statement when the message type is unknown. Make sure to output \n in
> that case.

I removed the default.

> (12) pqTraceOutputB
> + for (i = 0; i < nparams; i++)
> + {
> + nbytes = pqTraceOutputInt32(message + cursor, f);
> + cursor += 4;
> + if (nbytes == -1)
> + break;
> + pqTraceOutputNchar(message + cursor, f, nbytes);
> + cursor += nbytes;
> + }
>
> Not break but continue, because non-NULL parameters may follow a NULL
> parameter.

I changed break to continue.

> (13) pqTraceOutputB
> + pqTraceOutputInt16(message + cursor, f);
> + cursor += 4;
> + pqTraceOutputInt16(message + cursor, f); }
>
> This part doesn't seem to match the following description.
>
> ----------
> After the last parameter, the following fields appear:
>
> Int16
> The number of result-column format codes that follow (denoted R below).
> This can be zero to indicate that there are no result columns or that the result
> columns should all use the default format (text); or one, in which case the
> specified format code is applied to all result columns (if any); or it can equal
> the actual number of result columns of the query.
>
> Int16[R]
> The result-column format codes. Each must presently be zero (text) or one
> (binary).
> ---------

Thank you. I fixed it to use for loop.

> (14)
> The processing for CancelRequest message is missing?

This message does not have first 1 byte. So I remove it from my patch.
I create new function pqTraceOutputNoTypeByteMessage() and prepare it in the function.

> (15)
> +/* CopyInResponse */
> +static void
> +pqTraceOutputG(const char *message, int end, FILE *f) {
> + int cursor = 0;
> +
> + pqTraceOutputByte1(message + cursor, f);
> + cursor++;
> +
> + while (end > cursor)
> + {
> + pqTraceOutputInt16(message + cursor, f);
> + cursor += 2;
> + }
> +}
> +
>
> According to the following description, for loop should be used.
>
> ----------
> Int16
> The number of columns in the data to be copied (denoted N below).
>
> Int16[N]
> The format codes to be used for each column. Each must presently be zero
> (text) or one (binary). All must be zero if the overall copy format is textual.
> ----------

I fixed it.

> (16)
> (15) is also true for the processing of 'H' message.

Thank you. I fixed it.

> (17) pqTraceOutputD
> + for (i = 0; i < nfields; i++)
> + {
> + len = pqTraceOutputInt32(message + cursor, f);
> + cursor += 4;
> + if (len == -1)
> + break;
> + pqTraceOutputNchar(message + cursor, f, len);
> + cursor += len;
> + }
>
> Not break but continue, because non-NULL columns may follow a NULL
> column.

I fixed it.

> (18)
> + case 'E': /* Execute(F) or Error Return(B) */
> + pqTraceOutputE(message + LogCursor, LogEnd,
> conn->Pfdebug, toServer);
> + break;
>
> Error Return -> ErrorResponse

I fixed it.

> (19) pqTraceOutputF
> Check the protocol again. Two for loops should be required, one for format
> codes and another for arguments.

Thank you. I fixed it.

> (20)
> + if ( len != -1)
>
> Remove extra space before len.

I remove this space.

> (21)
> I guess you intentionally omitted the following messages because they are
> only used during connection establishment. I'm fine with it. I wrote this just
> in case you missed them.
>
> GSSENCRequest (F)
> Int32(8)
>
> GSSResponse (F)
> Byte1('p')
>
> PasswordMessage (F)
> Byte1('p')
>
> SASLInitialResponse (F)
> Byte1('p')
>
> SASLResponse (F)
> Byte1('p')

Yes, I think it does not appear.

> (22)
> +/* NotificationResponse */
> +static void
> +pqTraceOutputA(const char *message, int end, FILE *f) {
> + int cursor = 0;
> +
> + pqTraceOutputInt16(message + cursor, f);
> + cursor += 2;
> + pqTraceOutputString(message + cursor, f);
>
> Not Int16 but Int32.

I fixed it.

> From: Tsunakawa, Takayuki/綱川 貴之 <tsunakawa(dot)takay(at)fujitsu(dot)com>
> Sent: Wednesday, March 17, 2021 11:10 AM

> > * I would make the pqTraceOutputInt16() function and siblings advance
> > the cursor themselves, actually. I think something like this:
> > static int
> > pqTraceOutputInt16(const char *data, int *cursor, FILE *pfdebug)
> > {
> > uint16 tmp;
> > int result;
> >
> > memcpy(&tmp, data + *cursor, 2);
> > *cursor += 2;
> > result = (int) pg_ntoh16(tmp);
> > fprintf(pfdebug, " #%d", result);
> >
> > return result;
> > }
> > (So the caller has to pass the original "data" pointer, not
> > "data+cursor"). This means the caller no longer has to do "cursor+=N"
> > at each place. Also, you get rid of the moveStrCursor() which does
> > not look good.
>
> That makes sense, because in fact the patch mistakenly added 4 when it
> should add 2. Also, the code would become smaller.

I changed 5 message data type function this style.

> > * I'm not fond of the idea of prefixing "#" for 16bit ints and no
> > prefix for 32bit ints. Seems obscure and the output looks weird.
> > I would use a one-letter prefix for each type, "w" for 32-bit ints
> > (stands for "word") and "h" for 16-bit ints (stands for "half-word").
> > Message length goes unadorned. Then you'd have lines like this
> >
> > 2021-03-15 08:10:44.324296 < RowDescription 35 h1 "spcoptions"
> > w1213 h5 w1009 h65535 w-1 h0
> > 2021-03-15 08:10:44.324335 < DataRow 32 h1 w22
> > '{random_page_cost=3.0}'
>
> Yes, actually I felt something similar. Taking a second thought, I think we
> don't have to have any prefix because it doesn't help users. So we're
> removing the prefix. We don't have any opinion on adding some prefix.

Having only # can be confusing for users. So I removed #.

> > * I don't like that pqTraceOutputS/H print nothing when !toServer. I
> > think they should complain.
>
> Yes, the caller should not call the function if there's no message content.
> That way, the function doesn't need the toServer argument.

I moved pqTraceOutputS/H's `if(!toServer)` check to pqTraceOutputMessage() switch statement.

Regards,
Aya Iwata

Attachment Content-Type Size
v26-libpq-trace-log.patch application/octet-stream 28.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-03-17 15:15:43 Re: Getting better results from valgrind leak tracking
Previous Message Tom Lane 2021-03-17 14:48:27 Re: Calendar support in localization