Re: libpq debug log

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: "iwata(dot)aya(at)fujitsu(dot)com" <iwata(dot)aya(at)fujitsu(dot)com>, "'pgsql-hackers(at)lists(dot)postgresql(dot)org'" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq debug log
Date: 2021-01-21 21:52:47
Message-ID: 20210121215247.GA31809@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-Jan-17, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:

> * I don't see the need for separate pqTraceInit() function, because it is only called here.

That's true, but it'd require that we move PQtrace() to fe-misc.c,
because pqTraceInit() uses definitions that are private to that file.

> (2)
> +bool
> +pqTraceInit(PGconn *conn)
> +{
> + /* already done? */
> + if (conn->be_msg != NULL)
> + {
>
> What's this code for? I think it's sufficient that PQuntrace() frees b_msg and PQtrace() always allocates b_msg.

The point is to be able to cope with a connection that calls PQtrace()
multiple times, with or without an intervening PQuntrace().
We'd make no friends if we made PQtrace() crash, or leak memory if
called a second time ...

> (5)
> @@ -966,10 +966,6 @@ pqSaveParameterStatus(PGconn *conn, const char *name, const char *value)
> pgParameterStatus *pstatus;
> pgParameterStatus *prev;
>
> - if (conn->Pfdebug)
> - fprintf(conn->Pfdebug, "pqSaveParameterStatus: '%s' = '%s'\n",
> - name, value);
> -
>
> Where is this information output instead?

Hmm, seems to have been lost. I restored it, but didn't verify
the resulting behavior carefully.

> (9)
> + currtime = time(NULL);
> + tmp = localtime(&currtime);
> + strftime(timestr, sizeof(timestr), "%Y-%m-%d %H:%M:%S %Z", tmp);
> +
> + fprintf(conn->Pfdebug, "%s %s ", timestr, message_direction);
>
> It's better to have microsecond accuracy, because knowing the accumulation of such fine-grained timings may help to troubleshoot heavily-loaded client cases. You can mimic setup_formatted_log_time() in src/backend/utils/error/elog.c. This is used for the %m marker in log_line_prefix.

Yeah, it was me that removed printing of milliseconds -- Iwata-san's
original patch used ftime() which is not portable. I had looked for
portable ways to get this but couldn't find anything that didn't involve
linking backend files, so I punted. But looking again now, it is quite
simple: just use standard strftime and localtime and just concatenate
both together. Similar to what setup_formatted_log_time except we don't
worry about the TZ at all.

> (10)
> I think it's better to use tabs (or any other character that is less likely to appear in the log field) as a field separator, because it makes it easier to process the log with a spreadsheet or some other tools.

I can buy that. I changed it to have some tabs; the lines are now:

timestamp "> or <" <tab> "message type" <tab> length <space> message contents

I think trying to apply tabs inside the message contents is going to be
more confusing than helpful.

> (12)
> [...]
> Plus, you may as well print the invalid message type. Why don't you do something like this:
>
> +static const char *
> +pqGetProtocolMsgType(unsigned char c, PGCommSource commsource)
> +{
> + static char unknown_message[64];
> + char *msg = NULL;
>
> + if (commsource == FROM_BACKEND && c < lengthof(protocol_message_type_b))
> + msg = protocol_message_type_b[c];
> + else if (commsource == FROM_FRONTEND && c < lengthof(protocol_message_type_f))
> + msg = protocol_message_type_f[c];
> +
> + if (msg == NULL)
> + {
> + sprintf(unknown_message, "Unknown message %02x", c);
> + msg = unknown_message;
> + }
> +
> + return msg;
> +}

Right. I had written something like this, but in the end decided not to
bother because it doesn't seem terribly important. You can't do exactly
what you suggest, because it has the problem that you're returning a
stack-allocated variable even though your stack is about to be blown
away. My copy had a static buffer that was malloc()ed on first use (and
if allocation fails, return a const string). Anyway, I fixed the
crasher problem.

The protocol_message_type_b array had the serious problem it confused
the entry for byte 1 (0x01) with that of char '1' (and 2 and 3), so it
printed a lot of 'Unknown message' lines. Clearly, the maintenance of
this array is going to be a pain point of this patch (counting number of
zeroes is no fun), and I think we're going to have some way to have an
explicit initializer, where we can do things like
protocol_message_type_b['A'] = "NotificationResponse";
etc instead of the current way, which is messy, hard to maintain.
I'm not sure how to do that and not make things worse, however.

> (16)
> +static void
> +pqLogMessageByte1(PGconn *conn, char v, PGCommSource commsource)
> +{
> + char *message_direction = commsource == FROM_BACKEND ? "<" : ">";
> ...
> + switch (conn->be_msg->state)
>
> This function handles messages in both directions. But the switch condition is based on the backend message state (b_msg). How is this correct?

It's not.

I split things so that this function only prints backend messages; when
frontend messages are to be printed, we use a single fprintf() instead.
See about (20), below.

>
> (17)
> What's the difference between pqLogMsgString() and pqLogMsgnchar()? They both take a length argument. Do they have to be separate functions?

nchar are not null-terminated and we escape !isprint() characters. I'm
not sure that the difference is significant enough, but I'm also not
sure if they can be merged into one.

> (18)
> if (conn->Pfdebug)
> - fprintf(conn->Pfdebug, "To backend> %c\n", c);
> + pqStoreFrontendMsg(conn, LOG_BYTE1, 1);
>
> To match the style for backend messages with pqLogMsgByte1 etc., why don't you wrap the conn->Pfdebug check in the macro?

I think these macros are useless and should be removed. There's more
verbosity and confusion caused by them, than the clarity they provide.

> (20)
> +static void
> +pqLogFrontendMsg(PGconn *conn)
> ...
> + for (i = 0; i < conn->fe_msg->field_num; i++)
> + {
> + message_addr = conn->fe_msg->fields[i].offset_in_buffer;
> + length = conn->fe_msg->fields[i].length;
> +
> + switch (conn->fe_msg[i].type)
> + {
> + case LOG_BYTE1:
>
> When I saw this switch condition, the confusion exploded. Shouldn't the type should be the attribute of each field as follows?
>
> + switch (conn->fe_msg->fields[i].type)
>
> I also found the function names for frontend -> backend messages hard to grasp. Could you show the flow of function calls when sending messages to the backend?

Exactly! I was super confused about this too, and eventually decided to
rewrite the whole thing so that the 'type' is in the Fields struct. And
while doing that, I noticed that it would make more sense to let the
fe_msg array be realloc'ed if it gets full, rather than making it fixed
size. This made me add pqTraceUninit(), so that we can free the array
if it has grown, to avoid reserving arbitrarily large amounts of memory
after PQuntrace() on a session that traced large messages.

> (22)
> I couldn't understand pqLogBinaryMsg() at all. Could you explain what it does? Particularly, if all bytes in the string is printable, the function seems to do nothing:
>
> + if (isprint(v[i]))
> + continue;
>
> Should the following part be placed outside the for loop? What does 'pin' mean? (I want the variable name to be more intuitive.)
>
> + if (pin < length)
> + fwrite(v + pin, 1, length - pin, conn->Pfdebug);

Yes :-( I fixed that too.

This patch needs more work still, of course.

--
Álvaro Herrera Valdivia, Chile
"Java is clearly an example of money oriented programming" (A. Stepanov)

Attachment Content-Type Size
v12-0001-libpq-trace.patch text/x-diff 28.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2021-01-21 21:54:04 Re: Avoiding smgrimmedsync() during nbtree index builds
Previous Message Justin Pryzby 2021-01-21 21:26:51 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly