RE: libpq debug log

From: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
To: "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "iwata(dot)aya(at)fujitsu(dot)com" <iwata(dot)aya(at)fujitsu(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: libpq debug log
Date: 2021-03-05 12:56:25
Message-ID: OSBPR01MB23415248EFC4BA8301DDC8FAEF969@OSBPR01MB2341.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Alvaro-san and Horiguchi-san,
CC) Iwata-san, Tsunakawa-san

Attached is the V23 of the libpq trace patch.

(1)
From: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> It appears that something is still wrong. I applied lipq pipeline v27 from [1]
> and ran src/test/modules/test_libpq/pipeline singlerow, after patching it to
> do PQtrace() after PQconn(). Below is the output I get from that. The
> noteworthy point is that "ParseComplete" messages appear multiple times
> for some reason ... but that's quite odd, because if I look at the network traffic
> with Wireshark I certainly do not see the ParseComplete message being sent
> three times.

I applied Alvaro's libpq pipeline patch (v33) [1] on top of this libpq trace patch,
then traced the src/test/modules/libpq_pipeline/libpq_pipeline.c test_singlerowmode.
It's still the same trace output even after applying the fix recommendations
of Horiguchi-san.

> Parse 38 "" "SELECT generate_series(42, $1)" #0
> Bind 20 "" "" #0 #1 2 '44' #1 #0
> Describe 6 P ""
> Execute 9 "" 0
> Parse 38 "" "SELECT generate_series(42, $1)" #0
> Bind 20 "" "" #0 #1 2 '45' #1 #0
> Describe 6 P ""
> Execute 9 "" 0
> Parse 38 "" "SELECT generate_series(42, $1)" #0
> Bind 20 "" "" #0 #1 2 '46' #1 #0
> Describe 6 P ""
> Execute 9 "" 0
> Sync 4
< ParseComplete 4
< BindComplete 4
< RowDescription 40 #1 "generate_series" 0 #0 23 #4 -1 #0
< DataRow 12 #1 2 '42'
< DataRow 12 #1 2 '43'
< DataRow 12 #1 2 '44'
< CommandComplete 13 "SELECT 3"
< ParseComplete 4
< ParseComplete 4
< ParseComplete 4
< BindComplete 4
< RowDescription 40 #1 "generate_series" 0 #0 23 #4 -1 #0
< DataRow 12 #1 2 '42'
< DataRow 12 #1 2 '43'
< DataRow 12 #1 2 '44'
< DataRow 12 #1 2 '45'
< CommandComplete 13 "SELECT 4"
< ParseComplete 4
< ParseComplete 4
< ParseComplete 4
< BindComplete 4
< RowDescription 40 #1 "generate_series" 0 #0 23 #4 -1 #0
< DataRow 12 #1 2 '42'
< DataRow 12 #1 2 '43'
< DataRow 12 #1 2 '44'
< DataRow 12 #1 2 '45'
< DataRow 12 #1 2 '46'
< CommandComplete 13 "SELECT 5"
< ReadyForQuery 5 I

ParseComplete still appears 3 times, but I wonder if that is expected only for pipeline's singlerowmode test.
From my observation, when I traced the whole regression test output
by putting PQtrace() in fe-connect.c: connectDBComplete(),
ParseComplete appears only once or twice, etc. for other commands.
In other words, ISTM, it's a case-to-case basis:
Some examples from the whole regression trace output is below:
a.
> Describe 6 P ""
> Execute 9 "" 0
> Sync 4
< ParseComplete 4
< BindComplete 4
b.
< ErrorResponse 217 S "ERROR" V "ERROR" C "42883" M "function no_such_function(integer) does not exist" H "No function matches the given name and argument types. You might need to add explicit type casts." P "8" F "parse_func.c" L "629" R "ParseFuncOrColumn" \x00
< ReadyForQuery 5 I
< ParseComplete 4
< ParseComplete 4
c.
> Bind 31 "" "my_insert" #0 #1 4 '9652' #1 #0
> Describe 6 P ""
> Execute 9 "" 0
< ParseComplete 4
< ParseComplete 4
< ParseComplete 4
< BindComplete 4
< NoData 4
d.
< CommandComplete 15 "INSERT 0 2"
< ReadyForQuery 5 T
> Parse 89 "i" "insert into test (name, amount, letter) select name, amount+$1, letter from test" #0
> Sync 4
< ParseComplete 4
< ReadyForQuery 5 T

As you can see from the examples above, ParseComplete appears in multiples/once depending on the test.
As for libpq_pipeline's test_singlerowmode, it appears 3x for the CommandComplete SELECT.
I am wondering now whether or not it's a bug. Maybe it's not.
Thoughts?

How I traced the whole regression test: fe-connect.c: connectDBComplete()

@@ -2114,6 +2137,7 @@ connectDBComplete(PGconn *conn)
int timeout = 0;
int last_whichhost = -2; /* certainly different from whichhost */
struct addrinfo *last_addr_cur = NULL;
+ FILE *trace_file;

if (conn == NULL || conn->status == CONNECTION_BAD)
return 0;
@@ -2171,6 +2195,9 @@ connectDBComplete(PGconn *conn)
switch (flag)
{
case PGRES_POLLING_OK:
+ trace_file = fopen("/home/postgres/tracelog/regression_tracepipeline_a.out","a");
+ PQtrace(conn, trace_file);
+ PQtraceSetFlags(conn, PQTRACE_SUPPRESS_TIMESTAMPS);

(2)
> From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
> > > This inhibits logCursor being updated. What is worse, I find that
> > > logCursor movement is quite dubious.
> >
> > Using (inCursor - inStart) as logCursor doesn't work correctly if
> > tracing state desyncs. Once desync happens inStart can be moved at
> > the timing that the tracing code doesn't expect.
> - This requires (as I
> - mentioned upthread) pqReadData to actively reset logCursor, though.
> + So pgReadData needs to avtively reset logCursor. If logCursor is
> + actively reset, we no longer need to use (inCursor - inStart) as
> + logCursor and it is enough that logCursor follows inCursor.
> >
> > logCursor should move when bytes are fed to the tracing functoins even
> > when theyare the last bytes of a message.

Following the advice of Horiguchi-san above, of updating logCursor through pqReadData,
I have updated pqReadData and pqCheckInBufferSpace.
In pqTraceOutputBeByte1 and pqTraceOutputBeInt.
logCursor now uses the inCursor value because pqReadData actively resets the logCursor.

In addition, I also fixed the following:

(3)
In pqTraceOutputBeByte1, I added the missing update of logCursor for the LOG_FIRST_BYTE:

+ case LOG_FIRST_BYTE:
+ {
+ ...
+ conn->be_msg->logCursor = conn->inCursor;
+ break;
+ }

(4)
In pqTraceOutputBeInt():

I changed 0 to false:
- bool logfinish = 0;
+ bool logfinish = false;

I followed this advice and applied to changes for LOG_LENGTH:

> + case LOG_LENGTH:
> + fprintf(conn->Pfdebug, "%d", v);
> -+ conn->be_msg->length = v - length;
> ++ conn->be_msg->length = v;
> + /* Next, log the message contents */
> + conn->be_msg->state = LOG_CONTENTS;
> -+ logfinish = pqTraceMaybeBreakLine(0, conn);
> ++ logfinish = pqTraceMaybeBreakLine(length, conn);

(5)
Because of the changes in pqReadData, I exposed again the
following and moved from libpq-trace.c to libq-trace.h:
+typedef enum PGLogState
+typedef struct pqBackendMessage

(6)
As reported abovethread, the Assert in pqTraceMaybeBreakLine() still breaks. So I removed it
>psql: libpq-trace.c:320: pqTraceMaybeBreakLine: Assertion `conn->be_msg->length > 0' failed.

(7) Not included in this patch is a fix for the bug I found from the whole regression test output.
There are many instances of Invalid Protocol, but only for FunctionCallResponse

< FunctionCallResponse 8 0
'< :::Invalid Protocol
'

See fe-protocol3.c: pqFunctionCall3().
Perhaps it has something to do when the actual_result_len == 0, and it's not integer (pqGetnchar).
And somewhere among pqTraceOutputBeInt(), pqTraceMaybeBreakLine(), etc.
probably breaks something when message is V, 8, 0 \0.
(Referring to (6) above, maybe the reason why Assert(conn->be_msg->length > 0) fails
in pqTraceMaybeBreakLine() is because the length may possibly be 0.)

Thoughts?

fe-protocol3.c: pqFunctionCall3():
case 'V': /* function result */
if (pqGetInt(actual_result_len, 4, conn))
continue;
if (*actual_result_len != -1)
{
if (result_is_int)
{
if (pqGetInt(result_buf, *actual_result_len, conn))
continue;
}
else
{
if (pqGetnchar((char *) result_buf,
*actual_result_len,
conn))
continue;
}
}

(8) Regarding the exchange of discussions between Tom Lane and Tsunakawa-san
about the improved tracing design, Iwata-san is now working on it and
will submit the updated patch soon.

Regards,
Kirk Jamison

[1] https://www.postgresql.org/message-id/flat/20210304214017.GA22915%40alvherre.pgsql#bd8f9b26a50256bb6b0d9f273fc2889e

Attachment Content-Type Size
v23-0001-Improve-libpq-tracing-capabilities.patch application/octet-stream 34.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2021-03-05 12:57:43 Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Previous Message Alexey Bashtanov 2021-03-05 12:42:55 Re: [patch] bit XOR aggregate functions