Re: [HACKERS] statement logging / extended query protocol issues

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Oliver Jowett <oliver(at)opencloud(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] statement logging / extended query protocol issues
Date: 2005-09-23 01:54:31
Message-ID: 200509230154.j8N1sVi17799@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches


Here are comments on this posting and the patch (it is long):

Simon Riggs wrote:
> On Thu, 2005-09-08 at 13:14 +1200, Oliver Jowett wrote:
> > Simon Riggs wrote:
> > > Oliver, would it be possible to show a simplified call sequence and what
> > > you would like to see logged for each call?
>
> These are good: Maybe it should even be in the docs for the driver?
> It would be good if it could be written as a test within the driver, so
> we can expand it and test the logging.
>
> > The JDBC driver generates one of these sequences:
> >
> > (1)
> > Parse (unnamed statement) "SELECT 1"
> > Bind (unnamed statement -> unnamed portal)
> > Execute (unnamed portal, no row limit)
> >
> > (2)
> > Parse (named statement S_1) "SELECT 1"
> > repeatedly:
> > Bind (named statement S_1 -> unnamed portal)
> > Execute (unnamed portal, no row limit)
> >
> > (3)
> > Parse (named statement S_2) "SELECT 1"
> > repeatedly:
> > Bind (named statement S_2 -> named portal C_2)
> > repeatedly:
> > Execute (named portal C_2, row limit 42)
>
> Are we sure there is just 3 cases?
>
> > Ideal output is:
> >
> > (1) LOG: statement: SELECT 1
> >
> > (2) LOG: statement: SELECT 1
> > LOG: statement: SELECT 1
> > LOG: statement: SELECT 1
> >
> > (3) LOG: statement: SELECT 1
> > LOG: statement: SELECT 1
> > LOG: statement: SELECT 1
> >
> > In case (3), that's one log line per repeat of the outer loop,
> > regardless of how many Executes are sent in the inner loop.
>
> > Note that case (1) is the most common case for application queries via
> > the JDBC driver, and case (2) is the most common for internally
> > generated statements like BEGIN.
>
> Even if case (3) is not that common, I still want to know it is
> occurring, to see what effect or overhead it has.

Agreed.

> > As you can see from the output I'd like, I don't think that synthesizing
> > FETCH / EXECUTE queries that don't actually exist [is a]
> > useful thing to do, at least at the Joe Average User
> > level.
>
> Your original point at the top of this thread was valid: a get-next-rows
> shouldn't look like a re-execute. We can call it something else if you
> like, as long as we can tell the difference.
>
> We'll only see the output for case (3) when someone has programmed
> things that way by using setFetchSize.
>
> > Also note that the JDBC driver doesn't exercise all of the extended
> > protocol -- for example it's possible to re-Bind the same unnamed
> > statement repeatedly, or have multiple Executes on an unnamed portal
> > with a row limit, but the JDBC driver never does that.
>
> I agree there's not much gained from displaying the BIND statement as it
> is. I argued previously against including the BIND parameters. Now I
> would say we should either include them or leave out BIND altogether.

Agreed. We should output BIND parameters at some point. We do log
client-side PREPARE parameters now.

> Here's a new suggestion and patch that brings together
> - Oliver and Simon's wish to remove BIND from normal logging
> - Oliver's suggestion to remove the PREPARE logging for unnamed
> statements, which would otherwise double-up logging for case(1)
> - Bruce and Simon's view to keep some form of FETCH logging
> - Tom's view to rationalise the way ROWS is mentioned
>
> (lines beginning jdbc don't show in the log, but are just there to show
> clearly the time sequence of activities and what gets subsequently
> logged)
>
> (1)
> > Parse (unnamed statement) "SELECT * from pg_proc"
> > Bind (unnamed statement -> unnamed portal)
> > Execute (unnamed portal, no row limit)
>
> (1)
> jdbc parse
> jdbc bind
> jdbc execute
> LOG: statement: SELECT * from pg_proc
>
> jdbc parse
> jdbc bind
> jdbc execute
> LOG: statement: SELECT * from pg_proc
>
> jdbc parse
> jdbc bind
> jdbc execute
> LOG: statement: SELECT * from pg_proc
>
>
> Notice that the parse of the unnamed statement does *not* now generate a
> log record.

What is the reason for this? I am not so worried about what JDBC uses
but what the protocol supports. From my reading of the documentatation
is seems to support a single prepare and multiple executes of an unnamed
statement.

> (2)
> > Parse (named statement S_1) "SELECT * from pg_proc"
> > repeatedly:
> > Bind (named statement S_1 -> unnamed portal)
> > Execute (unnamed portal, no row limit)
>
> (2)
> jdbc parse S_1
> LOG: statement: PREPARE S_1 AS SELECT * from pg_proc
> (perhaps this should be logged at BIND time, just like the
> optimization?)
>
> jdbc bind S_1
> jdbc execute
> LOG: statement: EXECUTE <unnamed> [PREPARE: SELECT * from pg_proc]
>
> jdbc bind S_1
> jdbc execute
> LOG: statement: EXECUTE <unnamed> [PREPARE: SELECT * from pg_proc]
>
> jdbc bind S_1
> jdbc execute
> LOG: statement: EXECUTE <unnamed> [PREPARE: SELECT * from pg_proc]
>
>
> ...I wonder if <unnamed> just confuses what is going on here? I've left
> it in for now, but suggest that we take that out again?

What do you want to replace it with? A missing name seems confusing.

> (3)
> > Parse (named statement S_2) "SELECT * from pg_proc"
> > repeatedly:
> > Bind (named statement S_2 -> named portal C_2)
> > repeatedly:
> > Execute (named portal C_2, row limit 42)
>
> (3)
> jdbc prepare S_2
> LOG: statement: PREPARE S_2 AS SELECT * from pg_proc
>
> jdbc bind S_2 to C_2
> jdbc execute C_2
> LOG: statement: EXECUTE C_2 ROWS 42 [PREPARE: SELECT * from pg_proc]
> jdbc next (after cache has run out on 42nd row)
> v3 protocol sends E for Execute, execution halts at 49 rows for this set
> of bind parameters
> LOG: statement: FETCH C_2 ROWS 7
>
> jdbc bind S_2 to C_2
> jdbc execute C_2
> LOG: statement: EXECUTE C_2 ROWS 42 [PREPARE: SELECT * from pg_proc]
> jdbc next (after cache has run out on 42nd row)
> v3 protocol sends E for Execute
> LOG: statement: FETCH C_2 ROWS 42
> jdbc next (after cache has run out on 84th row)
> v3 protocol sends E for Execute, execution halts at 95 rows for this set
> of bind parameters
> LOG: statement: FETCH C_2 ROWS 11
>
> Note: log_min_duration_statement logs after execution so can give
> accurate row counts of what was retrieved for first execute and
> subsequent fetches. In that case we log using the word ROWS.
> log_statement=all logs before execution and so only knows what the
> maximum number of rows requested is, not what the actual number of rows
> retrieved will be. In that case we log using the word MAXROWS.
> ROWS and MAXROWS are *not* mentioned unless we specifically set max_rows
> in the execute request using the v3 protocol.
>
> If we agree, I'd suggest this goes into the docs...
>
> I've not written a comprehensive test program that covers all of the
> different settings of v2/v3, named/unnamed, allrows/restricted rows,
> log_statement=all/log_min_duration_statement. I'm not sure that will fit
> within the existing test framework. So this patch is still prototype.

Here are my comments on the patch:

> Index: src/backend/tcop/postgres.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
> retrieving revision 1.458
> diff -c -c -r1.458 postgres.c
> *** src/backend/tcop/postgres.c 2 Sep 2005 21:50:54 -0000 1.458
> --- src/backend/tcop/postgres.c 9 Sep 2005 10:06:46 -0000
> ***************
> *** 1162,1173 ****
> if (save_log_statement_stats)
> ResetUsage();
>
> - if (log_statement == LOGSTMT_ALL)
> - ereport(LOG,
> - (errmsg("statement: PREPARE %s AS %s",
> - (*stmt_name != '\0') ? stmt_name : "<unnamed>",
> - query_string)));
> -
> /*
> * Start up a transaction command so we can run parse analysis etc.
> * (Note that this will normally change current memory context.)
> --- 1162,1167 ----
> ***************
> *** 1217,1222 ****
> --- 1211,1222 ----
>
> QueryContext = CurrentMemoryContext;
>
> +
> + /* Log parse only when using named statements, unless in debug1 mode */
> + if (log_statement == LOGSTMT_ALL)
> + ereport((is_named ? DEBUG1 : LOG),
> + (errmsg("statement: PREPARE %s AS %s", stmt_name, query_string)));
> +

This seems backwards. The comments says log only named statements, but
the '?' test makes named statements DEBUG. Is this intended?

Why have you dropped the "<unnamed>"? Whether it is DEBUG or not we
should display meaningful output.

> /*
> * Do basic parsing of the query or queries (this should be safe even
> * if we are in aborted transaction state!)
> ***************
> *** 1467,1473 ****
> portal = CreatePortal(portal_name, false, false);
>
> if (log_statement == LOGSTMT_ALL)
> ! ereport(LOG,
> (errmsg("statement: <BIND> %s", portal_name)));
>
> /*
> --- 1467,1473 ----
> portal = CreatePortal(portal_name, false, false);
>
> if (log_statement == LOGSTMT_ALL)
> ! ereport(DEBUG1,
> (errmsg("statement: <BIND> %s", portal_name)));

Can we get the BIND parameters here? They seem significant for
debugging.

> /*
> ***************
> *** 1683,1688 ****
> --- 1683,1689 ----
> bool save_log_duration = log_duration;
> int save_log_min_duration_statement = log_min_duration_statement;
> bool save_log_statement_stats = log_statement_stats;
> + bool subsequent_fetch = false;

Can we call this "execute_is_fetch". That seems clearer.

>
> /* Adjust destination to tell printtup.c what to do */
> dest = whereToSendOutput;
> ***************
> *** 1695,1700 ****
> --- 1696,1710 ----
> (errcode(ERRCODE_UNDEFINED_CURSOR),
> errmsg("portal \"%s\" does not exist", portal_name)));
>
> + /*
> + * If we re-issue an Execute protocol request against an existing
> + * portal, then we are only fetching more rows rather than
> + * completely re-executing the query from the start. atStart is never
> + * reset for a v3 portal, so we are safe to use this check.
> + */
> + if (!portal->atStart)
> + subsequent_fetch = true;
> +
> /*
> * If the original query was a null string, just return
> * EmptyQueryResponse.
> ***************
> *** 1706,1712 ****
> return;
> }
>
> ! if (portal->sourceText)
> {
> debug_query_string = portal->sourceText;
> pgstat_report_activity(portal->sourceText);
> --- 1716,1727 ----
> return;
> }
>
> ! if (subsequent_fetch)
> ! {
> ! debug_query_string = "fetch message";
> ! pgstat_report_activity("<FETCH>");
> ! }
> ! else if (portal->sourceText)

Shouldn't we supply the name of the prepare here, or don't we have one?
If not, we should call it "unnamed".

> {
> debug_query_string = portal->sourceText;
> pgstat_report_activity(portal->sourceText);
> ***************
> *** 1731,1742 ****
> if (save_log_statement_stats)
> ResetUsage();
>
> if (log_statement == LOGSTMT_ALL)
> ! /* We have the portal, so output the source query. */
> ! ereport(LOG,
> ! (errmsg("statement: EXECUTE %s [PREPARE: %s]",
> ! (*portal_name != '\0') ? portal_name : "<unnamed>",
> portal->sourceText ? portal->sourceText : "")));
>
> BeginCommand(portal->commandTag, dest);
>
> --- 1746,1774 ----
> if (save_log_statement_stats)
> ResetUsage();
>
> + /*
> + * log statement details - when using log_statement=all we log the
> + * phrase MAXROWS because this occurs prior to execution
> + */
> if (log_statement == LOGSTMT_ALL)
> ! {
> ! if (subsequent_fetch)
> ! ereport(LOG,
> ! (errmsg("statement: FETCH %s MAXROWS %ld", portal_name,
> ! max_rows)));
> ! else
> ! /* We have the portal, so output the source query. */
> ! if (max_rows <= 0)
> ! ereport(LOG,
> ! (errmsg("statement: EXECUTE %s [PREPARE: %s]",
> ! portal_name,
> portal->sourceText ? portal->sourceText : "")));
> + else
> + ereport(LOG,
> + (errmsg("statement: EXECUTE %s MAXROWS %ld [PREPARE: %s]",
> + portal_name, max_rows,
> + portal->sourceText ? portal->sourceText : "")));
> + }

MAXROWS seems useful. Good.

> BeginCommand(portal->commandTag, dest);
>
> ***************
> *** 1865,1877 ****
> if (save_log_min_duration_statement == 0 ||
> (save_log_min_duration_statement > 0 &&
> usecs >= save_log_min_duration_statement * 1000))
> ! ereport(LOG,
> ! (errmsg("duration: %ld.%03ld ms statement: EXECUTE %s [PREPARE: %s]",
> (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
> (stop_t.tv_usec - start_t.tv_usec) / 1000),
> (long) (stop_t.tv_usec - start_t.tv_usec) % 1000,
> ! (*portal_name != '\0') ? portal_name : "<unnamed>",
> ! portal->sourceText ? portal->sourceText : "")));
> }
>
> if (save_log_statement_stats)
> --- 1897,1929 ----
> if (save_log_min_duration_statement == 0 ||
> (save_log_min_duration_statement > 0 &&
> usecs >= save_log_min_duration_statement * 1000))
> ! {
> ! if (subsequent_fetch)

Should the duration be snprintf into a local buffer and then used in
each ereport, rather than duplicating the code three times?

> ! ereport(LOG,
> ! (errmsg("duration: %ld.%03ld ms statement: FETCH %s ROWS %ld ",
> (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
> (stop_t.tv_usec - start_t.tv_usec) / 1000),
> (long) (stop_t.tv_usec - start_t.tv_usec) % 1000,
> ! portal_name,
> ! max_rows)));
> ! else
> ! if (completed)
> ! ereport(LOG,
> ! (errmsg("duration: %ld.%03ld ms statement: EXECUTE %s [PREPARE: %s]",
> ! (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
> ! (stop_t.tv_usec - start_t.tv_usec) / 1000),
> ! (long) (stop_t.tv_usec - start_t.tv_usec) % 1000,
> ! portal_name,
> ! portal->sourceText ? portal->sourceText : "")));
> ! else
> ! ereport(LOG,
> ! (errmsg("duration: %ld.%03ld ms statement: EXECUTE %s ROWS %ld [PREPARE: %s]",
> ! (long) ((stop_t.tv_sec - start_t.tv_sec) * 1000 +
> ! (stop_t.tv_usec - start_t.tv_usec) / 1000),
> ! (long) (stop_t.tv_usec - start_t.tv_usec) % 1000,
> ! portal_name, max_rows,
> ! portal->sourceText ? portal->sourceText : "")));
> ! }
> }
>
> if (save_log_statement_stats)

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-09-23 02:05:57 Re: PCTFree Results
Previous Message Tatsuo Ishii 2005-09-23 00:32:45 Re: Proposed patch to clean up signed-ness warnings

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2005-09-23 02:07:10 Re: small changes to autovacuum maintainance
Previous Message Tatsuo Ishii 2005-09-23 00:32:45 Re: Proposed patch to clean up signed-ness warnings