Re: Planning counters in pg_stat_statements (using pgss_store)

From: Marco Slot <marco(dot)slot(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: "imai(dot)yoshikazu(at)fujitsu(dot)com" <imai(dot)yoshikazu(at)fujitsu(dot)com>, legrand legrand <legrand_legrand(at)hotmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Marco Slot <marco(at)citusdata(dot)com>
Subject: Re: Planning counters in pg_stat_statements (using pgss_store)
Date: 2020-03-12 12:11:22
Message-ID: CAFMSG9HJQr=H8doWJOp=wqyKbVqxMLkk_Qu2KfpmkKvS-Xn7qQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 12, 2020 at 11:31 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> There's at least the current version of IVM patchset that lacks the
> querytext. Looking at various extensions, I see that pg_background
> and pglogical call pg_plan_query internally but shouldn't have any
> issue providing the query text. But there's also citus extension,
> which don't keep around the query string at least when distributing
> plans, which makes sense since it's of no use and they're heavily
> modifying the original Query. I think that citus folks opinion on the
> subject would be useful, so I'm Cc-ing Marco.

Most of the time we keep our Query * data structures in a form that
can be deparsed back into a query string by a modified copy of
ruleutils.c, so we could generate a correct query string if absolutely
necessary, though there are performance-sensitive cases where we'd
rather not have the deparsing overhead.

In case of INSERT..SELECT into a distributed table, we might call
pg_plan_query on the SELECT part (Query *) and send the output into a
DestReceiver that sends tuples into shards of the distributed table
via COPY. The fact that SELECT does not show up in pg_stat_statements
separately is generally fine because it's an implementation detail,
and it would probably be a little confusing because the user never ran
the SELECT query. Moreover, the call to pg_plan_query would already be
reflected in the planning or execution time of the top-level query, so
it would be double counted if it had its own entry.

Another case is when some of the shards turn out to be local to the
server that handles the distributed query. In that case we plan the
queries on those shards via pg_plan_query instead of deparsing and
sending the query string to a remote server. It would be less
confusing for these queries to show in pg_stat_statements, because the
queries on the shards on remote servers will show up as well. However,
this is a performance-sensitive code path where we'd rather avoid
deparsing.

In general, I'd prefer if there was no requirement to pass a correct
query string. I'm ok with passing "SELECT 'citus_internal'" or just ""
if that does not lead to issues. Passing NULL to signal that the
planner call should not be tracked separately does seem a bit cleaner.

Marco

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-03-12 12:11:56 Re: pg11+: pg_ls_*dir LIMIT 1: temporary files .. not closed at end-of-transaction
Previous Message Amit Kapila 2020-03-12 11:57:59 Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager