RE: Planning counters in pg_stat_statements (using pgss_store)

From: legrand legrand <legrand_legrand(at)hotmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: RE: Planning counters in pg_stat_statements (using pgss_store)
Date: 2020-03-14 10:04:00
Message-ID: 1584180240397-0.post@n3.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

imai(dot)yoshikazu(at)fujitsu(dot)com wrote
> On Thu, Mar 12, 2020 at 6:37 PM, Julien Rouhaud wrote:
>> On Thu, Mar 12, 2020 at 1:11 PM Marco Slot &lt;

> marco.slot@

> &gt; wrote:
>> > On Thu, Mar 12, 2020 at 11:31 AM Julien Rouhaud &lt;

> rjuju123@

> &gt;
>> wrote:
>> > > There's at least the current version of IVM patchset that lacks the
>> > > querytext. Looking at various extensions, I see that pg_background
>> > > and pglogical call pg_plan_query internally but shouldn't have any
>> > > issue providing the query text. But there's also citus extension,
>> > > which don't keep around the query string at least when distributing
>> > > plans, which makes sense since it's of no use and they're heavily
>> > > modifying the original Query. I think that citus folks opinion on
>> > > the subject would be useful, so I'm Cc-ing Marco.
>> >
>> > Most of the time we keep our Query * data structures in a form that
>> > can be deparsed back into a query string by a modified copy of
>> > ruleutils.c, so we could generate a correct query string if absolutely
>> > necessary, though there are performance-sensitive cases where we'd
>> > rather not have the deparsing overhead.
>>
>> Yes, deparsing is probably too expensive for that use case.
>>
>> > In case of INSERT..SELECT into a distributed table, we might call
>> > pg_plan_query on the SELECT part (Query *) and send the output into a
>> > DestReceiver that sends tuples into shards of the distributed table
>> > via COPY. The fact that SELECT does not show up in pg_stat_statements
>> > separately is generally fine because it's an implementation detail,
>> > and it would probably be a little confusing because the user never ran
>> > the SELECT query. Moreover, the call to pg_plan_query would already be
>> > reflected in the planning or execution time of the top-level query, so
>> > it would be double counted if it had its own entry.
>>
>> Well, surprising statements can already appears in pg_stat_statements
>> when
>> you use some psql features, or if you have triggers as those will run
>> additional
>> queries under the hood.
>>
>> The difference here is that since citus is a CustomNode, underlying calls
>> to
>> planner will be accounted for that node, and that's indeed annoying. I
>> can see
>> that citus is doing some calls to spi_exec or
>> Executor* (in ExecuteLocalTaskPlan), which could also trigger
>> pg_stat_statements, but I don't know if a queryid is present there.
>>
>> > Another case is when some of the shards turn out to be local to the
>> > server that handles the distributed query. In that case we plan the
>> > queries on those shards via pg_plan_query instead of deparsing and
>> > sending the query string to a remote server. It would be less
>> > confusing for these queries to show in pg_stat_statements, because the
>> > queries on the shards on remote servers will show up as well. However,
>> > this is a performance-sensitive code path where we'd rather avoid
>> > deparsing.
>>
>> Agreed.
>>
>> > In general, I'd prefer if there was no requirement to pass a correct
>> > query string. I'm ok with passing "SELECT 'citus_internal'" or just ""
>> > if that does not lead to issues. Passing NULL to signal that the
>> > planner call should not be tracked separately does seem a bit cleaner.
>>
>> That's very interesting feedback, thanks! I'm not a fan of giving a way
>> for
>> queries to say that they want to be ignored by pg_stat_statements, but
>> double
>> counting the planning counters seem even worse, so I'm +0.5 to accept
>> NULL
>> query string in the planner, incidentally making pgss ignore those.
>
> It is preferable that we can count various queries statistics as much as
> possible
> but if it causes overhead even when without using pg_stat_statements, we
> would
> not have to force them to set appropriate query_text.
> About settings a fixed string in query_text, I think it doesn't make much
> sense
> because users can't take any actions by seeing those queries' stats.
> Moreover, if
> we set a fixed string in query_text to avoid pg_stat_statement's errors,
> codes
> would be inexplicable for other developers who don't know there's such
> requirements.
> After all, I agree accepting NULL query string in the planner.
>
> I don't know it is useful but there are also codes that avoid an error
> when
> sourceText is NULL.
>
> executor_errposition(EState *estate, int location)
> {
> ...
> /* Can't do anything if source text is not available */
> if (estate == NULL || estate->es_sourceText == NULL)
> }
>
> --
> Yoshikazu Imai

Hello Imai,

My understanding of V5 patch, that checks for Non-Zero queryid,
manage properly case where sourceText is NULL.

A NULL sourceText means that there was no Parsing for the associated
query, if there was no Parsing, there is no queryid (queryId=0),
and no planning counters update.

It doesn't change pg_plan_query behaviour (no regression for Citus, IVM,
...),
and was tested with success for IVM.

If my understanding is wrong, then setting track_planning = false
would still be the work arround for the very rare (no core) extension(s)
that may hit the NULL query text assertion failure.

What do you think about this ?
Would this make V5 patch ready for committers ?

Thanks in advance.
Regards
PAscal

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message legrand legrand 2020-03-14 10:39:23 RE: Planning counters in pg_stat_statements (using pgss_store)
Previous Message Victor Wagner 2020-03-14 09:49:28 Re: make check crashes on POWER8 machine