Re: explain plans with information about (modified) gucs

From: Sergei Agalakov <sergei(dot)agalakov(at)gmail(dot)com>
To: tomas(dot)vondra(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: explain plans with information about (modified) gucs
Date: 2019-01-04 21:38:50
Message-ID: 23164510-3f59-3289-3b32-9c8643711a6c@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Hi,
>
> every now and then I have to investigate an execution plan that is
> strange in some way and I can't reproduce the same behavior. Usually
> it's simply due to data distribution changing since the problem was
> observed (say, after a nightly batch load/update).
>
> In many cases it however may be due to some local GUC tweaks, usually
> addressing some query specific issues (say, disabling nested loops or
> lowering join_collapse_limit). I've repeatedly ran into cases where the
> GUC was not properly reset to the "regular" value, and it's rather
> difficult to identify this is what's happening. Or cases with different
> per-user settings and connection pooling (SET SESSION AUTHORIZATION /
> ROLE etc.).
>
> So I propose to extend EXPLAIN output with an additional option, which
> would include information about modified GUCs in the execution plan
> (disabled by default, of course):
>
> test=# explain (gucs) select * from t;
>
> QUERY PLAN
> --------------------------------------------------------------------
> Seq Scan on t (cost=0.00..35.50 rows=2550 width=4)
> GUCs: application_name = 'x', client_encoding = 'UTF8',
> cpu_tuple_cost = '0.01'
> (2 rows)
>
> Of course, this directly applies to auto_explain too, which gets a new
> option log_gucs.
>
> The patch is quite trivial, but there are about three open questions:
>
> 1) names of the options
>
> I'm not particularly happy with calling the option "gucs" - it's an
> acronym and many users have little idea what GUC stands for. So I think
> a better name would be desirable, but I'm not sure what would that be.
> Options? Parameters?
>
> 2) format of output
>
> At this point the names/values are simply formatted into a one-line
> string. That's not particularly readable, and it's not very useful for
> the YAML/JSON formats I guess. So adding each modified GUC as an extra
> text property would be better.
>
> 3) identifying modified (and interesting) GUCs
>
> We certainly don't want to include all GUCs, so the question is how to
> decide which GUCs are interesting. The simplest approach would be to
> look for GUCs that changed in the session (source == PGC_S_SESSION), but
> that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we
> probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE
> because that includes irrelevant options like wal_buffers etc.
>
> For now I've used
>
> /* return only options that were modified (not as in config file) */
> if ((conf->source <= PGC_S_ARGV) || (conf->source == PGC_S_OVERRIDE))
> continue;
>
> which generally does the right thing, although it also includes stuff
> like application_name or client_encoding. But perhaps it'd be better to
> whitelist the GUCs in some way, because some of the user-defined GUCs
> may be sensitive and should not be included in plans.
>
> Opinions?
>
>
> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

This patch correlates with my proposal
"add session information column to pg_stat_statements"
https://www.postgresql.org/message-id/3aa097d7-7c47-187b-5913-db8366cd4491%40gmail.com
They both address the problem to identify the factors that make different execution plans
for the same SQL statements. You are interested in the current settings that affect the execution plan,
I'm concerned about historical data in pg_stat_statements.
From my experience the most often offending settings are current_schemas/search_path and current_user.
Please have in mind that probably the same approach that you will use to extend explain plan functionality
will be eventually implemented to extend pg_stat_statements.
I think that the list of the GUCs that are reported by explain plan should be structured like JSON, something like
extended_settings:
{
"current_schemas" : ["pg_catalog", "s1", "s2", "public"],
"current_user" : "user1",
"enable_nestloop" : "off",
"work_mem" = "32MB"
}
It is less important for yours use case explain, but is important for pg_stat_statements case.
The pg_stat_statements is often accessed by monitoring and reporting tools, and it will be a good idea to have
the data here in the structured and easily parsed format.

Thank you,

Sergei Agalakov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-01-04 21:43:39 Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Previous Message Tom Lane 2019-01-04 21:22:17 Re: "SELECT ... FROM DUAL" is not quite as silly as it appears