Re: [FEATURE PATCH] pg_stat_statements with plans (v02)

From: Julian Markwort <julian(dot)markwort(at)uni-muenster(dot)de>
To: legrand legrand <legrand_legrand(at)hotmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, arne(dot)scheffer(at)uni-muenster(dot)de, marius(dot)timmer(at)uni-muenster(dot)de
Subject: Re: [FEATURE PATCH] pg_stat_statements with plans (v02)
Date: 2018-04-18 15:48:51
Message-ID: 1524066531.3274.48.camel@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2018-04-06 at 13:57 -0700, legrand legrand wrote:
> At startup time there are 2 identical plans found in the view
> I thought it should have be only one: the "initial" one
> (as long as there is no "good" or "bad" one).

Yes, those are 'remnants' from the time where I had two columns, one
for good_plan and one for bad_plan.
Whenever only one plan existed (either because the statement hadn't
been executed more than once or because the deviation from the previous
plan's time wasn't significant enough), the patch would display the
same plan string for both columns.

I'll have to be a little creative, but I think I'll be able to print
only one plan if both the good and bad plan are essentially the
"inital" plan.

You can be assured that the plan string isn't saved twice for those
cases, the good_plan and bad_plan structs just used the same offset
for the qtexts file.

> maybe 3 "plan_types" like "init", "good" and "bad" should help.

I think, the categorization into good and bad plans can be made based
on the execution time of said plan.
I'd suggest using something like
SELECT * FROM pg_stat_statements_plans GROUP BY queryid ORDER BY
plan_time;
This way, if there is only one row for any queryid, this plan is
essentially the "initial" one. If there are two plans, their goodness
can be infered from the exection times...

> Will a new line be created for good or bad plan if the plan is the
> same ?
> shouldn't we capture "constants" and "parameters" inspite ?

Capturing constants and parameters would require us to normalize the
plan, which isn't done currently. (Primarily. because it would involve
a lot of logic; Secondly, because for my use case, I'm interested in
the specific constants and parameters that led to a good or bad plan)

Do you have a use case in mind which would profit from normalized
plans?

> Is query text needed in plan?
> it can be huge and is already available (in normalized format)
> in pg_stat_statements. If realy needed in raw format
> (with constants) we could force pgss to store it (in replacement of
> normalize one)
> using a guc pg_stat_statements.normalized = false (I have a patch to
> do
> that).

The query is part of what's returned by an explain statement, so I've
thought to keep the 'plan' intact.
Since all long strings (queries and plans alike) are stored in a file
on disk, I'm not too much worried about the space these strings take
up.
I'd think that a hugely long query would lead to an even longer plan,
in which case the problem to focus on might not be to reduce the length
of the string stored by a statistical plugin...

>
> In Thread:
> http://www.postgresql-archive.org/Poc-pg-stat-statements-with-planid-
> td6014027.html
> pg_stat_statements view has a new key with dbid,userid,queryid +
> planid
> and 2 columns added: "created" and "last_updated".

I've taken a look at your patch. I agree that having a plan identifier
would be great, but I'm not a big fan of the jumbling. That's a lot of
hashing that needs to be done to decide wether two plans are
essentially equivalent or not.

Maybe, in the future, in a different patch, we could add functionality
that would allow us to compare two plan trees. There is even a remark
in the header of src/backend/nodes/equalfuncs.c regarding this:
* Currently, in fact, equal() doesn't know how to compare Plan trees
* either. This might need to be fixed someday.

> May be your view pg_stat_statements_plans could be transformed :
> same key as pgss: dbid,userid,queryid,planid
> and THE plan column to store explain result (no more need for
> plan_time nor
> tz that
> are already available in modified pgss).

The documentation [of pg_stat_statements] gives no clear indication
which fields qualify as primary keys, mainly because the hashing that
generates the queryid isn't guaranteed to produce unique results...
So I'm not sure which columns should be used to create associations
between pg_stat_statements and pg_stat_statements_plans.

> Thoses changes to pgss are far from being accepted by community:
> - planid calculation has to be discussed (I reused the one from
> pg_stat_plans,
> but I would have prefered explain command to provide it ...),
> - "created" and "last_updated" columns also,
> - ...
>
> So maybe your stategy to keep pgss unchanged, and add extensions view
> is
> better.
> There is a risk of duplicated informations (like execution_time,
> calls, ...)

It might be possible to completely seperate these two views into two
extensions. (pg_stat_statements_plans could use it's own hooks to
monitor query execution)
But that would probably result in a lot of duplicated code (for the
querytexts file) and would be a lot more elaborate to create, as I'd
need to recreate most of the hashmaps and locks.

Piggybacking onto the existing logic of pg_stat_statements to store a
plan every once in a while seems simpler to me.

Regards
Julian

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-04-18 15:50:55 Re: Query is over 2x slower with jit=on
Previous Message Simon Riggs 2018-04-18 15:44:53 Re: reloption to prevent VACUUM from truncating empty pages at the end of relation