Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)

From: Julian Markwort <julian(dot)markwort(at)uni-muenster(dot)de>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, marius(dot)timmer(at)uni-muenster(dot)de, arne(dot)scheffer(at)uni-muenster(dot)de
Subject: Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)
Date: 2018-01-10 14:05:39
Message-ID: 1bd396a9-4573-55ad-7ce8-fe7adffa1bd9@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

I'd like to follow up to my previous proposition of tracking (some) best
and worst plans for different queries in the pg_stat_statements extension.

Based on the comments and suggestions made towards my last endeavour,
I've taken the path of computing the interquartile distance (by means of
an adapted z-test, under the assumption of normal distribution, based on
the mean_time and stddev_time already used by the extension).

A bad plan is recorded, if there is no previously recorded plan, or if
the current execution time is greater than the maximum of the previously
recorded plan's time and the query's mean+1.5*interquartile_distance.
A good plan is recorded on a similar condition; The execution time needs
to be shorter than the minimum of the previously recorded good plan's
time and the query's mean-1.5*interquartile_distance.

The boundaries are chosen to resemble the boundaries for whiskers in
boxplots.
Using these boundaries, plans will be updated very seldomly, as long as
they are more or less normally distributed.
Changes in the plans (for example the use of indices) used for each kind
of query will most likely result in execution times exceeding these
boundaries, so such changes are (very probably) recorded.

The ideal solution would be to compare the current plan with the last
plan and only update when there is a difference between them, however I
think this is unreasonably complex and a rather expensive task to
compute on the completion of every query.

The intent of this patch is to provide a quick insight into the plans
currently used by the database for the execution of certain queries. The
tracked plans only represent instances of queries with very good or very
poor performance.

I've (re)submitted this patch for the next commitfest as well.

Kind regards
Julian

On 03/04/2017 02:56 PM, Julian Markwort wrote:
> Alright, for the next version of this patch I'll look into standard
> deviation (an implementation of Welfords' algorithm already exists in
> pg_stat_statements).
>
> On 3/4/17 14:18, Peter Eisentraut wrote:
>
>> The other problem is that this measures execution time, which can vary
>> for reasons other than plan.  I would have expected that the cost
>> numbers are tracked somehow.
> I've already thought of tracking specific parts of the explanation,
> like the cost numbers, instead of the whole string, I'll think of
> something, but if anybody has any bright ideas in the meantime, I'd
> gladly listen to them.
>
>> There is also the issue of generic vs specific plans, which this
>> approach might be papering over.
> Would you be so kind and elaborate a little bit on this? I'm not sure
> if I understand this correctly. This patch only tracks specific plans,
> yes. The inital idea was that there might be some edge-cases that are
> not apparent when looking at generalized plans or queries.
>
> kind regards
> Julian

Attachment Content-Type Size
pgss_plans_v02.patch text/x-patch 34.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2018-01-10 14:28:27 Re: [HACKERS] PATCH: psql tab completion for SELECT
Previous Message Konstantin Knizhnik 2018-01-10 13:41:16 Re: AS OF queries