Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)

From: Julian Markwort <julian(dot)markwort(at)uni-muenster(dot)de>
To: David Fetter <david(at)fetter(dot)org>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, marius(dot)timmer(at)uni-muenster(dot)de, arne(dot)scheffer(at)uni-muenster(dot)de
Subject: Re: [HACKERS] [FEATURE PATCH] pg_stat_statements with plans (v02)
Date: 2018-01-15 16:48:51
Message-ID: ffaccd0b-e8c5-1fcc-4655-08992d35b4be@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/11/2018 03:43 PM, David Fetter wrote:
> Is the assumption of a normal distribution reasonable for outlier
> plans as you've seen them?
This is a difficult but fair question.
First of all, I'd like to clarify that the normal distribution is
assumed for the set of all execution times matching a queryid; No
assumptions are made about the distribution of the outliers themselves.
The primary goal of this approach was the limitation of plan updates, to
avoid unnecessary IO operations.
When query performance does not vary much, no updates of the plans
should be necessary, but as soon as query performance varies too much,
the new plan should be stored.
For the purpose of distinguishing reasonable variance between execution
times and great variance due to changing conditions which ultimately
might result in a different plan, the assumption of a normal
distribution for all execution times suits well.

Based on some early testing, this results in only a few percent of
updates (1-3%) in relation to the total number of calls, when running
some short pgbench tests.
As the sample size grows, the assumption of a normal distribution
becomes increasingly accurate and the (unnecessary) sampling of plans
decreases.
In a different test, I ran several queries with identical table sizes,
the queries were fairly simple,  and the statistical evaluation led to
few updates during these tests. When I increased the table size
significantly, the database switched to a different plan. Because the
execution time differed significantly, this new bad plan was stored.
Similarly, after running a certain query a couple of times, I created an
index on my test data, which resulted in a speedup which was significant
enough to result in an update of the good plan.

Now, if a change to the data or the index situation only resulted in an
insignificant performance increase or decrease (one that falls into the
interval defined as [mean - 1.5*IQD, mean + 1-5*IQD] ), I think it might
be possible to assume that we are not interested in an updated plan for
this scenario.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Geoff Winkless 2018-01-15 16:53:27 Re: proposal: alternative psql commands quit and exit
Previous Message Tom Lane 2018-01-15 16:48:11 Re: proposal: alternative psql commands quit and exit