Re: [HACKERS] EXPLAIN ANALYZE on 8.2

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kelly Burkhart <kelly(dot)burkhart(at)gmail(dot)com>, Evgeny Gridasov <eugrid(at)fpm(dot)kubsu(dot)ru>, pgsql-performance(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] EXPLAIN ANALYZE on 8.2
Date: 2006-12-15 15:48:45
Message-ID: 20061215154845.GL958@svana.org
Lists: pgsql-hackers pgsql-performance

On Fri, Dec 15, 2006 at 09:56:57AM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > On Fri, Dec 15, 2006 at 12:20:46PM +0000, Simon Riggs wrote:
> >> Maybe sampling every 10 rows will bring things down to an acceptable
> >> level (after the first N). You tried less than 10 didn't you?
>
> > Yeah, it reduced the number of calls as the count got larger. It broke
> > somewhere, though I don't quite remember why.
>
> The fundamental problem with it was the assumption that different
> executions of a plan node will have the same timing. That's not true,
> in fact not even approximately true. IIRC the patch did realize
> that first-time-through is not a predictor for the rest, but some of
> our plan nodes have enormous variance even after the first time.
> I think the worst case is batched hash joins.

It didn't assume that, because that's obviously bogus. It assumed the
durations would follow a normal distribution, which meant that over
time the average of the measured iterations would approach the actual
average. It took enough measurements to keep the expected error small,
but it's statistics: you can only say "this will give the right answer
>95% of the time".
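
For concreteness, the estimator was of roughly this shape (a minimal
sketch in C, with made-up names and constants, not the actual patch):

#include <stdint.h>

#define SAMPLE_FULL   50      /* always time the first N iterations */
#define SAMPLE_EVERY  10      /* afterwards, time every k-th iteration */

typedef struct SampleState
{
    uint64_t iterations;      /* total node executions seen */
    uint64_t sampled;         /* executions we actually timed */
    double   sampled_time;    /* sum of timed executions, in seconds */
} SampleState;

/* Decide before running the node whether to time this execution. */
static int
sample_this_iteration(const SampleState *s)
{
    return s->iterations < SAMPLE_FULL ||
           s->iterations % SAMPLE_EVERY == 0;
}

/* Record one execution; elapsed is only looked at when timed is true. */
static void
sample_record(SampleState *s, double elapsed, int timed)
{
    s->iterations++;
    if (timed)
    {
        s->sampled++;
        s->sampled_time += elapsed;
    }
}

/* Estimated total time: mean of the sampled iterations, scaled up. */
static double
sample_estimate_total(const SampleState *s)
{
    if (s->sampled == 0)
        return 0.0;
    return (s->sampled_time / s->sampled) * (double) s->iterations;
}

The whole argument below is about when that mean-times-count
extrapolation can be trusted.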

You are correct, though: the error was caused by unexpectedly large
variations or, more likely, by an unexpected distribution curve.
Statistically, we took enough samples not to be significantly affected
by large variations; even if the durations looked more like a gamma
distribution, the estimate should not have been as far off as it was.
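
(The "enough samples" reasoning is just the usual standard-error-of-the-mean
argument; something like the sketch below, where the constants and names
are invented here for illustration rather than taken from the patch:)

#include <math.h>

/*
 * The standard error of the mean shrinks with sqrt(n), so to get the
 * estimated mean within rel_err of the true mean with ~95% confidence
 * you need roughly n >= (1.96 * stddev / (rel_err * mean))^2 samples.
 */
static double
samples_needed(double mean, double stddev, double rel_err)
{
    double z = 1.96;        /* ~95% confidence, assuming normality */
    double x = z * stddev / (rel_err * mean);

    return ceil(x * x);
}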

Looking at alternative approaches, like sampling with a timer, you end
up with the same problem: sometimes the calculations will fail and
produce something strange. The simplest example is that a 100Hz timer
is not going to produce any useful information for queries in the
millisecond range, and a higher-frequency timer is not going to be
available portably.
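
(To spell out the arithmetic with some hypothetical numbers, not an
actual implementation: a 100Hz timer ticks every 10ms, so a node that
runs for a few milliseconds is observed as either 0 or 10ms:)

#include <stdio.h>

int
main(void)
{
    double hz = 100.0;          /* sampling frequency */
    double tick = 1.0 / hz;     /* 10 ms resolution */
    double actual = 0.003;      /* a 3 ms executor node */

    /*
     * Expected number of ticks landing inside the node is 0.3, but any
     * single run sees either 0 or 1 tick, i.e. 0 ms or 10 ms.
     */
    printf("expected ticks = %.2f (tick = %.0f ms, actual = %.0f ms)\n",
           actual * hz, tick * 1000.0, actual * 1000.0);
    return 0;
}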

You could probably throw more effort into refining the statistics behind
it, but at some point we're going to have to draw a line and say: it's
going to be wrong X% of the time, deal with it. If we're not willing to
say that, then there's no point going ahead with any statistical
approach.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
