[PATCH] Improve EXPLAIN ANALYZE overhead by sampling

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: pgsql-patches(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: [PATCH] Improve EXPLAIN ANALYZE overhead by sampling
Date: 2006-05-09 20:37:04
Message-ID: 20060509203704.GJ29652@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

This was a suggestion made back in March that would dramatically reduce
the overhead of EXPLAIN ANALYZE on queries that loop continuously over
the same nodes.

http://archives.postgresql.org/pgsql-hackers/2006-03/msg01114.php

What it does behave normally for the first 50 tuples of any node, but
after that it starts sampling at ever increasing intervals, the
intervals controlled by an exponential function. So for a node
iterating over 1 million tuples it takes around 15,000 samples. The
result is that EXPLAIN ANALYZE has a much reduced effect on the total
execution time.

Without EXPLAIN ANALYZE:

postgres=# select count(*) from generate_series(1,1000000);
count
---------
1000000
(1 row)

Time: 2303.599 ms

EXPLAIN ANALYZE without patch:

postgres=# explain analyze select count(*) from generate_series(1,1000000);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=15.00..15.01 rows=1 width=0) (actual time=8022.070..8022.073 rows=1 loops=1)
-> Function Scan on generate_series (cost=0.00..12.50 rows=1000 width=0) (actual time=1381.762..4873.369 rows=1000000 loops=1)
Total runtime: 8042.472 ms
(3 rows)

Time: 8043.401 ms

EXPLAIN ANALYZE with patch:

postgres=# explain analyze select count(*) from generate_series(1,1000000);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=15.00..15.01 rows=1 width=0) (actual time=2469.491..2469.494 rows=1 loops=1)
-> Function Scan on generate_series (cost=0.00..12.50 rows=1000 width=0) (actual time=1405.002..2187.417 rows=1000000 loops=1)
Total runtime: 2496.529 ms
(3 rows)

Time: 2497.488 ms

As you can see, the overhead goes from 5.7 seconds to 0.2 seconds.
Obviously this is an extreme case, but it will probably help in a lot
of other cases people have been complaining about.

- To get this close it needs to get an estimate of the sampling overhead.
It does this by a little calibration loop that is run once per backend.
If you don't do this, you end up assuming all tuples take the same time
as tuples with the overhead, resulting in nodes apparently taking
longer than their parent nodes. Incidently, I measured the overhead to
be about 3.6us per tuple per node on my (admittedly slightly old)
machine.

Note that the resulting times still include the overhead actually
incurred, I didn't filter it out. I want the times to remain reflecting
reality as closely as possible.

- I also removed InstrStopNodeMulti and made InstrStopNode take a tuple
count parameter instead. This is much clearer all round.

- I also didn't make it optional. I'm unsure about whether it should be
optional or not, given the number of cases where it will make a
difference to be very few.

- The tuple counter for sampling restarts every loop. Thus a node that is
called repeatedly only returning one value each time will still measure
every tuple, though its parent node won't. We'll need some field
testing to see if that remains a significant effect.

- I don't let the user know anywhere how many samples it took. Is this
something users should care about?

Any comments?
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment Content-Type Size
explain.patch text/plain 15.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Cristiano da Cunha Duarte 2006-05-09 20:38:03 BUG #2429: Explain does not report object's schema
Previous Message Simon Riggs 2006-05-09 19:43:30 Re: Inheritance, Primary Keys and Foreign Keys

Browse pgsql-patches by date

  From Date Subject
Next Message Simon Riggs 2006-05-09 20:52:14 Re: [PATCH] Improve EXPLAIN ANALYZE overhead by sampling
Previous Message Fabien COELHO 2006-05-09 18:23:17 Re: cast bytea to/from bit strings