Re: full vacuum really slows down query

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Jason Long" <mailing(dot)list(at)supernovasoftware(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: full vacuum really slows down query
Date: 2008-06-05 05:00:17
Message-ID: F0238EBA67824444BC1CB4700960CB4805766CA7@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Jason Long wrote:
> I would greatly appreciate any advice on how to read the following
> output from EXPLAIN ANALYZE.
>
> When running the query the first time from a fresh restore it takes
> about 55 sec.
> On a second run is take about 2 sec.
> After the vacuum it takes about 36 sec no matter how many
> times I run it.

> *Query run from a fresh restore*

> "Sort (cost=242145.26..242145.27 rows=1 width=1200) (actual
> time=55100.785..55100.820 rows=158 loops=1)"
...
> "Total runtime: 55101.547 ms"

What is shown is the query plan that the planner chose as having the cheapest estimated cost.
The cost of the various components of the query plan are shown. The units are not time, but estimations of relative cost.
The estimated number of rows that will be returned by each component are also shown.
The actual milliseconds it took and rows returned are also shown. If a component is looped through, the time is the average per loop.
The second values for cost and time are the cost or time to complete.

Documentation:
http://www.postgresql.org/docs/8.3/interactive/using-explain.html

I can't explain why you get negative actual times.

> *Before vacuum but after a second run of the query*

> "Sort (cost=241636.97..241637.00 rows=10 width=183) (actual
> time=2182.240..2182.273 rows=158 loops=1)"
...
> "Total runtime: 2183.176 ms"

Just looking at the first line shows that this second query has chosen a different plan that the one above.

> *After vacuum -- I see no improvement even after several runs*

> "Sort (cost=243084.23..243084.24 rows=5 width=107) (actual
> time=36649.839..36649.876 rows=158 loops=1)"
...
> "Total runtime: 36650.609 ms"

This is obviously a third query plan.

i.e. All three of the queries shown are using different plans, so what you are looking for is why different plans are chosen, rather than why a single plan sometimes runs fast, and sometimes runs slow.

When there are large differences in times that queries take, it is often due to gross under-estimation of the number of rows that a portion of the plan will return, resulting in many loops through a portion of the query that the planner thought it would only go through a small number of times.

The join of popipe1_1_.id = popipe1_.id shows an example of this, in your first query it is the last 5 lines, which it is expecting to loop through once, which we can tell due to the line indented to the same amount much higher up (with estimated cost 240701.08).
That join has moved in the second query, and is under a Materialize line. The chosen plan is different, and that portion is only performed once. The Materialize part was expected to be looped through 12 times, but it went through 3174 times.
In the third plan, it isn't under a Materialize, and is expected to loop 6 times. It loops 3174 times.

Hopefully others can provide more help, I've gotta go...

Regards,
Stephen Denne.
--
At the Datamail Group we value teamwork, respect, achievement, client focus, and courage.
This email with any attachments is confidential and may be subject to legal privilege.
If it is not intended for you please advise by replying immediately, destroy it and do not
copy, disclose or use it in any way.

The Datamail Group, through our GoGreen programme, is committed to environmental sustainability.
Help us in our efforts by not printing this email.
__________________________________________________________________
This email has been scanned by the DMZGlobal Business Quality
Electronic Messaging Suite.
Please see http://www.dmzglobal.com/dmzmessaging.htm for details.
__________________________________________________________________

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Richard Broersma 2008-06-05 05:54:50 Tripping up on my first attempt at building PG from source
Previous Message Glen Eustace 2008-06-05 02:55:10 zenoss monitor