Re: VACUUM ANALYZE slows down query

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: werner fraga <wfraga(at)yahoo(dot)com>
Cc: John Arbash Meinel <john(at)arbash-meinel(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: VACUUM ANALYZE slows down query
Date: 2005-02-22 23:08:21
Message-ID: 13064.1109113701@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I wrote:
> Well, with the increased (and much more accurate) rowcount estimate,
> the estimated cost of the nestloop naturally went up a lot: it's
> proportional to the number of rows involved. It appears that the
> estimated cost of the mergejoin actually went *down* quite a bit
> (else it'd have been selected the first time too). That seems odd to
> me.

Nah, I just can't count :-(. What I forgot about was the sub-select in
the output list:

>> select ToolRepairRequest.RequestID, (Select
>> count(ToolHistory.HistoryID) from ToolHistory where
>> ToolRepairRequest.RepairID=ToolHistory.RepairID) as
>> CountOfTH

which shows up in the (un-analyzed) EXPLAIN output here:

SubPlan
-> Aggregate (cost=524.17..524.17 rows=1 width=4) (actual time=0.032..0.035 rows=1 loops=1518)
-> Index Scan using th_repair_key on toolhistory (cost=0.00..523.82 rows=140 width=4) (actual time=0.013..0.018 rows=1 loops=1518)
Index Cond: ($0 = repairid)

Now in this case the planner is estimating 79 rows out, so the estimated
cost of the nestloop plan includes a charge of 79*524.17 for evaluating
the subplan. If we discount that then the estimated cost of the
nestloop plan is 3974.74..6645.99 (48055.42-79*524.17).

In the ANALYZEd case the subplan is estimated to be a lot cheaper:

SubPlan
-> Aggregate (cost=6.98..6.98 rows=1 width=4) (actual time=0.038..0.042 rows=1 loops=1518)
-> Index Scan using th_repair_key on toolhistory (cost=0.00..6.97 rows=2 width=4) (actual time=0.016..0.021 rows=1 loops=1518)
Index Cond: ($0 = repairid)

It's estimated to be needed 1533 times, but that still adds up to less
of a charge than before. Discounting that, the mergejoin plan was
estimated at 18310.59..18462.10 (29162.44 - 1533*6.98). So it's not
true that the estimated cost of the join went down in the ANALYZEd case.

Werner sent me a data dump off-list, and trawling through the planner I
got these numbers for the estimated costs without the output subquery:

without any statistics:
mergejoin cost 9436.42 .. 9571.81
nestloop cost 3977.74 .. 6700.71

with statistics:
mergejoin cost 18213.04 .. 18369.73
nestloop cost 4054.93 .. 24042.85

(these are a bit different from his results because of different ANALYZE
samples etc, but close enough)

So the planner isn't going crazy: in each case it chose what seemed the
cheapest total-cost plan.

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Gaetano Mendola 2005-02-23 01:03:10 Re: is pg_autovacuum so effective ?
Previous Message Luke Chambers 2005-02-22 22:08:05 Inefficient Query Plans