Skip site navigation (1) Skip section navigation (2)

Re: Wildly inaccurate query plan

From: Thom Brown <thombrown(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Wildly inaccurate query plan
Date: 2010-05-28 19:05:40
Message-ID: AANLkTimD7onzq_mLk1bYdGoYW3yaS_iqNH53UqF_Uzyx@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
On 28 May 2010 19:54, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thom Brown <thombrown(at)gmail(dot)com> writes:
>> I get this:
>
>> Limit  (cost=0.00..316895.11 rows=400 width=211) (actual
>> time=3.880..1368.936 rows=400 loops=1)
>>    ->  GroupAggregate  (cost=0.00..41843621.95 rows=52817 width=211)
>> (actual time=3.872..1367.048 rows=400 loops=1)
>>          ->  Index Scan using "binaryID_2576_idx" on parts_2576
>> (cost=0.00..41683754.21 rows=10578624 width=211) (actual
>> time=0.284..130.756 rows=19954 loops=1)
>>                Index Cond: (("binaryID")::text >
>> '1082fa89fe499741b8271f9c92136f44'::text)
>>  Total runtime: 1370.140 ms
>
>> The first thing which strikes me is how the GroupAggregate step shows
>> it got the 400 rows which matches the limit, but it estimated 52,817
>> rows.  Shouldn't it have already known it would be 400?
>
> No.  Rowcount estimates are always in terms of what the node would emit
> if allowed to run to completion.  Likewise cost.  In this case both the
> indexscan and the groupagg are terminated early once they satisfy the
> limit.  The planner is expecting this which is why the estimated cost
> for the limit node is way less than those for its inputs.
>
> That looks like a perfectly reasonable plan from here, though it would
> probably not get chosen with a larger limit or no limit at all, since
> the ultimate costs are pretty large.  Essentially this is a fast-start
> plan rather than a lowest-total-cost plan, and that looks like the
> best bet for a small limit value.
>
>                        regards, tom lane

You're absolutely right, it's not chosen when without limit.  I see
what you mean though about terminating once it has enough rows.  It's
a shame I can't optimise it though as the real case that runs is with
a limit of 4000 which takes a long time to complete.

Thanks

Thom

In response to

pgsql-performance by date

Next:From: Scott MarloweDate: 2010-05-28 19:48:54
Subject: Zeus IOPS
Previous:From: Greg SmithDate: 2010-05-28 18:57:54
Subject: Re: shared_buffers advice

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group