Re: Wildly inaccurate query plan

From: Thom Brown <thombrown(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Wildly inaccurate query plan
Date: 2010-05-28 19:05:40
Message-ID: AANLkTimD7onzq_mLk1bYdGoYW3yaS_iqNH53UqF_Uzyx@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 28 May 2010 19:54, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thom Brown <thombrown(at)gmail(dot)com> writes:
>> I get this:
>
>> Limit  (cost=0.00..316895.11 rows=400 width=211) (actual
>> time=3.880..1368.936 rows=400 loops=1)
>>    ->  GroupAggregate  (cost=0.00..41843621.95 rows=52817 width=211)
>> (actual time=3.872..1367.048 rows=400 loops=1)
>>          ->  Index Scan using "binaryID_2576_idx" on parts_2576
>> (cost=0.00..41683754.21 rows=10578624 width=211) (actual
>> time=0.284..130.756 rows=19954 loops=1)
>>                Index Cond: (("binaryID")::text >
>> '1082fa89fe499741b8271f9c92136f44'::text)
>>  Total runtime: 1370.140 ms
>
>> The first thing which strikes me is how the GroupAggregate step shows
>> it got the 400 rows which matches the limit, but it estimated 52,817
>> rows.  Shouldn't it have already known it would be 400?
>
> No.  Rowcount estimates are always in terms of what the node would emit
> if allowed to run to completion.  Likewise cost.  In this case both the
> indexscan and the groupagg are terminated early once they satisfy the
> limit.  The planner is expecting this which is why the estimated cost
> for the limit node is way less than those for its inputs.
>
> That looks like a perfectly reasonable plan from here, though it would
> probably not get chosen with a larger limit or no limit at all, since
> the ultimate costs are pretty large.  Essentially this is a fast-start
> plan rather than a lowest-total-cost plan, and that looks like the
> best bet for a small limit value.
>
>                        regards, tom lane

You're absolutely right, it's not chosen when without limit. I see
what you mean though about terminating once it has enough rows. It's
a shame I can't optimise it though as the real case that runs is with
a limit of 4000 which takes a long time to complete.

Thanks

Thom

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2010-05-28 19:48:54 Zeus IOPS
Previous Message Greg Smith 2010-05-28 18:57:54 Re: shared_buffers advice