
Re: Wildly inaccurate query plan

From: Thom Brown <thombrown(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Wildly inaccurate query plan
Date: 2010-05-28 19:05:40
Message-ID:
Lists: pgsql-performance
On 28 May 2010 19:54, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thom Brown <thombrown(at)gmail(dot)com> writes:
>> I get this:
>> Limit  (cost=0.00..316895.11 rows=400 width=211) (actual
>> time=3.880..1368.936 rows=400 loops=1)
>>    ->  GroupAggregate  (cost=0.00..41843621.95 rows=52817 width=211)
>> (actual time=3.872..1367.048 rows=400 loops=1)
>>          ->  Index Scan using "binaryID_2576_idx" on parts_2576
>> (cost=0.00..41683754.21 rows=10578624 width=211) (actual
>> time=0.284..130.756 rows=19954 loops=1)
>>                Index Cond: (("binaryID")::text >
>> '1082fa89fe499741b8271f9c92136f44'::text)
>>  Total runtime: 1370.140 ms
>> The first thing which strikes me is how the GroupAggregate step shows
>> it got the 400 rows which matches the limit, but it estimated 52,817
>> rows.  Shouldn't it have already known it would be 400?
> No.  Rowcount estimates are always in terms of what the node would emit
> if allowed to run to completion.  Likewise cost.  In this case both the
> indexscan and the groupagg are terminated early once they satisfy the
> limit.  The planner is expecting this which is why the estimated cost
> for the limit node is way less than those for its inputs.
> That looks like a perfectly reasonable plan from here, though it would
> probably not get chosen with a larger limit or no limit at all, since
> the ultimate costs are pretty large.  Essentially this is a fast-start
> plan rather than a lowest-total-cost plan, and that looks like the
> best bet for a small limit value.
>                        regards, tom lane

You're absolutely right: it's not chosen when there's no limit.  I see
what you mean, though, about terminating once it has enough rows.  It's
a shame I can't optimise it, as the real case that runs uses a limit of
4000, which takes a long time to complete.
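For reference, the query text itself isn't shown anywhere in the thread.  A rough sketch of what it likely looks like, reconstructed from the plan output above (table parts_2576, index on "binaryID", a GroupAggregate fed by an ordered index scan), might be:

```sql
-- Hypothetical reconstruction; the actual query is not shown in the
-- thread, and the grouping/aggregate columns are guesses.
EXPLAIN ANALYZE
SELECT "binaryID", count(*)
FROM parts_2576
WHERE "binaryID" > '1082fa89fe499741b8271f9c92136f44'
GROUP BY "binaryID"
ORDER BY "binaryID"
LIMIT 400;
```

Running the same statement with LIMIT 4000, or with no LIMIT at all, shows the planner switching away from this fast-start plan once the expected fraction of the input that must be scanned makes the lowest-total-cost plan cheaper.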



