Re: Interesting query plan change linked to the LIMIT parameter

From: Yannick Le Guédart <yannick(at)over-blog(dot)com>
To: David Wilson <david(dot)t(dot)wilson(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Interesting query plan change linked to the LIMIT parameter
Date: 2009-01-21 07:37:50
Message-ID: 4f5470ad0901202337m7f080bcdw555ed871d20070f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks for the rapid response.

I can understand the way the planner makes its guess, but as a matter of
fact, he'll be nearly always wrong, just becausethe most commented articles
have only around 5000 or so comments. I ran the explain analyze tonight
and got this results :

EXPLAIN ANALYZE SELECT _comment.id,
> (get_comment_response(_comment.id)).id AS r_id
> FROM _comment
> INNER JOIN _article
> ON _article.id = _comment.parent_id
> WHERE _comment.parent_id = '17355952'
> ORDER BY _comment.id ASC
> OFFSET 0
> LIMIT 1;
>
> QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..3588.42 rows=1 width=8) (actual
> time=498597.115..498597.116 rows=1 loops=1)
> -> Nested Loop (cost=0.00..28524312.40 rows=7949 width=8) (actual
> time=498597.114..498597.114 rows=1 loops=1)
> -> Index Scan using _comment_pkey on _comment
> (cost=0.00..28448324.73 rows=7949 width=16) (actual
> time=498473.360..498473.360 rows=1 loops=1)
> Filter: (parent_id = 17355952::bigint)
> -> Index Scan using _article_pkey on _article (cost=0.00..9.55
> rows=1 width=8) (actual time=63.465..63.465 rows=1 loops=1)
> Index Cond: (_article.id = 17355952::bigint)
> Total runtime: 498615.230 ms
> (7 rows)
>
> EXPLAIN ANALYZE SELECT _comment.id,
> (get_comment_response(_comment.id)).id AS r_id
> FROM _comment
> INNER JOIN _article
> ON _article.id = _comment.parent_id
> WHERE _comment.parent_id = '17355952'
> ORDER BY _comment.id ASC
> OFFSET 0
> LIMIT 1000;
>
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=10261.19..10263.69 rows=1000 width=8) (actual
> time=127.037..127.267 rows=1000 loops=1)
> -> Sort (cost=10261.19..10281.06 rows=7949 width=8) (actual
> time=127.036..127.128 rows=1000 loops=1)
> Sort Key: _comment.id
> Sort Method: top-N heapsort Memory: 95kB
> -> Nested Loop (cost=0.00..9825.35 rows=7949 width=8) (actual
> time=0.472..122.986 rows=4674 loops=1)
> -> Index Scan using _article_pkey on _article
> (cost=0.00..9.55 rows=1 width=8) (actual time=0.011..0.013 rows=1 loops=1)
> Index Cond: (id = 17355952::bigint)
> -> Index Scan using idx_comment_parent_id on _comment
> (cost=0.00..9716.44 rows=7949 width=16) (actual time=0.235..32.869 rows=4674
> loops=1)
> Index Cond: (_comment.parent_id = 17355952::bigint)
> Total runtime: 127.410 ms
> (10 rows)
>

As you can see, the time is dramaticaly longuer with the LIMIT 1 (or in our
case, LIMIT 2).

Yannick.

2009/1/20 David Wilson <david(dot)t(dot)wilson(at)gmail(dot)com>

> On Tue, Jan 20, 2009 at 10:45 AM, Yannick Le Guédart
> <yannick(at)over-blog(dot)com> wrote:
>
> >
> > The second query scans the whole comment table which is very dangerous
> for
> > production servers.
>
> That's not quite true. The second does an index scan- the planner
> seems to be guessing that it'll fulfill the required limit early in
> the index scan; only with a pathologically bad case would it actually
> have to scan the entire thing. Basically, the second query is
> optimized to spit out the first few rows quickly, since that's all you
> asked for with the limit.
>
> Note that your first query has a final cost estimate of "Limit
> (cost=10261.19..10263.69 rows=1000 width=8)", indicating an estimated
> 10261.19 to emit the first row; the second has "Limit
> (cost=0.00..3588.42 rows=1 width=8)" estimating 0.00 (basically,
> instant) to emit the first - and only desired - row.
>
> That all said, an explain analyze would give us a better idea of
> what's going on- we can't tell if the planner is making bad estimates
> without the knowledge of what the real timing and row count results of
> plan stages were.
>
>
> --
> - David T. Wilson
> david(dot)t(dot)wilson(at)gmail(dot)com
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Dave Youatt 2009-01-21 22:01:41 linux, memory (mis)accounting/reporting, and the planner/optimizer
Previous Message David Wilson 2009-01-20 16:28:33 Re: Interesting query plan change linked to the LIMIT parameter