Re: Query with order by and limit is very slow - wrong index used

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Nowak Michał <michal(dot)nowak(at)me(dot)com>
Cc: Gregg Jaskiewicz <gryzman(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query with order by and limit is very slow - wrong index used
Date: 2011-10-03 14:28:56
Message-ID: CAF6yO=1_ofBKrQ-eLW+1CQM62cpFFxmduRaojM5fZWkCXiX_zQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> a9-dev=> explain analyze select * from records where source_id ='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' order by id limit 200;
>                                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..489.57 rows=200 width=1126) (actual time=99701.773..99703.858 rows=200 loops=1)
>  ->  Index Scan using records_pkey on records  (cost=0.00..2441698.81 rows=997489 width=1126) (actual time=99684.878..99686.936 rows=200 loops=1)
>        Filter: ((source_id)::text = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'::text)
>  Total runtime: 99705.916 ms
> (4 rows)
>
> a9-dev=> explain analyze select * from records2 where source_id ='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' order by id2 limit 200;
>                                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..770.01 rows=200 width=1124) (actual time=0.076..0.205 rows=200 loops=1)
>  ->  Index Scan using source_id2_id2_id2x on records2  (cost=0.00..3735751.15 rows=970308 width=1124) (actual time=0.074..0.180 rows=200 loops=1)
>        Index Cond: ((source_id)::text = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'::text)
>  Total runtime: 0.235 ms
> (4 rows)
>
>
> First one uses records_pkey, and with estimated cost 2441698.81 runs in over 1,5 minute.
> Second uses index on (source_id, id) and with estimated cost 3735751.15 runs in 235 miliseconds.
>
> IMHO, problem lies not in records distribution nor normalization, but in planner's wrong cost estimation. I don't know to tell/force him to use proper index.

Getting information on your current configuration should help.
Please see http://wiki.postgresql.org/wiki/Slow_Query_Questions

You should take care of the cache effect of your queries between your
tests, here it is not a problem, but this explain was way longer for
this similar query.

a9-dev=> explain analyze select * from records where source_id
='http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' order by id
limit 200;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..757.51 rows=200 width=1126) (actual
time=43.648..564.798 rows=200 loops=1)
-> Index Scan using source_id_id_idx on records
(cost=0.00..1590267.66 rows=419868 width=1126) (actual
time=43.631..564.700 rows=200 loops=1)
Index Cond: ((source_id)::text =
'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'::text)
Total runtime: 564.895 ms

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Ben Chobot 2011-10-03 14:29:16 Re: How can i get record by data block not by sql?
Previous Message Nowak Michał 2011-10-03 13:59:19 Re: Query with order by and limit is very slow - wrong index used